A couple of learnings about article deletions from the ACTRIAL analysis:

   1. The logging table does not appear to contain correct page IDs of
   deleted pages until some time in 2014[1]. If you're looking at historical
   data and want to combine earlier deletions with other information,
   following Aaron's lead and using the archive table is probably the way to
   go.
   2. The article namespace doesn't just contain "articles"; it also
   contains redirects and disambiguation pages. Redirects in particular can
   affect measurements of the number of pages deleted[2], because there have
   been instances of cleanup of substantial numbers of redirects. There's no
   information about redirect status in the archive table, as far as I know,
   but the log comment can be used to identify a substantial number of such
   deletions.
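
As a rough illustration of the log-comment idea in point 2, here is a
minimal sketch in Python. The patterns are assumptions rather than anything
taken from the ACTRIAL code: it treats a comment as a redirect deletion if
it mentions the word "redirect" or one of the English Wikipedia speedy
deletion codes R2/R3. The function name and the sample comments are
hypothetical, and the patterns would need tuning for other wikis.

```python
import re

# Assumed patterns (not from the original analysis): the word "redirect"
# in any case, or the speedy-deletion codes R2/R3 as standalone tokens.
REDIRECT_COMMENT_RE = re.compile(r"(?i:redirect)|\bR[23]\b")


def looks_like_redirect_deletion(log_comment):
    """Return True if a deletion log comment suggests the page was a redirect."""
    if not log_comment:
        # Empty or missing comments carry no signal either way.
        return False
    return REDIRECT_COMMENT_RE.search(log_comment) is not None


def split_deletions(log_comments):
    """Partition comments into (redirect-like, other) lists."""
    redirects, others = [], []
    for comment in log_comments:
        if looks_like_redirect_deletion(comment):
            redirects.append(comment)
        else:
            others.append(comment)
    return redirects, others
```

A heuristic like this will miss redirect deletions with uninformative
comments, so it gives a lower bound rather than an exact count.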

The code I used in our analysis of deletion reasons, which also covers the
article namespace, is on GitHub:
https://github.com/nettrom/actrial/blob/master/python/deletionreasons.py
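
Separately from the script linked above, the archive-table approach from
point 1 can be sketched as follows. This is only an illustration under
stated assumptions: rows are taken to be (ar_namespace, ar_title,
ar_timestamp) tuples already fetched from the archive table, timestamps are
in MediaWiki's YYYYMMDDHHMMSS string form, and the most recent timestamp
per title is used as a stand-in for the deletion time. The function name
and the sample rows are made up.

```python
from collections import Counter


def monthly_deletion_counts(archive_rows, namespace=0):
    """Count deleted pages per month, using each page's latest ar_timestamp.

    archive_rows: iterable of (ar_namespace, ar_title, ar_timestamp) tuples,
    with timestamps as 14-character YYYYMMDDHHMMSS strings.
    """
    last_ts = {}
    for ns, title, ts in archive_rows:
        if ns != namespace:
            continue
        # Fixed-width timestamps compare correctly as plain strings.
        if title not in last_ts or ts > last_ts[title]:
            last_ts[title] = ts
    # The first six characters are the YYYYMM month bucket.
    return Counter(ts[:6] for ts in last_ts.values())


# Made-up sample rows, for illustration only.
rows = [
    (0, "Example_A", "20180103120000"),
    (0, "Example_A", "20180215093000"),
    (0, "Example_B", "20180220101500"),
    (1, "Example_A", "20180301000000"),
    (0, "Example_C", "20180305080000"),
]
counts = monthly_deletion_counts(rows)
print(dict(counts))  # {'201802': 2, '201803': 1}
```

Note that the last revision timestamp is only a proxy for when the page was
deleted, and that a title deleted, recreated, and deleted again is counted
once under this scheme.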

Footnotes:

   1. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2018-01-29
   2. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2018-01-19#Improving_the_data_gathering


Cheers,
Morten

On Fri, 16 Aug 2019 at 05:31, Samuel Klein <[email protected]> wrote:

> Since bug 26122 has been fixed, any reason not to use the deletion log
> instead?
>
> On Thu, Aug 15, 2019 at 10:27 AM Aaron Halfaker <[email protected]>
> wrote:
>
> > Here's a related bit of work:
> > https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
> >
> > In this research project, I used a mix of both the deletion log and the
> > archive table to get a sense for when pages were being deleted.
> >
> > Ultimately, I found that the easiest deletion event to operationalize was
> > to look at the most recent ar_timestamp for a page in the archive table.
> > I could only go back to 2008 with this metric because the archive table
> > didn't exist before then.
> >
> > The archive table is available in quarry.  See
> > https://quarry.wmflabs.org/query/38414 for an example query that gets
> > the timestamp of an article's last revision.
> >
> > The logging table is also in quarry.  See
> > https://quarry.wmflabs.org/query/38415 for an example query that gets
> > deletion events.
> >
> > On Tue, Aug 13, 2019 at 2:51 PM Haifeng Zhang <[email protected]>
> > wrote:
> >
> > > Dear all,
> > >
> > > Is there an easy way to get the number of articles deleted over time
> > > (e.g., month) in Wikipedia?
> > >
> > > Can I use Quarry? What tables should I use?
> > >
> > >
> > > Thanks,
> > >
> > > Haifeng Zhang
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > [email protected]
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> >
>
>
> --
> Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
>
