A couple of learnings about article deletions from the ACTRIAL analysis:

1. The logging table does not appear to contain correct page IDs of deleted
   pages until some time in 2014[1]. If you're looking at historical data and
   want to combine earlier deletions with other information, following
   Aaron's lead and using the archive table is probably the way to go.

2. The article namespace doesn't just contain "articles"; it also contains
   redirects and disambiguation pages. Redirects in particular can affect
   measurements of the number of pages deleted[2], because there have been
   instances where substantial numbers of redirects were cleaned up. There's
   no information about redirect status in the archive table, as far as I
   know, but the log comment can be used to identify a substantial number of
   such deletions.
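To illustrate point 2, here is a minimal Python sketch of filtering out redirect deletions by log comment before counting deletions per month. It is not the code from deletionreasons.py: the function names and the regular expression (the word "redirect", or the speedy deletion codes R2/R3) are assumptions made for illustration, and the input is assumed to be (log_timestamp, log_comment) pairs already pulled from the logging table, with timestamps in MediaWiki's YYYYMMDDHHMMSS format.

```python
import re
from collections import Counter

# Hypothetical heuristic: flag a comment that mentions "redirect" or cites
# the speedy deletion criteria R2/R3. A real analysis would need a more
# carefully validated pattern.
REDIRECT_PATTERN = re.compile(r"redirect|\bR[23]\b", re.IGNORECASE)


def is_redirect_deletion(comment):
    """Return True if the log comment looks like a redirect deletion."""
    return bool(comment) and REDIRECT_PATTERN.search(comment) is not None


def monthly_deletions(rows):
    """Count non-redirect deletions per month.

    rows: iterable of (log_timestamp, log_comment) pairs, where the
    timestamp is a 14-character YYYYMMDDHHMMSS string.
    Returns a Counter keyed by "YYYY-MM".
    """
    counts = Counter()
    for timestamp, comment in rows:
        if is_redirect_deletion(comment):
            continue  # skip deletions that look like redirect cleanup
        counts[f"{timestamp[:4]}-{timestamp[4:6]}"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("20170105120000", "[[WP:CSD#R2|R2]]: Cross-namespace redirect"),
        ("20170110120000", "[[WP:CSD#A7|A7]]: No credible indication of importance"),
        ("20170203000000", "Expired [[WP:PROD|PROD]]"),
    ]
    for month, n in sorted(monthly_deletions(sample).items()):
        print(month, n)
```

Comments that are empty or missing are counted as non-redirect deletions here, so this errs on the side of overcounting articles rather than dropping rows.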
The code I used in our analysis of deletion reasons, which also covers the
article namespace, is on GitHub:
https://github.com/nettrom/actrial/blob/master/python/deletionreasons.py

Footnotes:
1. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2018-01-29
2. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2018-01-19#Improving_the_data_gathering

Cheers,
Morten

On Fri, 16 Aug 2019 at 05:31, Samuel Klein <[email protected]> wrote:

> Since bug 26122 has been fixed, any reason not to use the deletion log
> instead?
>
> On Thu, Aug 15, 2019 at 10:27 AM Aaron Halfaker <[email protected]>
> wrote:
>
> > Here's a related bit of work:
> > https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
> >
> > In this research project, I used a mix of both the deletion log and the
> > archive table to get a sense for when pages were being deleted.
> >
> > Ultimately, I found that the easiest deletion event to operationalize was
> > to look at the most recent ar_timestamp for a page in the archive table.
> > I could only go back to 2008 with this metric because the archive table
> > didn't exist before then.
> >
> > The archive table is available in quarry. See
> > https://quarry.wmflabs.org/query/38414 for an example query that gets the
> > timestamp of an article's last revision.
> >
> > The logging table is also in quarry. See
> > https://quarry.wmflabs.org/query/38415 for an example query that gets
> > deletion events.
> >
> > On Tue, Aug 13, 2019 at 2:51 PM Haifeng Zhang <[email protected]>
> > wrote:
> >
> > > Dear all,
> > >
> > > Is there an easy way to get the number of articles deleted over time
> > > (e.g., month) in Wikipedia?
> > >
> > > Can I use Quarry? What tables should I use?
> > >
> > >
> > > Thanks,
> > >
> > > Haifeng Zhang
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > [email protected]
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> >
>
> --
> Samuel Klein  @metasj  w:user:sj  +1 617 529 4266
