mpopov added a comment.

I noticed once big thing: it seems like your counts of file page edits (n_edits_total, n_additions_total, etc.) include the initial edit that creates the pages, so in the end you're getting the proportion of files which have metadata added in the first 2 months, including during the initial upload.

I tried excluding those initial creations (event_timestamp != page_creation_timestamp), and it looks like the proportion goes from 99% to 50%.

Thank you so much, @Neil_P._Quinn_WMF! Really appreciate you catching that and correcting. I had incorrectly assumed that initial metadata would not be included. I'm currently looking into your suggested method of filtering revisions and comparing it to using revision_parent_id > 0, which should theoretically yield the same result but is not the case in practice.

Correct numbers coming soon.

I don't understand the point of this, since the NOT revision_is_deleted should have already removed deleted files. (Also the page_id isn't necessarily null for deleted pages; after all the MediaWiki archive table has ar_page_id.)

https://commons.wikimedia.org/wiki/File:Box-Front.jpg is a deleted file with a null page_id and it gets included in summarized_revisions otherwise.


TASK DETAIL
https://phabricator.wikimedia.org/T213597

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: mpopov
Cc: Neil_P._Quinn_WMF, chelsyx, MNeisler, mpopov, kzimmerman, Ramsey-WMF, Abit, JKSTNK, Lahi, PDrouin-WMF, E1presidente, Cparle, Anooprao, SandraF_WMF, Tramullas, Acer, Silverfish, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to