mpopov added subscribers: chelsyx, Neil_P._Quinn_WMF.
mpopov added a comment.
Okay, here are the numbers, calculated under the following conditions:
- Using the December 2018 snapshot of MediaWiki History in the Data Lake
- Only files which have not been deleted are counted
- Only revisions to the metadata which were not reverted AND which were not themselves reverts AND which were not deleted are counted
- "Metadata augmented w/in 1st 2mo" means there was at least 1 byte-adding revision to the file's page within the first 60 days after creation
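To make the last condition concrete, here is a minimal Python sketch of the 60-day test for a single file. The function name `augmented_within_window` and the sample revisions are hypothetical; the revisions are assumed to have already passed the revert and deletion filters listed above, and the day difference is taken on calendar dates, which is how I understand Hive's `DATEDIFF` to behave.

```python
from datetime import datetime


def augmented_within_window(created, revisions, window_days=60):
    """Return True if any byte-adding revision falls within the window.

    `revisions` is an iterable of (timestamp, bytes_diff) pairs, assumed to be
    already filtered (not reverts, not reverted, not deleted).
    """
    for ts, bytes_diff in revisions:
        # Calendar-day difference, ignoring the time of day.
        days = (ts.date() - created.date()).days
        if bytes_diff > 0 and days <= window_days:
            return True
    return False


# Hypothetical file created on 2018-01-01 with made-up revisions.
created = datetime(2018, 1, 1, 12, 0)
early = [(datetime(2018, 1, 15, 8, 0), 350)]   # 14 days after creation
late = [(datetime(2018, 4, 1, 9, 30), 120)]    # 90 days after creation

print(augmented_within_window(created, early))
print(augmented_within_window(created, late))
```

Under these assumptions the first file counts as augmented and the second does not, because its only byte-adding revision arrives after the window has closed.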
Assuming my query is correct (pending review), the baseline for the percentage of files which have metadata added within the first 2 months (60 days) looks to be 99.993914% overall.
Yearly stats
Year | Files uploaded that year | Metadata augmented w/in 1st 2mo (60d) | Proportion |
---|---|---|---|
2004 | 17,478 | 17,423 | 99.685319% |
2005 | 263,218 | 263,053 | 99.937314% |
2006 | 644,238 | 644,087 | 99.976561% |
2007 | 1,202,209 | 1,202,019 | 99.984196% |
2008 | 1,402,061 | 1,401,908 | 99.989087% |
2009 | 1,926,019 | 1,925,786 | 99.987903% |
2010 | 2,331,837 | 2,331,581 | 99.989022% |
2011 | 3,881,441 | 3,881,089 | 99.990931% |
2012 | 3,489,435 | 3,489,253 | 99.994784% |
2013 | 4,592,177 | 4,592,018 | 99.996538% |
2014 | 4,720,657 | 4,720,534 | 99.997394% |
2015 | 5,684,463 | 5,684,360 | 99.998188% |
2016 | 6,317,906 | 6,317,729 | 99.997198% |
2017 | 8,184,732 | 8,184,286 | 99.994551% |
2018 | 7,983,451 | 7,982,992 | 99.994251% |
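For anyone who wants to check the arithmetic, the sketch below recomputes each yearly proportion and the overall baseline from the counts in the table above. It only re-derives the percentages from the reported counts and says nothing about whether the underlying query is right; the variable and function names are my own.

```python
# (year, files uploaded, files augmented within 60 days), copied from the table.
yearly = [
    (2004, 17_478, 17_423),
    (2005, 263_218, 263_053),
    (2006, 644_238, 644_087),
    (2007, 1_202_209, 1_202_019),
    (2008, 1_402_061, 1_401_908),
    (2009, 1_926_019, 1_925_786),
    (2010, 2_331_837, 2_331_581),
    (2011, 3_881_441, 3_881_089),
    (2012, 3_489_435, 3_489_253),
    (2013, 4_592_177, 4_592_018),
    (2014, 4_720_657, 4_720_534),
    (2015, 5_684_463, 5_684_360),
    (2016, 6_317_906, 6_317_729),
    (2017, 8_184_732, 8_184_286),
    (2018, 7_983_451, 7_982_992),
]


def proportion_pct(augmented, uploaded):
    """Share of uploaded files that were augmented, as a percentage."""
    return 100.0 * augmented / uploaded


for year, uploaded, augmented in yearly:
    print(f"{year}: {proportion_pct(augmented, uploaded):.6f}%")

# The overall baseline is the pooled ratio, not the mean of yearly ratios.
total_uploaded = sum(u for _, u, _ in yearly)
total_augmented = sum(a for _, _, a in yearly)
overall = proportion_pct(total_augmented, total_uploaded)
print(f"overall: {overall:.6f}%")
```

Pooling the counts weights each year by its upload volume, which is why the overall figure sits close to the values for the high-volume recent years.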
Monthly stats for 2018
Month | Files uploaded that month | Metadata augmented w/in 1st 2mo (60d) | Proportion |
---|---|---|---|
January 2018 | 653,574 | 653,516 | 99.991126% |
February 2018 | 705,934 | 705,869 | 99.990792% |
March 2018 | 784,535 | 784,461 | 99.990568% |
April 2018 | 609,663 | 609,627 | 99.994095% |
May 2018 | 714,618 | 714,523 | 99.986706% |
June 2018 | 588,995 | 588,878 | 99.980136% |
July 2018 | 651,006 | 651,003 | 99.999539% |
August 2018 | 784,168 | 784,166 | 99.999745% |
September 2018 | 818,778 | 818,775 | 99.999634% |
October 2018 | 564,108 | 564,102 | 99.998936% |
November 2018 | 574,174 | 574,174 | 100.000000% |
December 2018 | 533,898 | 533,898 | 100.000000% |
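As a quick consistency check, the monthly counts should add up to the 2018 row of the yearly table. A small sketch, using the numbers from the table above (names are my own):

```python
# month -> (files uploaded, files augmented within 60 days), from the table.
monthly_2018 = {
    "January": (653_574, 653_516),
    "February": (705_934, 705_869),
    "March": (784_535, 784_461),
    "April": (609_663, 609_627),
    "May": (714_618, 714_523),
    "June": (588_995, 588_878),
    "July": (651_006, 651_003),
    "August": (784_168, 784_166),
    "September": (818_778, 818_775),
    "October": (564_108, 564_102),
    "November": (574_174, 574_174),
    "December": (533_898, 533_898),
}

uploaded_2018 = sum(u for u, _ in monthly_2018.values())
augmented_2018 = sum(a for _, a in monthly_2018.values())

# Files without a byte-adding revision in the window, per month.
not_augmented = {m: u - a for m, (u, a) in monthly_2018.items()}

print(uploaded_2018, augmented_2018)
print(not_augmented)
```

The two sums can then be compared against the 2018 yearly figures of 7,983,451 uploaded and 7,982,992 augmented.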
Appendix
Here's the query I used, which I would like someone in #product-analytics (e.g. @chelsyx and @Neil_P._Quinn_WMF) to review:
WITH summarized_revisions AS (
  SELECT
    page_id,
    TO_DATE(page_creation_timestamp) AS creation_date,
    COUNT(1) AS n_edits_total, -- not including reverts or reverted
    SUM(IF(revision_text_bytes_diff > 0, 1, 0)) AS n_additions_total,
    SUM(IF(DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_edits_2mo,
    SUM(IF(revision_text_bytes_diff > 0 AND DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_additions_2mo
  FROM wmf.mediawiki_history
  WHERE snapshot = '2018-12'
    AND wiki_db = 'commonswiki'
    AND event_entity = 'revision'
    AND page_namespace = 6
    AND NOT revision_is_identity_revert -- don't count edits that are reverts
    AND NOT revision_is_identity_reverted -- don't count edits that were reverted
    AND NOT revision_is_deleted -- don't count edits moved to archive table
    AND page_id IS NOT NULL -- don't count deleted files
  GROUP BY page_id, TO_DATE(page_creation_timestamp)
)
SELECT
  creation_date,
  COUNT(1) AS n_total,
  SUM(IF(n_edits_total > 0, 1, 0)) AS n_edited,
  SUM(IF(n_additions_total > 0, 1, 0)) AS n_added_to,
  SUM(IF(n_edits_2mo > 0, 1, 0)) AS n_edited_2mo,
  SUM(IF(n_additions_2mo > 0, 1, 0)) AS n_added_to_2mo
FROM summarized_revisions
GROUP BY creation_date;
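To help with the review, here is a rough Python sketch of the same two-stage aggregation run on a handful of made-up rows. It is not the query itself: the rows are invented, they are assumed to have already passed the WHERE filters, and the day difference is computed on plain dates as an approximation of `DATEDIFF`.

```python
from collections import defaultdict
from datetime import date

# Made-up rows: (page_id, creation_date, event_date, bytes_diff).
rows = [
    (1, date(2018, 1, 1), date(2018, 1, 1), 400),   # addition on day 0
    (1, date(2018, 1, 1), date(2018, 5, 1), -20),   # removal after 120 days
    (2, date(2018, 1, 1), date(2018, 6, 1), 50),    # addition after 151 days
    (3, date(2018, 1, 2), date(2018, 1, 10), 0),    # early edit, no bytes added
]

# Stage 1: one summary per (page_id, creation_date), like the CTE.
summaries = defaultdict(lambda: {
    "n_edits_total": 0, "n_additions_total": 0,
    "n_edits_2mo": 0, "n_additions_2mo": 0,
})
for page_id, created, event, diff in rows:
    s = summaries[(page_id, created)]
    within = (event - created).days <= 60
    s["n_edits_total"] += 1
    s["n_additions_total"] += (diff > 0)
    s["n_edits_2mo"] += within
    s["n_additions_2mo"] += (diff > 0 and within)

# Stage 2: count pages per creation date, like the outer SELECT.
by_date = defaultdict(lambda: {
    "n_total": 0, "n_edited": 0, "n_added_to": 0,
    "n_edited_2mo": 0, "n_added_to_2mo": 0,
})
for (page_id, created), s in summaries.items():
    d = by_date[created]
    d["n_total"] += 1
    d["n_edited"] += (s["n_edits_total"] > 0)
    d["n_added_to"] += (s["n_additions_total"] > 0)
    d["n_edited_2mo"] += (s["n_edits_2mo"] > 0)
    d["n_added_to_2mo"] += (s["n_additions_2mo"] > 0)

for created in sorted(by_date):
    print(created, by_date[created])
```

One thing the sketch makes visible: a page only shows up in stage 1 if it has at least one row that passed the filters, so `n_edited` equals `n_total` by construction, and pages with no qualifying revisions never enter the denominator.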
To: mpopov
Cc: Neil_P._Quinn_WMF, chelsyx, MNeisler, mpopov, kzimmerman, Ramsey-WMF, Abit, JKSTNK, Lahi, PDrouin-WMF, E1presidente, Cparle, Anooprao, SandraF_WMF, Tramullas, Acer, Silverfish, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs