mpopov added subscribers: chelsyx, Neil_P._Quinn_WMF.
mpopov added a comment.

Okay, here are the numbers which were calculated with the following conditions:

  • Using the December 2018 snapshot of MediaWiki History in the Data Lake
  • Only files which have not been deleted are counted
  • Only revisions to the metadata which were not reverted, were not themselves reverts, and were not deleted are counted
  • "Metadata augmented w/in 1st 2mo" means there was at least 1 byte-adding revision to the file's page within the first 60 days after creation
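
A minimal Python sketch of the 60-day condition in the last bullet, for illustration only. It is not the query that produced the numbers: the `Revision` type and the `augmented_within` function are hypothetical names, and the day difference is taken between calendar dates (time of day ignored), on the assumption that this matches how Hive's DATEDIFF behaves.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Revision:
    """Hypothetical stand-in for one surviving revision of a file page."""
    timestamp: datetime
    bytes_diff: int  # change in page size; positive when text was added


def augmented_within(created: datetime, revisions: list[Revision], days: int = 60) -> bool:
    """True if at least one byte-adding revision falls within `days` of creation."""
    return any(
        r.bytes_diff > 0 and (r.timestamp.date() - created.date()).days <= days
        for r in revisions
    )


created = datetime(2018, 1, 1, 12, 0)
early_addition = [Revision(datetime(2018, 2, 15, 9, 30), 120)]  # 45 days later
late_addition = [Revision(datetime(2018, 6, 1, 9, 30), 120)]    # 151 days later
early_removal = [Revision(datetime(2018, 1, 5, 9, 30), -40)]    # removes bytes

print(augmented_within(created, early_addition))  # True
print(augmented_within(created, late_addition))   # False
print(augmented_within(created, early_removal))   # False
```

Under this rule a file counts as augmented as soon as any one qualifying revision adds bytes, regardless of how many bytes or what kind of content was added.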

Assuming my query is correct (pending review), the baseline for the percentage of files which have metadata added within the first 2 months is 99.993914% overall.

Yearly stats

Year  Files uploaded that year  Metadata augmented w/in 1st 2mo (60d)   Proportion
2004                    17,478                                 17,423   99.685319%
2005                   263,218                                263,053   99.937314%
2006                   644,238                                644,087   99.976561%
2007                 1,202,209                              1,202,019   99.984196%
2008                 1,402,061                              1,401,908   99.989087%
2009                 1,926,019                              1,925,786   99.987903%
2010                 2,331,837                              2,331,581   99.989022%
2011                 3,881,441                              3,881,089   99.990931%
2012                 3,489,435                              3,489,253   99.994784%
2013                 4,592,177                              4,592,018   99.996538%
2014                 4,720,657                              4,720,534   99.997394%
2015                 5,684,463                              5,684,360   99.998188%
2016                 6,317,906                              6,317,729   99.997198%
2017                 8,184,732                              8,184,286   99.994551%
2018                 7,983,451                              7,982,992   99.994251%
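
As a cross-check, the overall baseline can be recomputed from the yearly rows above. The sketch below only re-adds the published counts; the variable names are illustrative and nothing here queries the Data Lake.

```python
# (files uploaded, files augmented within 60 days), copied from the yearly table
yearly = {
    2004: (17_478, 17_423),
    2005: (263_218, 263_053),
    2006: (644_238, 644_087),
    2007: (1_202_209, 1_202_019),
    2008: (1_402_061, 1_401_908),
    2009: (1_926_019, 1_925_786),
    2010: (2_331_837, 2_331_581),
    2011: (3_881_441, 3_881_089),
    2012: (3_489_435, 3_489_253),
    2013: (4_592_177, 4_592_018),
    2014: (4_720_657, 4_720_534),
    2015: (5_684_463, 5_684_360),
    2016: (6_317_906, 6_317_729),
    2017: (8_184_732, 8_184_286),
    2018: (7_983_451, 7_982_992),
}

uploaded = sum(n_uploaded for n_uploaded, _ in yearly.values())
augmented = sum(n_augmented for _, n_augmented in yearly.values())
overall = 100 * augmented / uploaded

print(f"{augmented:,} of {uploaded:,} files = {overall:.6f}%")
# 52,638,118 of 52,641,322 files = 99.993914%
```

The result agrees with the overall figure quoted above, which suggests that figure is the pooled proportion across all years rather than an average of the yearly percentages.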

Monthly stats for 2018

Month           Files uploaded that month  Metadata augmented w/in 1st 2mo (60d)   Proportion
January 2018                      653,574                                653,516   99.991126%
February 2018                     705,934                                705,869   99.990792%
March 2018                        784,535                                784,461   99.990568%
April 2018                        609,663                                609,627   99.994095%
May 2018                          714,618                                714,523   99.986706%
June 2018                         588,995                                588,878   99.980136%
July 2018                         651,006                                651,003   99.999539%
August 2018                       784,168                                784,166   99.999745%
September 2018                    818,778                                818,775   99.999634%
October 2018                      564,108                                564,102   99.998936%
November 2018                     574,174                                574,174  100.000000%
December 2018                     533,898                                533,898  100.000000%

Appendix

Here's the query I used, which I would like someone in #product-analytics (e.g. @chelsyx and @Neil_P._Quinn_WMF) to review:

WITH summarized_revisions AS (
  SELECT
    page_id, TO_DATE(page_creation_timestamp) AS creation_date,
    COUNT(1) AS n_edits_total, -- not including reverts or reverted
    SUM(IF(revision_text_bytes_diff > 0, 1, 0)) AS n_additions_total,
    SUM(IF(DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_edits_2mo,
    SUM(IF(revision_text_bytes_diff > 0 AND DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_additions_2mo
  FROM wmf.mediawiki_history
  WHERE snapshot = '2018-12'
    AND wiki_db = 'commonswiki'
    AND event_entity = 'revision'
    AND page_namespace = 6
    AND NOT revision_is_identity_revert -- don't count edits that are reverts
    AND NOT revision_is_identity_reverted -- don't count edits that were reverted
    AND NOT revision_is_deleted -- don't count edits moved to archive table
    AND page_id IS NOT NULL -- don't count deleted files
  GROUP BY page_id, TO_DATE(page_creation_timestamp)
)
SELECT
  creation_date,
  COUNT(1) AS n_total,
  SUM(IF(n_edits_total > 0, 1, 0)) AS n_edited,
  SUM(IF(n_additions_total > 0, 1, 0)) AS n_added_to,
  SUM(IF(n_edits_2mo > 0, 1, 0)) AS n_edited_2mo,
  SUM(IF(n_additions_2mo > 0, 1, 0)) AS n_added_to_2mo
FROM summarized_revisions
GROUP BY creation_date;
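
The query returns one row per creation date, so the yearly and monthly tables need one more aggregation step. A minimal Python sketch of a yearly rollup follows, assuming the result has been exported as rows of (creation_date, n_total, n_added_to_2mo). The rows below are invented toy values, not real results, and `daily_rows` and `by_year` are hypothetical names.

```python
from collections import defaultdict
from datetime import date

# Toy rows shaped like the query output: (creation_date, n_total, n_added_to_2mo).
# The numbers are invented for illustration and are not real results.
daily_rows = [
    (date(2017, 12, 30), 10, 9),
    (date(2017, 12, 31), 20, 20),
    (date(2018, 1, 1), 30, 29),
    (date(2018, 1, 2), 40, 40),
]

# Sum the per-day counts within each calendar year.
totals = defaultdict(lambda: [0, 0])
for creation_date, n_total, n_added_to_2mo in daily_rows:
    totals[creation_date.year][0] += n_total
    totals[creation_date.year][1] += n_added_to_2mo

# Proportion is computed from the summed counts, not averaged over days.
by_year = {
    year: (n_total, n_added, 100 * n_added / n_total)
    for year, (n_total, n_added) in sorted(totals.items())
}

for year, (n_total, n_added, pct) in by_year.items():
    print(f"{year}  {n_total:>6,}  {n_added:>6,}  {pct:.6f}%")
```

Grouping on (year, month) instead of year would give the monthly breakdown in the same way.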

TASK DETAIL
https://phabricator.wikimedia.org/T213597

To: mpopov
Cc: Neil_P._Quinn_WMF, chelsyx, MNeisler, mpopov, kzimmerman, Ramsey-WMF, Abit, JKSTNK, Lahi, PDrouin-WMF, E1presidente, Cparle, Anooprao, SandraF_WMF, Tramullas, Acer, Silverfish, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter