[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons
Neil_P._Quinn_WMF added a comment.

In T213597#4899804, @mpopov wrote:
> Thank you so much, @Neil_P._Quinn_WMF! Really appreciate you catching that and correcting. I had incorrectly assumed that initial metadata would not be included. I'm currently looking into your suggested method of filtering revisions and comparing it to using revision_parent_id > 0, which should theoretically yield the same result but is not the case in practice.

Glad to help! mediawiki_history is full of so many gotchas; I've fallen into plenty myself!

In T213597#4899804, @mpopov wrote:
> In T213597#4893765, @Neil_P._Quinn_WMF wrote:
>> I don't understand the point of this, since the NOT revision_is_deleted should have already removed deleted files. (Also the page_id isn't necessarily null for deleted pages; after all the MediaWiki archive table has ar_page_id.)
>
> https://commons.wikimedia.org/wiki/File:Box-Front.jpg is a deleted file with a null page_id and it gets included in summarized_revisions otherwise.

True, but its revisions do have revision_is_deleted set, so you've already filtered them out of your query.

TASK DETAIL
  https://phabricator.wikimedia.org/T213597

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Neil_P._Quinn_WMF
Cc: Neil_P._Quinn_WMF, chelsyx, MNeisler, mpopov, kzimmerman, Ramsey-WMF, Abit, JKSTNK, Lahi, PDrouin-WMF, E1presidente, Cparle, Anooprao, SandraF_WMF, Tramullas, Acer, Silverfish, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
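The comparison mentioned in this comment (filtering on revision_parent_id > 0 versus filtering on event_timestamp != page_creation_timestamp) could be checked by cross-tabulating the two predicates and counting the rows where they disagree. The following is only a minimal Python sketch of that idea. The three revision rows are made up for illustration, the function name cross_tabulate is hypothetical, and the third row is contrived purely to show what a disagreement would look like; it is not a claim about why the two filters differ in the real mediawiki_history data.

```python
from collections import Counter

# Hypothetical revision rows: the field names come from the thread,
# the values are invented for illustration only.
revisions = [
    {"revision_parent_id": 0,
     "event_timestamp": "2018-10-01 12:00:00",
     "page_creation_timestamp": "2018-10-01 12:00:00"},
    {"revision_parent_id": 101,
     "event_timestamp": "2018-10-02 08:30:00",
     "page_creation_timestamp": "2018-10-01 12:00:00"},
    # Contrived case where the two filters would disagree.
    {"revision_parent_id": 0,
     "event_timestamp": "2018-10-03 09:00:00",
     "page_creation_timestamp": "2018-10-01 12:00:00"},
]


def cross_tabulate(rows):
    """Count rows by (has_parent, after_creation) so disagreements stand out."""
    counts = Counter()
    for row in rows:
        has_parent = row["revision_parent_id"] > 0
        after_creation = row["event_timestamp"] != row["page_creation_timestamp"]
        counts[(has_parent, after_creation)] += 1
    return counts


counts = cross_tabulate(revisions)
# Rows where one filter keeps the revision and the other drops it.
disagreements = sum(n for (a, b), n in counts.items() if a != b)
print(dict(counts))
print("disagreeing rows:", disagreements)
```

If the two filters were truly equivalent, every row would fall into the (True, True) or (False, False) cells; any count in the mixed cells points at revisions worth inspecting by hand.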
[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons
Neil_P._Quinn_WMF added a comment.

In T213597#4893605, @mpopov wrote:
> Here's the query I used, which I would like someone in #product-analytics (e.g. @chelsyx and @Neil_P._Quinn_WMF) to review:

Sure thing!

I noticed one big thing: it seems like your counts of file page edits (n_edits_total, n_additions_total, etc.) include the initial edit that creates the page, so in the end you're getting the proportion of files which have metadata added in the first 2 months, including during the initial upload.

I tried excluding those initial creations (event_timestamp != page_creation_timestamp), and it looks like the proportion goes from 99% to 50%.

Query excluding initial creations

    WITH summarized_revisions AS (
      SELECT
        page_id,
        TO_DATE(page_creation_timestamp) AS creation_date,
        COUNT(1) AS n_edits, -- not including reverts or reverted
        SUM(IF(event_timestamp != page_creation_timestamp, 1, 0)) AS n_later_edits,
        SUM(IF(
          revision_text_bytes_diff > 0
          AND DATEDIFF(event_timestamp, page_creation_timestamp) <= 60
          AND event_timestamp != page_creation_timestamp,
          1, 0
        )) AS n_additions_2mo
      FROM wmf.mediawiki_history
      WHERE snapshot = '2018-12'
        AND wiki_db = 'commonswiki'
        AND page_creation_timestamp BETWEEN "2018-10-01" AND "2018-10-08"
        AND event_entity = 'revision'
        AND page_namespace = 6
        AND NOT revision_is_identity_revert -- don't count edits that are reverts
        AND NOT revision_is_identity_reverted -- don't count edits that were reverted
        AND NOT revision_is_deleted -- don't count edits moved to archive table
        AND page_id IS NOT NULL -- don't count deleted files
      GROUP BY page_id, TO_DATE(page_creation_timestamp)
    )
    SELECT
      creation_date,
      COUNT(1) AS n_uploaded, -- files uploaded
      SUM(IF(n_later_edits > 0, 1, 0)) AS n_later_edited, -- files whose pages were edited after upload
      SUM(IF(n_additions_2mo > 0, 1, 0)) AS n_added_to_2mo -- files that have had metadata added after creation and in first 2 months
    FROM summarized_revisions
    GROUP BY creation_date;

creation_date  n_uploaded  n_later_edited  n_added_to_2mo
2018-10-01          23390           13307           10248
2018-10-02          18226           11308            8947
2018-10-03          22763           16803           12142
2018-10-04          17455           12896            9088
2018-10-05          17321           11397           10261
2018-10-06          20191           12456           10558
2018-10-07          21479           11575            9853

Other comments

    WITH summarized_revisions AS (
      SELECT
        page_id,
        TO_DATE(page_creation_timestamp) AS creation_date,
        COUNT(1) AS n_edits_total, -- not including reverts or reverted

I think this includes uploads of new file versions, not just metadata edits, but I don't think it would change the results much.

        SUM(IF(revision_text_bytes_diff > 0, 1, 0)) AS n_additions_total,
        SUM(IF(DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_edits_2mo,
        SUM(IF(revision_text_bytes_diff > 0 AND DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_additions_2mo
      FROM wmf.mediawiki_history
      WHERE snapshot = '2018-12'
        AND wiki_db = 'commonswiki'
        AND event_entity = 'revision'
        AND page_namespace = 6
        AND NOT revision_is_identity_revert -- don't count edits that are reverts
        AND NOT revision_is_identity_reverted -- don't count edits that were reverted
        AND NOT revision_is_deleted -- don't count edits moved to archive table
        AND page_id IS NOT NULL -- don't count deleted files

I don't understand the point of this, since the NOT revision_is_deleted should have already removed deleted files. (Also the page_id isn't necessarily null for deleted pages; after all the MediaWiki archive table has ar_page_id.)

      GROUP BY page_id, TO_DATE(page_creation_timestamp)
    )
    SELECT
      creation_date,
      COUNT(1) AS n_total, -- files uploaded
      SUM(IF(n_edits_total > 0, 1, 0)) AS n_edited, -- files that have had metadata edited
      SUM(IF(n_additions_total > 0, 1, 0)) AS n_added_to, -- files that have had metadata added
      SUM(IF(n_edits_2mo > 0, 1, 0)) AS n_edited_2mo, -- files that have had metadata edited in first 2 months
      SUM(IF(n_additions_2mo > 0, 1, 0)) AS n_added_to_2mo -- files that have had metadata added in first 2 months
    FROM summarized_revisions
    GROUP BY creation_date;

TASK DETAIL
  https://phabricator.wikimedia.org/T213597

To: Neil_P._Quinn_WMF
Cc: Neil_P._Quinn_WMF, chelsyx, MNeisler, mpopov, kzimmerman, Ramsey-WMF, Abit, JKSTNK, Lahi, PDrouin-WMF, E1presidente, Cparle, Anooprao, SandraF_WMF, Tramullas, Acer, Silverfish, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter

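The roughly 50% figure quoted in this comment can be sanity-checked against the per-day result table in the same message. The following is a small Python sketch of that arithmetic, assuming the proportion meant is total n_added_to_2mo divided by total n_uploaded over the seven days; the thread does not spell out the exact formula, and the helper name share_added is hypothetical. The numbers are copied from the table.

```python
# Rows copied from the result table in the comment:
# (creation_date, n_uploaded, n_later_edited, n_added_to_2mo)
rows = [
    ("2018-10-01", 23390, 13307, 10248),
    ("2018-10-02", 18226, 11308, 8947),
    ("2018-10-03", 22763, 16803, 12142),
    ("2018-10-04", 17455, 12896, 9088),
    ("2018-10-05", 17321, 11397, 10261),
    ("2018-10-06", 20191, 12456, 10558),
    ("2018-10-07", 21479, 11575, 9853),
]


def share_added(table):
    """Total files with metadata added after upload, divided by total uploads."""
    uploaded = sum(row[1] for row in table)
    added = sum(row[3] for row in table)
    return added / uploaded


overall = share_added(rows)
print(f"overall: {overall:.1%}")  # prints "overall: 50.5%"

# Per-day shares, to see how much the figure varies across upload dates.
for date, uploaded, _, added in rows:
    print(date, f"{added / uploaded:.1%}")
```

Under that assumption the pooled share is 71097 / 140825, which is consistent with the "50%" stated in the comment.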
[Wikidata-bugs] [Maniphest] [Lowered Priority] T177357: Metrics for SDoC: future work of interest (templates and licensing)
Neil_P._Quinn_WMF lowered the priority of this task from "High" to "Normal".

TASK DETAIL
  https://phabricator.wikimedia.org/T177357

To: Neil_P._Quinn_WMF
Cc: Neil_P._Quinn_WMF, JKatzWMF, Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, Nandana, Lahi, PDrouin-WMF, Gq86, E1presidente, Cparle, Darkminds3113, Anooprao, GoranSMilovanovic, QZanden, EBjune, Tramullas, Acer, LawExplorer, Avner, Gehel, FloNight, Susannaanas, Aschroet, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter, Mbch331

[Wikidata-bugs] [Maniphest] [Raised Priority] T177357: Metrics for SDoC: future work of interest (templates and licensing)
Neil_P._Quinn_WMF raised the priority of this task from "Normal" to "High".

TASK DETAIL
  https://phabricator.wikimedia.org/T177357

To: Neil_P._Quinn_WMF
Cc: Neil_P._Quinn_WMF, JKatzWMF, Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, Nandana, Lahi, PDrouin-WMF, Gq86, E1presidente, Cparle, Darkminds3113, Anooprao, GoranSMilovanovic, QZanden, EBjune, Tramullas, Acer, LawExplorer, Avner, Gehel, FloNight, Susannaanas, Aschroet, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter, Mbch331

[Wikidata-bugs] [Maniphest] [Commented On] T177357: Metrics for SDoC: future work of interest (templates and licensing)
Neil_P._Quinn_WMF added a comment.

For reference, I did some work counting the number of Commons files with different CC licenses:

TASK DETAIL
  https://phabricator.wikimedia.org/T177357

To: Neil_P._Quinn_WMF
Cc: Neil_P._Quinn_WMF, JKatzWMF, Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, Lahi, PDrouin-WMF, Gq86, E1presidente, Cparle, Darkminds3113, GoranSMilovanovic, QZanden, EBjune, Tramullas, Acer, LawExplorer, Avner, Gehel, FloNight, Susannaanas, Aschroet, Jane023, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter, Mbch331

[Wikidata-bugs] [Maniphest] [Changed Project Column] T69434: Make Hovercards work on Wikidata item links
Neil_P._Quinn_WMF moved this task to To Do on the Hovercards workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T69434

WORKBOARD
  https://phabricator.wikimedia.org/project/board/765/

To: Bene, Neil_P._Quinn_WMF
Cc: Quiddity, Raymond, Bene, Ricordisamoa, SamB, Aklapper, Se4598, JanZerebecki, Wikidata-bugs, Jdforrester-WMF, Vibhabamba, Prtksxna, Liuxinyu970226, Lydia_Pintscher, hoo, aude, Malyacko, P.Copp

[Wikidata-bugs] [Maniphest] [Changed Project Column] T69434: Make Hovercards work on Wikidata item links
Neil_P._Quinn_WMF moved this task to Nice to have on the Hovercards workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T69434

WORKBOARD
  https://phabricator.wikimedia.org/project/board/765/

To: Bene, Neil_P._Quinn_WMF
Cc: Quiddity, Raymond, Bene, Ricordisamoa, SamB, Aklapper, Se4598, JanZerebecki, Wikidata-bugs, Jdforrester-WMF, Vibhabamba, Prtksxna, Liuxinyu970226, Lydia_Pintscher, hoo, aude, Malyacko, P.Copp