[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
mpopov added subscribers: AndrewTavis_WMDE, mpopov.
mpopov added a comment.

@AndrewTavis_WMDE asked me for some thoughts/suggestions here :) I started typing out a DM reply but decided some of this stuff would be good to have on public record.

> it's not normal that snapshots go back a decade plus, so I'm a bit confused on this

The way that MediaWiki and Wikidata snapshots work – and have to work, due to the nature of the data – is they are snapshots in time of EVERYTHING at the time of the snapshot generation. This is why even `wmf.edits_hourly` (or whatever that table is called) can contain counts of edits made in April even though the latest snapshot is '2024-04' – it's indiscriminate of timestamps associated with any of the data.

I think 3-4 snapshots back is probably a good number of snapshots to keep because it does enable us to investigate odd discrepancies between snapshots (T355182 <https://phabricator.wikimedia.org/T355182>) beyond the state change problem.

The challenge with this data that you may have come across is that the state of things (whether an edit got deleted or reverted, whether a user is labelled as a bot or not) changes over time, so the same edit or the same user from years ago can be categorized differently from snapshot to snapshot. Ultimately, **any metric that is calculated from data which can change state is going to be subject to drift when a static measurement is stored anywhere.**

We actually run into this problem with the key result for FY23-24 Wiki Experiences Objective 1.1 (Superset dashboard <https://superset.wikimedia.org/superset/dashboard/501/>) that aims to increase the number of unreverted (and undeleted) mobile contributions to articles on Wikipedia by 10%. Throughout March 2024 – when the '2024-02' snapshot was used – the metric for the KR was at 4.7%. Then, when the '2024-03' snapshot was generated (at the beginning of April), the February value of that metric changed to 4.4% – because the state of the edits made in February changed. The dashboard uses the most recently available snapshot and has no memory of the values of the metric based on previous snapshots. If we were to store a value in a spreadsheet or a report and then compare the dashboard to the spreadsheet/report 1+ snapshots later, there would be a discrepancy. There's no getting around it – it's natural, and folks who work with or look at these metrics need to become comfortable with that concept.

There are some things we can do to improve the stability (decrease the snapshot-to-snapshot variability) of the metric, but they won't address the problem entirely. Like, we could (and should) impose "not reverted within first 48 hours" as opposed to the current "not reverted at the time of the snapshot", but deletion of edits and whether a user is considered a real editor or a bot, well, those are going to change snapshot-to-snapshot and dealing with those would be extremely painful.

I won't evaluate the listed metrics but I will recommend asking yourselves the following for each metric:
- Can we backfill this? Can we re-compute the history of this metric given a snapshot?
- Are we comfortable re-computing the entire history of this metric with each new snapshot?
- Will we be reporting this metric anywhere else and would it be a problem if what we reported in the past and what we report in the future differ?
- Are we comfortable calculating the value of the metric only once and storing that somewhere that we call "source of truth" for measurements of this metric going forward?
  - For example, you calculate the value of metric A for April 2024 (using March 2024 snapshot) and hold on to that value because once the March 2024 snapshot is deleted, any re-calculation of metric A for April 2024 using a later snapshot will result in a different value.
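To make the "calculate once and store" option concrete, here is a minimal Python sketch. It is not an existing tool: `MetricLedger`, `record_once` and `drift` are hypothetical names, and the only real numbers are the 4.7% and 4.4% February values from the KR example above.

```python
from dataclasses import dataclass, field


@dataclass
class MetricLedger:
    """Hypothetical 'source of truth' that stores each period's value once."""

    recorded: dict = field(default_factory=dict)

    def record_once(self, period: str, snapshot: str, value: float) -> float:
        # Keep the first value ever computed for a period; values derived
        # from later snapshots are ignored so reported history never moves.
        if period not in self.recorded:
            self.recorded[period] = (snapshot, value)
        return self.recorded[period][1]

    def drift(self, period: str, new_value: float) -> float:
        # How far a fresh re-computation is from the stored value.
        return new_value - self.recorded[period][1]


ledger = MetricLedger()
# February's value as seen from two consecutive snapshots (KR example).
ledger.record_once("2024-02", snapshot="2024-02", value=4.7)
ledger.record_once("2024-02", snapshot="2024-03", value=4.4)  # ignored

print(ledger.recorded["2024-02"])              # ('2024-02', 4.7)
print(round(ledger.drift("2024-02", 4.4), 1))  # -0.3
```

The trade-off is the one listed in the questions: the stored value stays stable for reporting, but it will disagree with any dashboard that always reads the newest snapshot.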
TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mpopov Cc: mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T348999: Add linter and formatter to wmfdata-python (and link check)
mpopov removed a project: Product-Analytics.
[Wikidata-bugs] [Maniphest] T349531: Add testing framework to wmfdata-python
mpopov removed a project: Product-Analytics.
[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)
mpopov added a comment.

> are most people at WMF writing spark pythonically and not with queries?

I guess it depends on who you talk to and what they're doing. All of the data scientists/analysts I work with use the Spark SQL engine and write HiveQL queries, often because `hive.run` is too slow. Occasionally I see dot notation for advanced PySpark usage (e.g. Morten's survey aggregation data pipeline <https://github.com/nettrom/Growth-welcomesurvey-2018/blob/master/T275172_survey_aggregation.ipynb>). I suspect dot notation-based Spark usage is probably more common among software engineers.
[Wikidata-bugs] [Maniphest] T177358: Metrics for SDoC: translations
mpopov closed subtask T182352: UDF for language detection as Invalid.
[Wikidata-bugs] [Maniphest] T292152: dashboard with daily query service usage not updating
mpopov closed this task as a duplicate of T287381: External referrer WDQS metrics stopped updating on 2021-04-25.
[Wikidata-bugs] [Maniphest] T292152: dashboard with daily query service usage not updating
mpopov added a comment.

Thanks @MPhamWMF! What Mike and David said is correct. Also, this ticket prompted me to finally add the decommission notice to the dashboard (previously it was only on the homepage).

In T292152#7391826 <https://phabricator.wikimedia.org/T292152#7391826>, @Lydia_Pintscher wrote:
> In the meantime for my talk: Do we know what the current number is?

For 2021-09-30:

| Path | "Automated" | "User" | Total |
| - | --- | -- | - |
| / | 2109 | 2290 | 4399 |
| /bigdata/ldf | 4 | 55230 | 55234 |
| /bigdata/namespace/wdq/sparql | 1835762 | 5786966 | 7622728 |

Anyone with private data access can easily count 1 day's requests using Hue <https://hue.wikimedia.org/> and this Hive query (slightly modified from https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/golden/+/refs/heads/master/modules/metrics/wdqs/basic_usage):

  USE wmf;
  SELECT
    year, month, day,
    IF(uri_path = '/sparql', '/bigdata/namespace/wdq/sparql', uri_path) AS path,
    UPPER(http_status IN('200','304')) AS http_success,
    CASE
      WHEN (
        agent_type = 'user' AND (
          user_agent RLIKE 'https?://'
          OR INSTR(user_agent, 'www.') > 0
          OR INSTR(user_agent, 'github') > 0
          OR LOWER(user_agent) RLIKE '([a-z0-9._%-]+@[a-z0-9.-]+\.(com|us|net|org|edu|gov|io|ly|co|uk))'
          OR (
            user_agent_map['browser_family'] = 'Other'
            AND user_agent_map['device_family'] = 'Other'
            AND user_agent_map['os_family'] = 'Other'
          )
        )
      ) OR agent_type = 'spider' THEN 'TRUE'
      ELSE 'FALSE'
    END AS is_automata,
    COUNT(*) AS events
  FROM wmf.webrequest
  WHERE webrequest_source = 'text'
    AND year = ${year} AND month = ${month} AND day = ${day}
    AND uri_host = 'query.wikidata.org'
    AND uri_path IN('/', '/bigdata/namespace/wdq/sparql', '/bigdata/ldf', '/sparql')
  GROUP BY
    year, month, day,
    IF(uri_path = '/sparql', '/bigdata/namespace/wdq/sparql', uri_path),
    UPPER(http_status IN('200','304')),
    CASE
      WHEN (
        agent_type = 'user' AND (
          user_agent RLIKE 'https?://'
          OR INSTR(user_agent, 'www.') > 0
          OR INSTR(user_agent, 'github') > 0
          OR LOWER(user_agent) RLIKE '([a-z0-9._%-]+@[a-z0-9.-]+\.(com|us|net|org|edu|gov|io|ly|co|uk))'
          OR (
            user_agent_map['browser_family'] = 'Other'
            AND user_agent_map['device_family'] = 'Other'
            AND user_agent_map['os_family'] = 'Other'
          )
        )
      ) OR agent_type = 'spider' THEN 'TRUE'
      ELSE 'FALSE'
    END;

**I would NOT recommend querying an entire month with 1 query** since it uses webrequest data which **should be queried 1 day at a time at most**.

Also, the query uses a non-standard "automata" determination. At the time (those years ago) I thought it was clever, but these days I would not use those rules and if I had infinite time I would switch to https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection
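For readers who find the nested CASE hard to follow, the "automata" rules from the Hive query can be sketched as a small Python function. This is only an illustration of those legacy rules (which the comment itself advises against using today), not the BotDetection method. The function name `is_automata` mirrors the query's column alias, and the sample user agents in the usage lines are invented.

```python
import re

# Same patterns as in the Hive query; RLIKE matches anywhere in the string,
# so re.search (not re.fullmatch) is the closest equivalent.
URL_RE = re.compile(r"https?://")
EMAIL_RE = re.compile(
    r"([a-z0-9._%-]+@[a-z0-9.-]+\.(com|us|net|org|edu|gov|io|ly|co|uk))"
)
UA_FIELDS = ("browser_family", "device_family", "os_family")


def is_automata(agent_type: str, user_agent: str, user_agent_map: dict) -> bool:
    """Legacy heuristic: spiders, plus 'user' requests that look scripted."""
    if agent_type == "spider":
        return True
    if agent_type != "user":
        return False
    # A user agent that could not be parsed at all has 'Other' in every field.
    unparsed = all(user_agent_map.get(f) == "Other" for f in UA_FIELDS)
    return bool(
        URL_RE.search(user_agent)
        or "www." in user_agent
        or "github" in user_agent
        or EMAIL_RE.search(user_agent.lower())
        or unparsed
    )


# Invented examples, for illustration only.
browser = {"browser_family": "Firefox", "device_family": "Other", "os_family": "Linux"}
print(is_automata("user", "Mozilla/5.0 (X11; Linux) Firefox/92.0", browser))   # False
print(is_automata("user", "MyTool/1.0 (https://example.org/tool)", browser))   # True
```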
[Wikidata-bugs] [Maniphest] [Unassigned] T199016: Count structured data uploads and edits by volunteer-built tools
mpopov removed mpopov as the assignee of this task.
[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests
mpopov added a comment.

@Abit: it's still not entirely clear which query from T238878 <https://phabricator.wikimedia.org/T238878> @Milimetric should productionize in this ticket. From my conversation with Kate, it seems like your team wants to use the 7.8M number from the Lua-populated table using the query from T238878#5683048 <https://phabricator.wikimedia.org/T238878#5683048>, but there's also overwhelming support for the query in T238878#5708511 <https://phabricator.wikimedia.org/T238878#5708511>, which yields a count of 3M? I've pointed out the problems of missing data and quality in general in the Lua-populated table, so I'm not sure if that's the one you want to go with. Can you or @matthiasmullie please confirm exactly which query should be used?
[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests
mpopov added a comment.

In T239565#5706854 <https://phabricator.wikimedia.org/T239565#5706854>, @Milimetric wrote:
> Yay, I get to work with @mpopov :)

Aw, I feel likewise! :D

> - how often should this report be updated?

I think for the intended purpose a monthly granularity is fine since the check-ins have in the past been quarterly or every 6mo. Even if the query takes like 35 minutes to run on unsqooped data, would it be okay to schedule it to run daily or weekly?

> - is it exactly that query? This task mentions "queries" plural, just making sure

It's starting to look like the query in T238878#5708511 <https://phabricator.wikimedia.org/T238878#5708511> is the one that should be used?

> - given the confusion about deletion (T238878#5706835 <https://phabricator.wikimedia.org/T238878#5706835>), should we also count stuff from the archive table?

I don't think deleted files should be counted, no.

I think the end result should be, ideally, a daily-granularity data source in Turnilo/Superset having:
- total count of files on Commons
- total count of files on Commons having structured data (per query in T238878#5708511 <https://phabricator.wikimedia.org/T238878#5708511>)

This would enable @Abit & @Ramsey-WMF to track progress of SDC over time in a Superset dashboard as (1) an absolute count and (2) a relative % (via post-aggregation), esp. since Superset also has periodicity like YoY built in, which would be useful for them. Would have to be careful with the auto aggregation, though. The metrics would need to be specified as, like, longMax instead of longSum.

@Milimetric: do you have a destination in mind for the reports? I guess the MVP is just a CSV in /srv/published-datasets and we can figure out next steps later so this task's scope doesn't blow up, or do y'all have an easy pipeline/process for running reportupdater and ingesting the output into Druid?
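The longMax-versus-longSum caveat is easy to see with a toy roll-up. The sketch below is plain Python rather than Druid, and the daily totals are invented; it only illustrates why summing a running total over a coarser time grain over-counts.

```python
from collections import defaultdict

# Hypothetical daily rows: (date, total number of files so far).
# Each value is already a running total, not a daily increment.
daily_totals = [
    ("2019-12-01", 100),
    ("2019-12-02", 103),
    ("2019-12-03", 107),
    ("2020-01-01", 150),
]


def roll_up(rows, aggregate):
    """Aggregate daily values to monthly ones with the given function."""
    by_month = defaultdict(list)
    for day, total in rows:
        by_month[day[:7]].append(total)  # 'YYYY-MM' prefix as the month key
    return {month: aggregate(values) for month, values in by_month.items()}


print(roll_up(daily_totals, sum))  # {'2019-12': 310, '2020-01': 150}
print(roll_up(daily_totals, max))  # {'2019-12': 107, '2020-01': 150}
```

Summing reports 310 files for December even though there were never more than 107. Taking the maximum matches the end-of-month total as long as the count never decreases; if deletions can make it drop, the maximum is the month's peak rather than its final value.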
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T238878: Data about how many file pages on Commons contain at least one structured data element
mpopov added subscribers: Mayakp.wiki, daniel, Ladsgroup.
mpopov added a comment.

I was looking at populateEntityUsage.php <https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibase/+/814e7a53ab65e6a90f30cb9f066a04b822a76c71/client/maintenance/populateEntityUsage.php> (Maintenance script for populating wbc_entity_usage <https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage> based on the page_props <https://www.mediawiki.org/wiki/Manual:Page_props_table> table.)

So if the entity usage table is populated from the page props table, it partially explains why the statements for File:Póvoa de Varzim -i---i- (25379025808).jpg <https://commons.wikimedia.org/wiki/File:P%C3%B3voa_de_Varzim_-i---i-_(25379025808).jpg> aren't showing up. They're not in the page props table:

F31133752: Screen Shot 2019-11-22 at 4.30.41 PM.png <https://phabricator.wikimedia.org/F31133752>

  SELECT *
  FROM page_props AS pp
  LEFT JOIN page ON pp.pp_page = page.page_id
  WHERE pp_propname = 'wikibase_item'
    -- AND page_namespace = 6 -- returns 0 results
  LIMIT 100

This only shows that basically only ns:0 (mostly pages listing categories) and ns:14 have the `wikibase_item` page property.

@daniel @Ladsgroup: hi o/ I'm pinging you because you're listed as the authors on a bunch of the relevant Wikibase code (including that entity usage maintenance script). Can you please help point us at somewhere, anywhere that we can use to figure out how many files on Commons have had labels, depicts, and other statements added?
A different strategy is to use the revision comments to look for how many ns:6 pages have had revisions where the comment included `wbset`, for example:

  SELECT
    page_title, page_namespace, rev_id,
    IF(rev_comment = '', comment_text, rev_comment) AS revision_comment
  FROM revision rev
  LEFT JOIN page ON rev.rev_page = page.page_id
  LEFT JOIN revision_comment_temp rct ON rev.rev_id = rct.revcomment_rev
  LEFT JOIN `comment` ON rct.revcomment_comment_id = `comment`.comment_id
  WHERE page_namespace = 6
    AND rev_page = 68860692
    AND (comment_text RLIKE 'wbset(claim|label)' OR rev_comment RLIKE 'wbset(claim|label)')

F31133865: Screen Shot 2019-11-22 at 5.00.30 PM.png <https://phabricator.wikimedia.org/F31133865>

Which only looks at additions, not changes/removals but we can fix that. Anyways, using this method we can count how many files have had structured data added to them as of the end of October 2019 (using Analytics Engineering's MediaWiki History in Data Lake <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history>):

  WITH structured_data_additions AS (
    SELECT
      page_id,
      SUM(IF(event_comment RLIKE 'wbsetclaim', 1, 0)) > 0 AS had_claim_added,
      SUM(IF(event_comment RLIKE 'wbsetlabel', 1, 0)) > 0 AS had_label_added
    FROM mediawiki_history
    WHERE snapshot = '2019-10'
      AND wiki_db = 'commonswiki'
      AND event_entity = 'revision'
      AND page_namespace = 6
      AND event_comment RLIKE 'wbset(label|claim)'
      AND NOT revision_is_identity_reverted
    GROUP BY page_id
  )
  SELECT
    CASE
      WHEN had_claim_added AND had_label_added THEN 'statement(s) and label(s)'
      WHEN had_claim_added AND NOT had_label_added THEN 'just statement(s)'
      WHEN had_label_added AND NOT had_claim_added THEN 'just label(s)'
    END AS structured_data_added,
    COUNT(1) AS n_files
  FROM structured_data_additions
  GROUP BY
    CASE
      WHEN had_claim_added AND had_label_added THEN 'statement(s) and label(s)'
      WHEN had_claim_added AND NOT had_label_added THEN 'just statement(s)'
      WHEN had_label_added AND NOT had_claim_added THEN 'just label(s)'
    END;

@Abit @Ramsey-WMF @Mayakp.wiki: this will be of interest to you. The total number of files which have had structured data //added// to them (and not reverted) before November 2019 is… 1,401,757. This doesn't include claim/label //removals//, so just a heads up there.

| structured_data_added | n_files |
| - | - |
| just label(s) | 1 112 577 |
| just statement(s) | 163 200 |
| statement(s) and label(s) | 125 980 |

For a more up-to-date count, here's an equivalent query for the MW replica in MariaDB, but it doesn't include revert status which is provided in the mediawiki_history data:

  SELECT
    CASE
      WHEN had_claim_added AND had_label_added THEN 'statement(s) and label(s)'
      WHEN had_claim_added AND NOT had_label_added THEN 'just statement(s)'
      WHEN had_label_added AND NOT had_claim_added THEN 'just label(s)'
    END AS structured_data_additions,
    COUNT(1) AS n_files
  FROM (
    SELECT
      rev_page,
      SUM(I
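The archived message is cut off at this point. As a reading aid for the bucketing that the complete Hive query earlier in this comment performs (one flag per page for `wbsetclaim`, one for `wbsetlabel`, then three mutually exclusive categories), here is a small Python sketch. The function name `categorize` and the sample edit comments are made up for illustration, and revert filtering is left out.

```python
import re
from collections import Counter, defaultdict


def categorize(revisions):
    """Count pages per category; revisions is an iterable of (page_id, comment)."""
    flags = defaultdict(lambda: {"claim": False, "label": False})
    for page_id, comment in revisions:
        # Pages are only registered when a comment matches, which mirrors
        # the WHERE event_comment RLIKE 'wbset(label|claim)' filter.
        if re.search(r"wbsetclaim", comment):
            flags[page_id]["claim"] = True
        if re.search(r"wbsetlabel", comment):
            flags[page_id]["label"] = True

    counts = Counter()
    for page_flags in flags.values():
        if page_flags["claim"] and page_flags["label"]:
            counts["statement(s) and label(s)"] += 1
        elif page_flags["claim"]:
            counts["just statement(s)"] += 1
        else:
            counts["just label(s)"] += 1
    return counts


# Invented sample data: (page_id, edit comment).
sample = [
    (1, "/* wbsetlabel-add:1|en */ a caption"),
    (1, "/* wbsetclaim-create:2||1 */ a statement"),
    (2, "/* wbsetclaim-create:2||1 */ a statement"),
    (3, "/* wbsetlabel-add:1|en */ a caption"),
    (3, "/* wbsetlabel-set:1|de */ another caption"),
    (4, "Uploaded own work with UploadWizard"),
]
counts = categorize(sample)
print(dict(counts))
```

Page 4 never appears in the result because none of its comments match, just as non-matching revisions are filtered out before grouping in the query.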
[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element
mpopov added a comment.

Here are the missing screenshots:

In T238878#5683048 <https://phabricator.wikimedia.org/T238878#5683048>, @Nuria wrote:
> The work done by @mpopov (if you are so kind @mpopov please upload your screenshots)
> The wbc_entity_usage table is supposed to hold info on Wikidata usage for the pages. For example, here's a random file I added some structured data to a few days ago: https://commons.wikimedia.org/wiki/File:P%C3%B3voa_de_Varzim_-i---i-_(25379025808).jpg
> When you look for it in the commonswiki replica, it has a page ID of 68860692. Looking for it in the wbc_entity_usage table we only see that it has a caption in English, which I added at basically the same time as several statements:

F31133627: 1.png <https://phabricator.wikimedia.org/F31133627>

The structured data is missing, despite being added //before// the caption.

> eu_aspect column does have other values like "O" (statements) and "D" (not documented, but from a brief investigation looks like it's specifically for linking categories on Commons to Wikidata Q-items). There are some records of files with "O" aspects (as the MW page notes, it can refer to a variety of entity usages but typically it's statements) but then it gets weird because the language of the label isn't recorded and there's a bunch of seemingly unnecessary info? Take for example the MediaWiki DB data for https://commons.wikimedia.org/wiki/File:Jodrell_Bank_Mark_II_5.jpg

F31133631: 2.png <https://phabricator.wikimedia.org/F31133631>
F31133634: unnamed.png <https://phabricator.wikimedia.org/F31133634>

> Woof! That's…not great. So, uh, clearly there's something funky going on with the Wikibase client extension? Or maybe that's data that was recorded by an earlier version of the extension before it knew to append language codes to labels? I don't know enough about the nitty-gritty there, so these are just vaguely educated guesses.
[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons
mpopov added a comment.

@Abit @Ramsey-WMF in addition to T213597#4900741, here's the history of that metric with a 7-day rolling average to smooth the daily data a bit:

F28004771: 2019-01_checkin.png
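For reference, a 7-day rolling average can be sketched in a few lines of Python. This assumes a trailing window (each point averages the current day and the six before it); the comment does not say whether the plotted average was trailing or centered, and the input values below are invented.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; positions without a full window are None."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


daily = [1, 2, 3, 4, 5, 6, 7, 8]  # invented daily values
print(rolling_mean(daily))
# [None, None, None, None, None, None, 4.0, 5.0]
```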
[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons
mpopov added a comment.

In T213597#4900903, @Neil_P._Quinn_WMF wrote:
> True, but its revisions do have revision_is_deleted set, so you've already filtered them out of your query.

Huh! Yeah, you're right! Haha, okay so I think what happened was I had checked the summarized_revisions table before I had the revision_is_deleted in the WHERE clause and then added both NOT revision_is_deleted AND page IS NOT NULL after seeing that example. Sorry for the confusion! You were right this whole time :)
[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons
mpopov added a comment.

In T213597#4893765, @Neil_P._Quinn_WMF wrote:
> I noticed one big thing: it seems like your counts of file page edits (n_edits_total, n_additions_total, etc.) include the initial edit that creates the pages, so in the end you're getting the proportion of files which have metadata added in the first 2 months, including during the initial upload. I tried excluding those initial creations (event_timestamp != page_creation_timestamp), and it looks like the proportion goes from 99% to 50%.

Thank you so much, @Neil_P._Quinn_WMF! Really appreciate you catching that and correcting. I had incorrectly assumed that initial metadata would not be included. I'm currently looking into your suggested method of filtering revisions and comparing it to using revision_parent_id > 0, which should theoretically yield the same result but does not in practice. Correct numbers coming soon.

> I don't understand the point of this, since the NOT revision_is_deleted should have already removed deleted files. (Also the page_id isn't necessarily null for deleted pages; after all the MediaWiki archive table has ar_page_id.)

https://commons.wikimedia.org/wiki/File:Box-Front.jpg is a deleted file with a null page_id and it gets included in summarized_revisions otherwise.
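The effect of the page-creating revision on the "augmented within 60 days" flag can be sketched in Python. This is an illustration only: `augmented_within` is a hypothetical helper, the timestamps and byte counts are invented, and the day difference is computed on calendar dates to imitate what DATEDIFF does in the Hive query.

```python
from datetime import datetime, timedelta


def augmented_within(page_created, revisions, days=60, exclude_creation=True):
    """True if any byte-adding revision falls within `days` of page creation.

    revisions: list of (event_timestamp, bytes_diff) tuples.
    """
    for event_ts, bytes_diff in revisions:
        if exclude_creation and event_ts == page_created:
            continue  # skip the upload itself, as suggested in the review
        day_gap = (event_ts.date() - page_created.date()).days
        if bytes_diff > 0 and day_gap <= days:
            return True
    return False


created = datetime(2018, 1, 1, 12, 0, 0)
upload_only = [(created, 500)]
print(augmented_within(created, upload_only, exclude_creation=False))  # True
print(augmented_within(created, upload_only, exclude_creation=True))   # False

later_edit = upload_only + [(created + timedelta(days=10), 20)]
print(augmented_within(created, later_edit))  # True
```

With the creation revision included, every uploaded file trivially counts as augmented, which is consistent with the near-100% proportions reported before the correction.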
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T213597: [REQUEST] Baselines for structured data on Commons
mpopov added subscribers: chelsyx, Neil_P._Quinn_WMF.
mpopov added a comment.

Okay, here are the numbers which were calculated with the following conditions:
- Using the December 2018 snapshot of MediaWiki History in the Data Lake
- Only files which have not been deleted are counted
- Only revisions to the metadata which were not reverted AND which were not reverts AND which were not deleted
- "Metadata augmented w/in 1st 2mo" means there was at least 1 byte-adding revision to the file's page within the first 60 days after creation

Assuming my query is correct (pending review), then it looks like the baseline for % of files which have metadata added within the first 2 months is 99.993914% overall.

Yearly stats

| Year | Files uploaded that year | Metadata augmented w/in 1st 2mo (60d) | Proportion |
| - | - | - | - |
| 2004 | 17,478 | 17,423 | 99.685319% |
| 2005 | 263,218 | 263,053 | 99.937314% |
| 2006 | 644,238 | 644,087 | 99.976561% |
| 2007 | 1,202,209 | 1,202,019 | 99.984196% |
| 2008 | 1,402,061 | 1,401,908 | 99.989087% |
| 2009 | 1,926,019 | 1,925,786 | 99.987903% |
| 2010 | 2,331,837 | 2,331,581 | 99.989022% |
| 2011 | 3,881,441 | 3,881,089 | 99.990931% |
| 2012 | 3,489,435 | 3,489,253 | 99.994784% |
| 2013 | 4,592,177 | 4,592,018 | 99.996538% |
| 2014 | 4,720,657 | 4,720,534 | 99.997394% |
| 2015 | 5,684,463 | 5,684,360 | 99.998188% |
| 2016 | 6,317,906 | 6,317,729 | 99.997198% |
| 2017 | 8,184,732 | 8,184,286 | 99.994551% |
| 2018 | 7,983,451 | 7,982,992 | 99.994251% |

Monthly stats for 2018

| Month | Files uploaded that month | Metadata augmented w/in 1st 2mo (60d) | Proportion |
| - | - | - | - |
| January 2018 | 653,574 | 653,516 | 99.991126% |
| February 2018 | 705,934 | 705,869 | 99.990792% |
| March 2018 | 784,535 | 784,461 | 99.990568% |
| April 2018 | 609,663 | 609,627 | 99.994095% |
| May 2018 | 714,618 | 714,523 | 99.986706% |
| June 2018 | 588,995 | 588,878 | 99.980136% |
| July 2018 | 651,006 | 651,003 | 99.999539% |
| August 2018 | 784,168 | 784,166 | 99.999745% |
| September 2018 | 818,778 | 818,775 | 99.999634% |
| October 2018 | 564,108 | 564,102 | 99.998936% |
| November 2018 | 574,174 | 574,174 | 100.00% |
| December 2018 | 533,898 | 533,898 | 100.00% |

Appendix

Here's the query I used, which I would like someone in #product-analytics (e.g. @chelsyx and @Neil_P._Quinn_WMF) to review:

  WITH summarized_revisions AS (
    SELECT
      page_id,
      TO_DATE(page_creation_timestamp) AS creation_date,
      COUNT(1) AS n_edits_total, -- not including reverts or reverted
      SUM(IF(revision_text_bytes_diff > 0, 1, 0)) AS n_additions_total,
      SUM(IF(DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_edits_2mo,
      SUM(IF(revision_text_bytes_diff > 0 AND DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_additions_2mo
    FROM wmf.mediawiki_history
    WHERE snapshot = '2018-12'
      AND wiki_db = 'commonswiki'
      AND event_entity = 'revision'
      AND page_namespace = 6
      AND NOT revision_is_identity_revert -- don't count edits that are reverts
      AND NOT revision_is_identity_reverted -- don't count edits that were reverted
      AND NOT revision_is_deleted -- don't count edits moved to archive table
      AND page_id IS NOT NULL -- don't count deleted files
    GROUP BY page_id, TO_DATE(page_creation_timestamp)
  )
  SELECT
    creation_date,
    COUNT(1) AS n_total,
    SUM(IF(n_edits_total > 0, 1, 0)) AS n_edited,
    SUM(IF(n_additions_total > 0, 1, 0)) AS n_added_to,
    SUM(IF(n_edits_2mo > 0, 1, 0)) AS n_edited_2mo,
    SUM(IF(n_additions_2mo > 0, 1, 0)) AS n_added_to_2mo
  FROM summarized_revisions
  GROUP BY creation_date;
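The Proportion column in the tables above is just the augmented count divided by the uploaded count. A minimal Python check, using two rows copied from this comment (the helper name `proportion_pct` is made up):

```python
def proportion_pct(augmented: int, uploaded: int) -> float:
    """Share of uploaded files with metadata added, as a percentage."""
    return 100 * augmented / uploaded


# 2004 row: 17,423 of 17,478 files.
print(f"{proportion_pct(17_423, 17_478):.6f}%")
# November 2018 row: every uploaded file was counted as augmented.
print(f"{proportion_pct(574_174, 574_174):.2f}%")
```

Proportions this close to 100% in every period are a hint that the numerator may be counting something every file has, such as the revision that creates the page.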
[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons
mpopov added a comment.

Thanks for clarifying! Okay, one more question for @Abit & @Ramsey-WMF just so everyone is on the same page. The statistic you want is: the % of all uploaded files which have had additions to their pages in the first 2 months after upload. No breakdown by file type or over time, just a count X and a total Y and the proportion X/Y, correct?

TASK DETAIL
https://phabricator.wikimedia.org/T213597
[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons
mpopov added a comment.

@Ramsey-WMF: hi, I would like to clarify what "metadata" includes. Here's my initial list:

- every field in the Information template
- Licensing
- Categories

Or are you referring to the entire page as the metadata? i.e. the whole shebang:

F27911262: Screen Shot 2019-01-16 at 10.12.32 AM.png

And then any revisions that add bytes (including the newly released captions):

F27911283: Screen Shot 2019-01-16 at 10.15.35 AM.png

would make the file count towards the statistic? In that case, if a revision removes metadata and then another revision undoes it, does THAT count?

Furthermore, for clarification, are you specifically interested in:

- when a file's metadata is augmented, which is to say when additional metadata is added to a file after it's uploaded and some metadata is there from the outset

OR

- in addition to metadata getting added in the first 2 months after upload, also when the initial upload includes metadata beyond the essential fields (description, date) that are required for the upload

Like, if someone is very thorough in their initial upload, does that file get included in the count? Or is it specifically revisions after the initial upload?

Also, I assume it does not matter who (or what) adds the metadata in the 2 months after the upload. Whether it's a bot adding a category or another person adding some other metadata, all that matters is that metadata is added. And specifically added, not removed, right?

TASK DETAIL
https://phabricator.wikimedia.org/T213597
[Wikidata-bugs] [Maniphest] [Closed] T204415: Query stats dashboard not updating
mpopov closed this task as "Resolved".
mpopov added a comment.

All good now :)

TASK DETAIL
https://phabricator.wikimedia.org/T204415
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T204415: Query stats dashboard not updating
mpopov removed subscribers: mforns, Ottomata, elukey, Nuria.
mpopov added a comment.

Alright, I wiped all the request counts starting with August 10th (after making a backup) so Golden/Reportupdater is going to start a re-count using the webrequests in the 'text' partition. WDQS stats re-count should be done by Monday. Thanks for your patience, folks!

TASK DETAIL
https://phabricator.wikimedia.org/T204415
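The "back up, then wipe from a cutoff date" step described above could look roughly like the following Python sketch. It is not the procedure that was actually run: the `wipe_from` helper, the `.bak` suffix, and the assumption that the TSV has an ISO-formatted `date` column are all hypothetical.

```python
import csv
import shutil
from datetime import date
from pathlib import Path


def wipe_from(tsv_path, cutoff, date_column="date"):
    """Back up tsv_path, then rewrite it keeping only rows dated before cutoff.

    Returns the number of rows removed. The column name is an assumption.
    """
    path = Path(tsv_path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep a copy before touching anything

    with backup.open(newline="") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Rows on or after the cutoff are dropped so they can be recounted later.
    kept = [r for r in rows if date.fromisoformat(r[date_column]) < cutoff]

    with path.open("w", newline="") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(kept)
    return len(rows) - len(kept)
```

A call such as `wipe_from("basic_usage.tsv", date(2018, 8, 10))` (the year here is an assumption) would leave earlier rows in place, so a backfilling job only has to regenerate the missing dates.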
[Wikidata-bugs] [Maniphest] [Unblock] T204415: Query stats dashboard not updating
mpopov closed subtask T205441: 'group' parameter in Reportupdater for automatic chgrp of generated reports as "Resolved".

TASK DETAIL
https://phabricator.wikimedia.org/T204415
[Wikidata-bugs] [Maniphest] [Updated] T204415: Query stats dashboard not updating
mpopov added a subscriber: mforns.
mpopov added a comment.

In T204415#4612751, @Ottomata wrote:
> Ok, I've added the analytics-search system user to the analytics-search-users group. You should make your script chgrp analytics-search-users after it creates it.

Thank you very much, Andrew! That's gonna need to be done with T205441, which I've started on.

- Step 1 is T205441 itself, which I'll need @mforns's help with CR and enabling the parameter to be specified in the defaults section of the YAML config.
- Step 2 is Chelsy/me updating the configs to specify the analytics-search-users group and updating the Reportupdater submodule in golden to the patched version.
- Step 3 is letting Reportupdater run once so it changes the file permissions.
- Step 4 is clearing out dates in the WDQS report which will need to be recounted.
- Step 5 is Reportupdater backfilling the missing dates using the patched query.

@Addshore hopefully step 5 will be done by end of the week! :)

TASK DETAIL
https://phabricator.wikimedia.org/T204415
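The "chgrp after the script creates the file" idea from the quoted comment can be sketched in Python as below. This is not Reportupdater's actual code or its 'group' parameter; `write_group_writable` is a hypothetical helper, and on the stat hosts the group passed in would presumably be analytics-search-users.

```python
import grp
import os


def write_group_writable(path, text, group=None):
    """Write text to path, optionally hand it to `group`, and make it rw for user and group.

    `group` may be a group name or a numeric gid (hypothetical convenience).
    """
    with open(path, "w") as fh:
        fh.write(text)
    if group is not None:
        # Look up the gid by name; raises KeyError if the group does not exist.
        gid = group if isinstance(group, int) else grp.getgrnam(group).gr_gid
        os.chown(path, -1, gid)  # -1 leaves the owning user unchanged
    os.chmod(path, 0o664)  # matches what umask 002 gives a newly created file
    return os.stat(path)
```

Changing the group only succeeds if the calling user is a member of the target group (or is root), which is why the system user had to be added to the group first.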
[Wikidata-bugs] [Maniphest] [Commented On] T204415: Query stats dashboard not updating
mpopov added a comment.

@Ottomata @Gehel: I tried editing stat1005:/srv/published-datasets/discovery/metrics/wdqs/basic_usage.tsv but couldn't because the file belongs to group analytics-search, not analytics-search-users, which sort of makes sense because of how we have it configured right now in statistics::discovery:

    $user = 'analytics-search'
    $group = 'analytics-privatedata-users'
    ...
    cron { 'wikimedia-discovery-golden':
        ensure  => present,
        command => "cd ${dir}/golden && sh main.sh >> ${log_dir}/golden-daily.log 2>&1",
        hour    => '5',
        minute  => '0',
        require => [
            Class['::statistics::compute'],
            Git::Clone['wikimedia/discovery/golden'],
            Mysql::Config::Client['discovery-stats']
        ],
        user    => $user,
    }

and main.sh in the wikimedia/discovery/golden repo that generates these datasets:

    # files created / touched by report updater need to be rw for user and group
    umask 002

From Puppet 3.8 documentation for cron, it's not clear whether we can…somehow set a group? (Would that even make sense?)

I need to edit that file to erase all request counts affected by the 'misc' partition drop so that we can recount them from the 'text' partition.

TASK DETAIL
https://phabricator.wikimedia.org/T204415
[Wikidata-bugs] [Maniphest] [Commented On] T204415: Query stats dashboard not updating
mpopov added a comment.

In T204415#4611729, @Nuria wrote:
> Assigned to @mpopov
> Again, our apologies that the data sources are hardcoded like this. As I mentioned in our meeting, a better path to go forward would be using the tags for wdqs to identify the requests: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/WDQSTagger.java

BTW the query has to filter by path anyway because it also counts WDQS homepage visits, so we're not switching to tags in this case.

F26189240: Screen Shot 2018-09-24 at 3.58.12 PM.png

TASK DETAIL
https://phabricator.wikimedia.org/T204415
[Wikidata-bugs] [Maniphest] [Updated] T204415: Query stats dashboard not updating
mpopov added a subscriber: Gehel.
mpopov added a comment.

Thanks for looking into it, @Nuria! And for confirming, @elukey @Ottomata! :)

A note for #operations: this is not the first time we've encountered an issue like this. Last year our query for Maps usage stopped working because of partition changes that we weren't told of (T167083), and this is exactly like that.

Nobody on #product-analytics is subscribed to ops@lists.wikimedia (because 99.999% of those threads would be irrelevant to us), so I just want to point out that the decisions made by Ops that affect data sources like the wmf.webrequest table need to be communicated to analysts who rely on those data sources. I don't think it's reasonable to expect, say, @Gehel to notice those emails in his mailbox and notify us, so I suggest that when authoring emails announcing big, data source-related changes like partition drops and renames, please cc product-analyt...@wikimedia.org since we have scripts and queries that operate on those data sources under certain hard-coded assumptions.

TASK DETAIL
https://phabricator.wikimedia.org/T204415
[Wikidata-bugs] [Maniphest] [Closed] T177358: Metrics for SDoC: translations
mpopov closed this task as "Resolved".

TASK DETAIL
https://phabricator.wikimedia.org/T177358
[Wikidata-bugs] [Maniphest] [Unblock] T174519: [epic] SDoC: Determine baseline for metrics
mpopov closed subtask T177358: Metrics for SDoC: translations as "Resolved".
Herald added a project: Product-Analytics.

TASK DETAIL
https://phabricator.wikimedia.org/T174519
[Wikidata-bugs] [Maniphest] [Changed Project Column] T177358: Metrics for SDoC: translations
mpopov moved this task from In progress to Needs review on the Discovery-Analysis (Current work) board.
mpopov added a comment.

Search query language breakdown note & results at https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177358-2

TASK DETAIL
https://phabricator.wikimedia.org/T177358

WORKBOARD
https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Edited] T177358: Metrics for SDoC: translations
mpopov updated the task description. (Show Details)

CHANGES TO TASK DESCRIPTION
...
** [x] How many search queries happen in what languages?
...

TASK DETAIL
https://phabricator.wikimedia.org/T177358
[Wikidata-bugs] [Maniphest] [Claimed] T177358: Metrics for SDoC: translations
mpopov claimed this task.
mpopov set the point value for this task to "8".

TASK DETAIL
https://phabricator.wikimedia.org/T177358
[Wikidata-bugs] [Maniphest] [Changed Project Column] T177357: Metrics for SDoC: future work of interest (templates and licensing)
mpopov moved this task from Current work to Up Next on the Discovery-Analysis board.
mpopov edited projects, added Discovery-Analysis; removed Discovery-Analysis (Current work).

TASK DETAIL
https://phabricator.wikimedia.org/T177357

WORKBOARD
https://phabricator.wikimedia.org/project/board/1850/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T177357: Metrics for SDoC: future work of interest (templates and licensing)
mpopov moved this task from Needs triage to Current work on the Discovery-Analysis board.
mpopov edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.

TASK DETAIL
https://phabricator.wikimedia.org/T177357

WORKBOARD
https://phabricator.wikimedia.org/project/board/1850/
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
mpopov added a comment.

@chelsyx do you wanna add your stuff to https://github.com/wikimedia-research/SDoC-Initial-Metrics ?

TASK DETAIL
https://phabricator.wikimedia.org/T177354
[Wikidata-bugs] [Maniphest] [Changed Project Column] T177356: Metrics for SDoC: look at querying databases
mpopov moved this task from In progress to Done on the Discovery-Analysis (Current work) board.
mpopov added a comment.

Queries & data uploaded to https://github.com/wikimedia-research/SDoC-Initial-Metrics

Moving this into 'Done' as I don't think there's anything left to do on this one.

TASK DETAIL
https://phabricator.wikimedia.org/T177356

WORKBOARD
https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Edited] T177356: Metrics for SDoC: look at querying databases
mpopov updated the task description. (Show Details)

CHANGES TO TASK DESCRIPTION
...
** [x] How many people are involved in flagging for deletion/deleting files

TASK DETAIL
https://phabricator.wikimedia.org/T177356
[Wikidata-bugs] [Maniphest] [Commented On] T177356: Metrics for SDoC: look at querying databases
mpopov added a comment.

Growth of number of deleters over time:

F10188497: cumulative_deleters.png

How many users deleted N-many files:

F10188503: deleter_activity.png

TASK DETAIL
https://phabricator.wikimedia.org/T177356
[Wikidata-bugs] [Maniphest] [Commented On] T177356: Metrics for SDoC: look at querying databases
mpopov added a comment.

Total files uploaded to Commons (as of right now) by extension:

| media | extension | uploads |
| audio | ogg | 773305 |
| audio | oga | 6180 |
| audio | flac | 6140 |
| audio | mid | 4993 |
| audio | wav | 3512 |
| audio | opus | 410 |
| docs | pdf | 354765 |
| docs | djvu | 60524 |
| image | jpg/jpeg | 36918799 |
| image | png | 2268026 |
| image | svg | 1176530 |
| image | tif/tiff | 807921 |
| image | gif | 153959 |
| image | xcf | 1008 |
| image | webp | 95 |
| video | ogv | 66610 |
| video | webm | 41161 |

Historical trends:

F10187336: monthly_uploads.png
F10187339: cumulative_uploads.png

Treemap (sans jpg/jpegs because holy moley there's 37M of those and that's more than all the others combined):

F10187334: treemap_uploads.png

TASK DETAIL
https://phabricator.wikimedia.org/T177356
[Wikidata-bugs] [Maniphest] [Edited] T177356: Metrics for SDoC: look at querying databases
mpopov updated the task description. (Show Details)

CHANGES TO TASK DESCRIPTION
...
* [x] How many: mpegs, pngs, ogg, etc
...
** [x] Track organic growth rate of uploads (historical trends)
...

TASK DETAIL
https://phabricator.wikimedia.org/T177356
[Wikidata-bugs] [Maniphest] [Edited] T177356: Metrics for SDoC: look at querying databases
mpopov updated the task description. (Show Details)

CHANGES TO TASK DESCRIPTION
...
** [x] Average time to deletion?
* [] How many people are involved in flagging for deletion/deleting files

TASK DETAIL
https://phabricator.wikimedia.org/T177356
[Wikidata-bugs] [Maniphest] [Commented On] T177356: Metrics for SDoC: look at querying databases
mpopov added a comment.

Time-to-deletion:

F10150716: time-to-deletion.png

- Most copyright-related deletions happen within 1 day of upload across almost all media types, with the exception of 'drawing' (SVGs)
- A lot of audio files are deleted within 1 minute or 1 week of upload
- Half of all images and PDFs deleted were deleted within 1 month of upload for non-copyright reasons

TASK DETAIL
https://phabricator.wikimedia.org/T177356
[Wikidata-bugs] [Maniphest] [Edited] T177356: Metrics for SDoC: look at querying databases
mpopov updated the task description. (Show Details)

CHANGES TO TASK DESCRIPTION
...
*** copyright violations (Use case: creation of auto-copyright violation tools) Use case: creation of auto-copyright violation tools
*** [[ https://commons.wikimedia.org/wiki/Commons:OTRS | OTRS ]]
** [] ores
** [] Average time to deletion?
...

TASK DETAIL
https://phabricator.wikimedia.org/T177356
[Wikidata-bugs] [Maniphest] [Commented On] T177356: Metrics for SDoC: look at querying databases
mpopov added a comment.

Reasons for files deleted in 2017:

F10148687: deletion_reasons.png

TASK DETAIL
https://phabricator.wikimedia.org/T177356
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
mpopov added a comment.

In T177354#3676545, @chelsyx wrote:
> Unfortunately, the mediawiki snapshot doesn't have the image table which describes images and other uploaded files.

Ah, yeah. I missed the reference to image in your query. But it looks like we can use img_timestamp, although those queries will take some time.

Also something to note is that img_major_mime shows up as "application" for .ogg files (which are audio files) and .pdf files:

    SELECT DISTINCT img_major_mime, img_minor_mime FROM commonswiki.image;

| img_major_mime | img_minor_mime |
| image | gif |
| image | jpeg |
| image | png |
| image | tiff |
| image | vnd.djvu |
| image | webp |
| image | x-xcf |
| image | svg+xml |
| application | ogg |
| audio | midi |
| audio | wav |
| audio | webm |
| audio | x-flac |
| video | webm |
| application | pdf |

I recommend adding a CASE that returns "audio" for ogg files and "document" (for example) for PDFs.

TASK DETAIL
https://phabricator.wikimedia.org/T177354
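The recoding suggested above is shown here as a small Python function rather than as a SQL CASE, purely to illustrate the logic. `media_category` is a hypothetical name; only the two "application" overrides come from the message, and every other pair falls through to its major MIME type.

```python
def media_category(major_mime, minor_mime):
    """Map an (img_major_mime, img_minor_mime) pair to a coarse media category."""
    if major_mime == "application" and minor_mime == "ogg":
        return "audio"  # .ogg files are stored as application/ogg
    if major_mime == "application" and minor_mime == "pdf":
        return "document"  # label suggested in the message ("for example")
    return major_mime  # image, audio, video are already usable as-is


if __name__ == "__main__":
    for pair in [("application", "ogg"), ("application", "pdf"), ("image", "svg+xml")]:
        print(pair, "->", media_category(*pair))
```

The same branching would translate directly into a `CASE WHEN ... THEN ... ELSE img_major_mime END` expression in the query.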
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
mpopov added a comment.

In T177354#3675988, @debt wrote:
> Hey @chelsyx - what time frame does this cover?

Jumping in to say this looks like it's from launch of Commons to now.

Can we also get a count of how this has changed over the last week and compare that to the last 30 days? It'd be interesting to see if the numbers are fairly consistent (individual vs institution) or if they have changed quite a bit when extending the time scope.

@chelsyx this may be useful: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits as it contains monthly snapshots of the page & user tables as of April 2017

TASK DETAIL
https://phabricator.wikimedia.org/T177354
[Wikidata-bugs] [Maniphest] [Claimed] T177356: Metrics for SDoC: look at querying databases
mpopov moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.
mpopov set the point value for this task to "6".
mpopov claimed this task.

TASK DETAIL
https://phabricator.wikimedia.org/T177356

WORKBOARD
https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T177356: Metrics for SDoC: look at querying databases
mpopov moved this task from Needs triage to Current work on the Discovery-Analysis board.
mpopov edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.

TASK DETAIL
https://phabricator.wikimedia.org/T177356

WORKBOARD
https://phabricator.wikimedia.org/project/board/1850/
[Wikidata-bugs] [Maniphest] [Commented On] T149963: Analyze WDQS traffic data to find parallel connection patterns
mpopov added a comment.

> How many IPs use parallel connections to the WDQS servers? Out of the IPs that do the above, how many have the same/different user agents (hinting at one tool or proxy serving multiple clients)?

Of 14K unique IPs observed between Nov 1st and 28th, 1.9K (13.6%) had made more than 1 request (to SPARQL endpoint) at any given second. Of those, 1360 (71.1%) only had 1 UA; 553 (28.9%) had 2 or more UAs; with 2 IP addresses observed to have 30-33 UAs.

> How many parallel connections are typically used, how frequent is to use more than 3, what is the max, etc.?

726 IPs (5.17%) were seen making 3 or more requests per second. Of those, 458 (63.1%) only had 1 UA; 268 (36.9%) had 2 or more UAs. 537 IPs (3.82%) were seen making more than 3 requests per second. Of those, 331 (61.64%) only had 1 UA; the rest had 2 or more UAs.

> In general, how many user agents per IP we have - do we have some IPs that have a lot of different agents (indicating a proxy), how much and how traffic from those IPs looks like - e.g. how many parallel requests, how often there's more than one, more than three?

- A particular Digital Ocean IP was especially active, using the axios promise based HTTP client
  - 300+ requests made per second 7 different times
  - 200-300 requests made per second 306 different times
  - 100-300 requests made per second 735 different times
- 100-200 requests made per second by 2 Universidad Politecnica de Madrid IPs 2,200 different times
  - Some were made using a browser on a computer (according to the UA)
  - Some were made using Requests library for Python

@Smalyshev: Let me know if you have any additional questions and/or if I missed anything. Hope this helps!

TASK DETAIL https://phabricator.wikimedia.org/T149963
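Counts like the ones above could be derived along the following lines. This is a minimal Python sketch, not the analysis code used for this task. It assumes each request has already been reduced to a hypothetical (ip, user_agent, second) tuple, treats several requests from one IP within the same second as a stand-in for parallel connections, and counts user agents over all of an IP's requests. The function name and the sample data are invented for illustration.

```python
from collections import Counter, defaultdict


def parallel_request_stats(requests, threshold=2):
    """Summarise IPs that made at least `threshold` requests within one second.

    `requests` is an iterable of (ip, user_agent, unix_second) tuples.
    This tuple shape is an assumption, not the webrequest schema.
    """
    per_second = Counter()      # (ip, second) -> number of requests
    agents = defaultdict(set)   # ip -> distinct user agents seen
    all_ips = set()
    for ip, user_agent, second in requests:
        per_second[(ip, second)] += 1
        agents[ip].add(user_agent)
        all_ips.add(ip)

    # IPs that reached the threshold in at least one second
    burst_ips = {ip for (ip, _), n in per_second.items() if n >= threshold}
    single_ua = sum(1 for ip in burst_ips if len(agents[ip]) == 1)
    return {
        "unique_ips": len(all_ips),
        "burst_ips": len(burst_ips),
        "burst_share": len(burst_ips) / len(all_ips) if all_ips else 0.0,
        "single_ua": single_ua,
        "multi_ua": len(burst_ips) - single_ua,
    }


# Made-up sample: "a" and "c" each send several requests in one second.
sample = [
    ("a", "ua1", 100), ("a", "ua1", 100), ("a", "ua2", 101),
    ("b", "ua1", 100),
    ("c", "ua3", 200), ("c", "ua3", 200), ("c", "ua3", 200),
]
stats = parallel_request_stats(sample, threshold=2)
print(stats)
```

Raising `threshold` to 3 or 4 would correspond to the "3 or more" and "more than 3" requests-per-second cuts reported above.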
[Wikidata-bugs] [Maniphest] [Commented On] T149963: Analyze WDQS traffic data to find parallel connection patterns
mpopov added a comment.

@Smalyshev: still in the process of figuring out the parallel connection aspect but here are some minute-by-minute-over-24-hours graphs/stats you might be interested in that I made in the process of playing with the data

F4911654: sparql_median_2.png
F4911656: sparql_median.png
F4911660: sparql_users.png
F4911663: cumulative_sparql.png

TASK DETAIL https://phabricator.wikimedia.org/T149963
[Wikidata-bugs] [Maniphest] [Changed Project Column] T149963: Analyze WDQS traffic data to find parallel connection patterns
mpopov moved this task from Up Next to Current work on the Discovery-Analysis board.
mpopov edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.
TASK DETAIL https://phabricator.wikimedia.org/T149963
WORKBOARD https://phabricator.wikimedia.org/project/board/1850/
[Wikidata-bugs] [Maniphest] [Claimed] T149963: Analyze WDQS traffic data to find parallel connection patterns
mpopov claimed this task.
TASK DETAIL https://phabricator.wikimedia.org/T149963
[Wikidata-bugs] [Maniphest] [Commented On] T143762: WDQS: Geographic breakdown of SPARQL queries
mpopov added a comment.

Great job! Let's put it up on Commons! :) Use the following licensing & categorization:

=={{int:license-header}}==
{{WMF-staff-upload|license=cc-by-sa-4.0}}
{{Wikimedia trademark}}

[[Category:Wikimedia Discovery]]
[[Category:Wiki Research]]

TASK DETAIL https://phabricator.wikimedia.org/T143762
[Wikidata-bugs] [Maniphest] [Commented On] T143762: WDQS: Geographic breakdown of SPARQL queries
mpopov added a comment.

Reviewed copy with minor corrections & suggestions sent back to Chelsy.

TASK DETAIL https://phabricator.wikimedia.org/T143762
[Wikidata-bugs] [Maniphest] [Commented On] T143762: WDQS: Geographic breakdown of SPARQL queries
mpopov added a comment.

Reviewed; marked-up copy of the 1st draft sent back to Chelsy. Looking forward to 2nd draft :P

TASK DETAIL https://phabricator.wikimedia.org/T143762
[Wikidata-bugs] [Maniphest] [Commented On] T143762: WDQS: Geographic breakdown of SPARQL queries
mpopov added a comment.

First draft looks good! I will try to review this as soon as I can :)

TASK DETAIL https://phabricator.wikimedia.org/T143762
[Wikidata-bugs] [Maniphest] [Edited] T143762: WDQS: Geographic breakdown of SPARQL queries
mpopov edited the task description. (Show Details)

EDIT DETAILS
...
* These articles on [[ https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive | Hive ]] and [[ https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries | Hive queries ]] are good resources. That second one uses [[ https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline | Beeline ]] interface which we've tried to migrate to once but it didn't work out, so [[ https://github.com/wikimedia/wikimedia-discovery-wmf/blob/master/R/hive.RR | wmf::query_hive() ]] still uses Hive. And [[ https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF | here's a good reference ]] of functions and operations built into HiveQL

TASK DETAIL https://phabricator.wikimedia.org/T143762
[Wikidata-bugs] [Maniphest] [Created] T143762: WDQS: Geographic breakdown of SPARQL queries
mpopov created this task.
mpopov added projects: Discovery-Analysis (Current work), Epic, Wikidata-Query-Service.
Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION

Background

In T112605, we performed a broad analysis of Wikidata Query Service users and queries. This was almost a year ago, and we're coming up on the first anniversary of WDQS' public launch (announced on Monday, 7 September 2015). The WDQS dashboard only tracks basic metrics like SPARQL usage, so we don't currently have an up-to-date picture of who WDQS users are and where they're from. But it would be nice to know how that picture looks these days! :)

Objective

In this task, you will perform an original analysis of web requests, focusing specifically on successful (HTTP status codes 200 & 304) web requests to the SPARQL endpoint (see golden/wdqs/basic_usage.R and lines 45-52 from that old report's analysis codebase for references). Your analysis should focus on the geographic and agent type breakdown of those queries. Which countries have users who use WDQS? What are the top countries by SPARQL queries? How does that breakdown look when you compare known automata vs not known automata? Are the patterns consistent day-to-day over the course of a week? Produce a 1-2 page report of your findings. Once the report has been reviewed & OK'd by me, @debt, and @Smalyshev, please upload the PDF to Commons.

Tips & Links

- You shouldn't need to import/use any refinery UDFs for this analysis; you'll do this in the next task :P
- Study the refined webrequest schema
- These articles on Hive and Hive queries are good resources. That second one uses Beeline interface which we've tried to migrate to once but it didn't work out, so wmf::query_hive() still uses Hive.
- Remember not to include any PII like IP addresses in your report and do not upload the data if you end up making a GitHub repo like this one
- After uploading the report to Commons, you'll need to copy over some of the licensing info from this report to yours
- As always, don't hesitate to ask questions or to ask for help/clarification :D

TASK DETAIL https://phabricator.wikimedia.org/T143762
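The breakdown asked for in the objective can be illustrated with a small Python sketch. It is only an illustration of the aggregation, not the Hive/R workflow the task describes: the row keys (http_status, country, agent_type), the agent type labels, and the sample rows are hypothetical stand-ins and would need to be mapped onto the actual refined webrequest schema.

```python
from collections import Counter

SUCCESS_CODES = {"200", "304"}  # "successful" as defined in the task description


def country_breakdown(rows):
    """Count successful requests per (agent_type, country).

    `rows` is an iterable of dicts with hypothetical keys
    "http_status", "country" and "agent_type".
    """
    counts = Counter()
    for row in rows:
        if row["http_status"] in SUCCESS_CODES:
            counts[(row["agent_type"], row["country"])] += 1
    return counts


def top_countries(counts, agent_type, n=3):
    """Return the n countries with the most requests for one agent type."""
    per_country = Counter(
        {country: k for (agent, country), k in counts.items() if agent == agent_type}
    )
    return per_country.most_common(n)


# Made-up rows; no real traffic data and no PII such as IP addresses.
rows = [
    {"http_status": "200", "country": "Germany", "agent_type": "user"},
    {"http_status": "200", "country": "Germany", "agent_type": "user"},
    {"http_status": "304", "country": "France", "agent_type": "user"},
    {"http_status": "500", "country": "France", "agent_type": "user"},
    {"http_status": "200", "country": "United States", "agent_type": "spider"},
]
counts = country_breakdown(rows)
print(top_countries(counts, "user"))
print(top_countries(counts, "spider"))
```

Keeping the agent type in the key makes the "known automata vs not known automata" comparison a matter of calling `top_countries` once per label; adding a date to the key would support the day-to-day comparison in the same way.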
[Wikidata-bugs] [Maniphest] [Claimed] T141135: "median" not working on WDQS dashboards
mpopov claimed this task.
TASK DETAIL https://phabricator.wikimedia.org/T141135
[Wikidata-bugs] [Maniphest] [Updated] T141135: "median" not working on WDQS dashboards
mpopov edited projects, added Discovery-Analysis-Sprint; removed Discovery-Analysis-Backlog.
TASK DETAIL https://phabricator.wikimedia.org/T141135
[Wikidata-bugs] [Maniphest] [Commented On] T141135: "median" not working on WDQS dashboards
mpopov added a comment.

Done: http://discovery.wmflabs.org/wdqs/

TASK DETAIL https://phabricator.wikimedia.org/T141135
[Wikidata-bugs] [Maniphest] [Commented On] T141135: "median" not working on WDQS dashboards
mpopov added a comment.

Forgot to tag this in https://gerrit.wikimedia.org/r/#/c/303582/

TASK DETAIL https://phabricator.wikimedia.org/T141135
[Wikidata-bugs] [Maniphest] [Commented On] T111790: Improve Phabricator link on Wikidata Query Service dashboard
mpopov added a comment. They do not. Wikimedia repos on GitHub are simple mirrors of Gerrit. To the point where the version on GitHub says that OliverKeyes committed to it but there's no such user. The patch needs to be submitted to Gerrit. TASK DETAIL https://phabricator.wikimedia.org/T111790
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard
mpopov moved this task to Stalled/Waiting on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109361 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard
mpopov added a comment.

First version is live at http://searchdata.wmflabs.org/wdqs/

Will chat with Stas soon to clarify/fix any issues with the data/queries.

P.S. Also gave the Discovery Dashboards page a bit of a facelift http://searchdata.wmflabs.org/ :D

TASK DETAIL https://phabricator.wikimedia.org/T109361
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard
mpopov moved this task to Done on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109361 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Commented On] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov added a comment.

Script: https://gerrit.wikimedia.org/r/#/c/235137/1/data_retrieval/wdqs.R

Oliver added it to the scheduler and I ran it on the past 40 days to backfill the aggregate dataset that will be up-to-date going forward.

TASK DETAIL https://phabricator.wikimedia.org/T109360
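A backfill like the one described amounts to running the daily job once per missing date. The sketch below shows that idea in Python under stated assumptions; it is not the wdqs.R script. The helper name, the `run_for_day` callback, and the example end date are all hypothetical.

```python
from datetime import date, timedelta


def backfill_dates(end, days):
    """Return the `days` dates before `end` (exclusive), oldest first."""
    return [end - timedelta(days=offset) for offset in range(days, 0, -1)]


def backfill(end, days, run_for_day):
    """Call the daily job once per missing date and collect its results."""
    return [run_for_day(day) for day in backfill_dates(end, days)]


# Arbitrary example end date; the job here just echoes the date it was given.
dates = backfill_dates(date(2015, 9, 1), 40)
results = backfill(date(2015, 9, 1), 40, lambda day: day.isoformat())
print(dates[0], dates[-1], len(results))
```

Processing the oldest date first keeps the aggregate dataset in chronological order, so the scheduled daily run can simply append to it afterwards.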
[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard
mpopov added a comment. Waiting for code review: https://gerrit.wikimedia.org/r/#/c/235365/ TASK DETAIL https://phabricator.wikimedia.org/T109361
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard
mpopov moved this task to Stalled/Waiting on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109361 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard
mpopov added a comment. Awesome, thank you @EBernhardson :D TASK DETAIL https://phabricator.wikimedia.org/T109361
[Wikidata-bugs] [Maniphest] [Changed Project Column] T108732: [Task] Train Wikidata people on how to add data/metrics to a Shiny dashboard for Wikidata
mpopov moved this task to Done on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T108732 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov moved this task to Done on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109360 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard
mpopov added a comment.

Erik is working on an issue with installing new R packages, especially ones that require a newer version of R (e.g. 3.1.2) than what is currently installed (3.0.2). The dashboard is live at http://searchdata.wmflabs.org/wdqs/ but is currently busted because dplyr is not installed.

Furthermore, the wdqs.R script for acquiring aggregated data needs to be scheduled for daily execution. @Ironholds, talk to me when you're ready to schedule it and I'll run the script on the backlog to get the most up-to-date dataset.

TASK DETAIL https://phabricator.wikimedia.org/T109361
[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard
mpopov added a comment. The dedicated WDQS dashboard is sitting locally on my computer. Waiting for my project request to be completed so I can push the code out to Gerrit. TASK DETAIL https://phabricator.wikimedia.org/T109361
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard
mpopov moved this task to Stalled/Waiting on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109361 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard
mpopov moved this task to In progress on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109361 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Claimed] T108732: [Task] Train Wikidata people on how to add data/metrics to a Shiny dashboard for Wikidata
mpopov claimed this task. mpopov set Story Points to 2. TASK DETAIL https://phabricator.wikimedia.org/T108732
[Wikidata-bugs] [Maniphest] [Changed Project Column] T108732: [Task] Train Wikidata people on how to add data/metrics to a Shiny dashboard for Wikidata
mpopov moved this task to In progress on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T108732 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov moved this task to In progress on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109360 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard
mpopov moved this task to In progress on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109361 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov moved this task to Needs review on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109360 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard
mpopov moved this task to Needs review on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109361 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov moved this task to Backlog on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109360 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov moved this task to In progress on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109360 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov moved this task to Backlog on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109360 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov moved this task to Stalled/Waiting on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109360 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Commented On] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov added a comment. Need to transfer the logic from the HiveQL query to a UDF and then run the script on previous days to fill in the backlog. TASK DETAIL https://phabricator.wikimedia.org/T109360
[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards
mpopov moved this task to In progress on the Discovery-Analysis-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T109360 WORKBOARD https://phabricator.wikimedia.org/project/board/1241/
[Wikidata-bugs] [Maniphest] [Updated] T108732: Train Jan Zerebecki of Wikimedia Germany on how to set up a Shiny dashboard for Wikidata
mpopov added a blocking task: T108094: As a project lead, I'd like documentation on how to set up a Shiny dashboard so that I can visualise the project's key performance indicators. TASK DETAIL https://phabricator.wikimedia.org/T108732