[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-19 Thread mpopov
mpopov added subscribers: AndrewTavis_WMDE, mpopov.
mpopov added a comment.


  @AndrewTavis_WMDE asked me for some thoughts/suggestions here :)
  
  I started typing out a DM reply but decided some of this stuff would be good 
to have on public record.
  
  > it's not normal that snapshots go back a decade plus, so I'm a bit confused 
on this
  
  The way that MediaWiki and Wikidata snapshots work (and have to work, due to 
the nature of the data) is that they are snapshots in time of EVERYTHING at the 
time of the snapshot generation. This is why even `wmf.edits_hourly` (or 
whatever that table is called) can contain counts of edits made in April even 
though the latest snapshot is '2024-04': a snapshot is indiscriminate of the 
timestamps associated with any of the data.
  
  I think keeping 3-4 snapshots is probably a good number, because it enables 
us to investigate odd discrepancies between snapshots (T355182 
<https://phabricator.wikimedia.org/T355182>) beyond the state change 
problem. The challenge with this data, which you may have come across, is that 
the state of things (whether an edit got deleted or reverted, whether a user is 
labelled as a bot or not) changes over time, so the same edit made years ago, or 
the same user, can be categorized differently from snapshot to snapshot.
  
  Ultimately, **any metric that is calculated from data which can change state 
is going to be subject to drift when a static measurement is stored anywhere.** 
We actually run into this problem with the key result for FY23-24 Wiki 
Experiences Objective 1.1 (Superset dashboard 
<https://superset.wikimedia.org/superset/dashboard/501/>), which aims to increase 
the number of unreverted (and undeleted) mobile contributions to articles on 
Wikipedia by 10%.
  
  Throughout March 2024 – when the '2024-02' snapshot was used – the metric for 
the KR was at 4.7%. Then, when the '2024-03' snapshot was generated (at the 
beginning of April), the February value of that metric changed to 4.4% – 
because the state of the edits made in February changed. The dashboard uses the 
most recently available snapshot and has no memory of the values of the 
metric based on previous snapshots. If we were to store a value in a 
spreadsheet or a report and then, 1+ snapshots later, compare the dashboard to 
the spreadsheet/report, there would be a discrepancy.
  
  There's no getting around it – it's natural and folks who work with or look 
at these metrics need to become comfortable with that concept. There are some 
things we can do to improve stability (decrease snapshot-to-snapshot 
variability) of the metric, but it won't address the problem entirely. Like, we 
could (and should) impose "not reverted within first 48 hours" as opposed to 
currently "not reverted at the time of the snapshot" but deletion of edits and 
also whether a user is considered a real editor or a bot, well, those are going 
to change snapshot-to-snapshot and dealing with those would be extremely 
painful.
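  The two revert rules can be contrasted in a minimal Python sketch. It models one edit, the time it was reverted (if ever), and the time a snapshot is generated. The function names, the snapshot generation dates and the edit itself are hypothetical, and the 48-hour window is only the suggestion above, not an existing definition:

```python
from datetime import datetime, timedelta
from typing import Optional

# Window from the "not reverted within first 48 hours" suggestion (assumed value).
REVERT_WINDOW = timedelta(hours=48)


def known_revert_time(reverted_at: Optional[datetime],
                      snapshot_time: datetime) -> Optional[datetime]:
    """Revert time as seen by a snapshot generated at `snapshot_time`."""
    if reverted_at is not None and reverted_at <= snapshot_time:
        return reverted_at
    return None  # not reverted yet, as far as this snapshot knows


def counts_at_snapshot(reverted_at: Optional[datetime],
                       snapshot_time: datetime) -> bool:
    """Current rule: count the edit if it is not reverted as of the snapshot."""
    return known_revert_time(reverted_at, snapshot_time) is None


def counts_with_window(edit_time: datetime, reverted_at: Optional[datetime],
                       snapshot_time: datetime,
                       window: timedelta = REVERT_WINDOW) -> bool:
    """Windowed rule: count the edit unless it was reverted within `window`."""
    known = known_revert_time(reverted_at, snapshot_time)
    return known is None or known - edit_time > window


# Hypothetical edit made in February and reverted about five weeks later.
edit = datetime(2024, 2, 10)
revert = datetime(2024, 3, 15)
feb_snapshot = datetime(2024, 3, 1)  # assumed generation time of '2024-02'
mar_snapshot = datetime(2024, 4, 1)  # assumed generation time of '2024-03'

print(counts_at_snapshot(revert, feb_snapshot),
      counts_at_snapshot(revert, mar_snapshot))  # True False
print(counts_with_window(edit, revert, feb_snapshot),
      counts_with_window(edit, revert, mar_snapshot))  # True True
```

  The windowed rule stops drifting only once a snapshot is generated at least 48 hours after the edit. Deletions and bot labelling still change between snapshots, as noted above.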
  
  I won't evaluate the listed metrics but I will recommend asking yourselves 
the following for each metric:
  
  - Can we backfill this? Can we re-compute the history of this metric given a 
snapshot?
  - Are we comfortable re-computing the entire history of this metric with each 
new snapshot?
  - Will we be reporting this metric anywhere else and would it be a problem if 
what we reported in the past and what we report in the future differ?
  - Are we comfortable calculating the value of the metric only once and 
storing that somewhere that we call "source of truth" for measurements of this 
metric going forward?
    - For example, you calculate the value of metric A for March 2024 (using the 
March 2024 snapshot) and hold on to that value, because once the March 2024 
snapshot is deleted, any re-calculation of metric A for March 2024 using a 
later snapshot will result in a different value.
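  One way to act on the last question is a write-once "source of truth", sketched minimally in Python below. The class, the field names and the metric values are all hypothetical, and an in-memory dict stands in for wherever such measurements would actually live. The first value recorded for a metric and period is kept together with the snapshot it came from, and later recomputations do not overwrite it:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Measurement:
    metric: str    # e.g. "metric A"
    period: str    # month the value describes, e.g. "2024-03"
    value: float
    snapshot: str  # snapshot the value was computed from


class SourceOfTruth:
    """Hypothetical write-once store: the first recorded value for a period wins."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, str], Measurement] = {}

    def record(self, m: Measurement) -> Measurement:
        """Store `m` unless this metric/period already has a value; return the kept one."""
        key = (m.metric, m.period)
        if key not in self._values:
            self._values[key] = m
        return self._values[key]

    def get(self, metric: str, period: str) -> Optional[Measurement]:
        return self._values.get((metric, period))


store = SourceOfTruth()
store.record(Measurement("metric A", "2024-03", 0.52, snapshot="2024-03"))
# A later recomputation from a newer snapshot drifts, but does not overwrite.
kept = store.record(Measurement("metric A", "2024-03", 0.49, snapshot="2024-04"))
print(kept.value, kept.snapshot)  # 0.52 2024-03
```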

TASK DETAIL
  https://phabricator.wikimedia.org/T362849

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: mpopov
Cc: mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T348999: Add linter and formatter to wmfdata-python (and link check)

2024-01-18 Thread mpopov
mpopov removed a project: Product-Analytics.

TASK DETAIL
  https://phabricator.wikimedia.org/T348999

To: AndrewTavis_WMDE, mpopov
Cc: nshahquinn-wmf, xcollazo, Aklapper, AndrewTavis_WMDE, 
Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, lbowmaker, BTullis, 
karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, 
Mayakp.wiki, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331, 
EChetty, Base


[Wikidata-bugs] [Maniphest] T349531: Add testing framework to wmfdata-python

2024-01-03 Thread mpopov
mpopov removed a project: Product-Analytics.

TASK DETAIL
  https://phabricator.wikimedia.org/T349531

To: mpopov
Cc: nshahquinn-wmf, xcollazo, Aklapper, AndrewTavis_WMDE, 
Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, lbowmaker, BTullis, 
karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, 
Mayakp.wiki, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331, EChetty, Base


[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-07-31 Thread mpopov
mpopov added a comment.


  > are most people at WMF writing spark pythonically and not with queries?
  
  I guess it depends on who you talk to and what they're doing. All of the data 
scientists/analysts I work with use the Spark SQL engine and write HiveQL queries, 
often because `hive.run` is too slow. Occasionally I see dot notation for 
advanced PySpark usage (e.g. Morten's survey aggregation data pipeline 
<https://github.com/nettrom/Growth-welcomesurvey-2018/blob/master/T275172_survey_aggregation.ipynb>).
  
  I suspect dot notation-based Spark usage is probably more common among 
software engineers.

TASK DETAIL
  https://phabricator.wikimedia.org/T342111

To: AndrewTavis_WMDE, mpopov
Cc: mpopov, JAllemandou, Lydia_Pintscher, dcausse, Gehel, dr0ptp4kt, 
AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T177358: Metrics for SDoC: translations

2022-08-02 Thread mpopov
mpopov closed subtask T182352: UDF for language detection as 
Invalid.

TASK DETAIL
  https://phabricator.wikimedia.org/T177358

To: mpopov
Cc: RhinosF1, PDrouin-WMF, Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, 
Ramsey-WMF, Capt_Swing, debt, Astuthiodit_1, bking, EChetty, karapayneWMDE, 
Invadibot, GFontenelle_WMF, MPhamWMF, maantietaja, FRomeo_WMF, CBogen, 
ItamarWMDE, Nintendofan885, Akuckartz, ET4Eva, Nandana, JKSTNK, Lahi, Gq86, 
E1presidente, Cparle, GoranSMilovanovic, QZanden, EBjune, Tramullas, Acer, 
LawExplorer, Salgo60, Avner, Silverfish, Gehel, _jensen, rosalieper, 
Scott_WUaS, FloNight, Susannaanas, Fuzheado, Jane023, Wikidata-bugs, Base, 
matthiasmullie, aude, Daniel_Mietchen, Ricordisamoa, Wesalius, Lydia_Pintscher, 
Raymond, Steinsplitter, Mbch331


[Wikidata-bugs] [Maniphest] T292152: dashboard with daily query service usage not updating

2021-10-01 Thread mpopov
mpopov closed this task as a duplicate of T287381: External referrer  WDQS 
metrics stopped updating on 2021-04-25.

TASK DETAIL
  https://phabricator.wikimedia.org/T292152

To: mpopov
Cc: SWakiyama, MPhamWMF, dcausse, mpopov, Zbyszko, Aklapper, Lydia_Pintscher, 
Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T292152: dashboard with daily query service usage not updating

2021-10-01 Thread mpopov
mpopov added a comment.


  Thanks @MPhamWMF!
  
  What Mike and David said is correct. Also, this ticket prompted me to finally 
add the decommission notice to the dashboard (previously it was only on the 
homepage).
  
  In T292152#7391826 <https://phabricator.wikimedia.org/T292152#7391826>, 
@Lydia_Pintscher wrote:
  
  > In the meantime for my talk: Do we know what the current number is?
  
  For 2021-09-30:
  
  | Path                          | "Automated" | "User"  | Total   |
  | ----------------------------- | ----------- | ------- | ------- |
  | /                             | 2109        | 2290    | 4399    |
  | /bigdata/ldf                  | 4           | 55230   | 55234   |
  | /bigdata/namespace/wdq/sparql | 1835762     | 5786966 | 7622728 |
  
  Anyone with private data access can easily count 1 day's requests using Hue 
<https://hue.wikimedia.org/> and this Hive query (slightly modified from 
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/golden/+/refs/heads/master/modules/metrics/wdqs/basic_usage):
  
USE wmf;
SELECT
  year, month, day,
  IF(uri_path = '/sparql', '/bigdata/namespace/wdq/sparql', uri_path) AS path,
  UPPER(http_status IN('200','304')) AS http_success,
  CASE
    WHEN (
      agent_type = 'user' AND (
        user_agent RLIKE 'https?://'
        OR INSTR(user_agent, 'www.') > 0
        OR INSTR(user_agent, 'github') > 0
        OR LOWER(user_agent) RLIKE '([a-z0-9._%-]+@[a-z0-9.-]+\.(com|us|net|org|edu|gov|io|ly|co|uk))'
        OR (
          user_agent_map['browser_family'] = 'Other'
          AND user_agent_map['device_family'] = 'Other'
          AND user_agent_map['os_family'] = 'Other'
        )
      )
    ) OR agent_type = 'spider' THEN 'TRUE'
    ELSE 'FALSE'
  END AS is_automata,
  COUNT(*) AS events
FROM wmf.webrequest
WHERE
  webrequest_source = 'text'
  AND year = ${year} AND month = ${month} AND day = ${day}
  AND uri_host = 'query.wikidata.org'
  AND uri_path IN('/', '/bigdata/namespace/wdq/sparql', '/bigdata/ldf', '/sparql')
GROUP BY
  year, month, day,
  IF(uri_path = '/sparql', '/bigdata/namespace/wdq/sparql', uri_path),
  UPPER(http_status IN('200','304')),
  CASE
    WHEN (
      agent_type = 'user' AND (
        user_agent RLIKE 'https?://'
        OR INSTR(user_agent, 'www.') > 0
        OR INSTR(user_agent, 'github') > 0
        OR LOWER(user_agent) RLIKE '([a-z0-9._%-]+@[a-z0-9.-]+\.(com|us|net|org|edu|gov|io|ly|co|uk))'
        OR (
          user_agent_map['browser_family'] = 'Other'
          AND user_agent_map['device_family'] = 'Other'
          AND user_agent_map['os_family'] = 'Other'
        )
      )
    ) OR agent_type = 'spider' THEN 'TRUE'
    ELSE 'FALSE'
  END;
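  The query takes `${year}`, `${month}` and `${day}` placeholders, which happen to match the syntax of Python's `string.Template`. A minimal sketch of generating one query per day over a date range follows. The template is an abbreviated stand-in for the full query, `daily_queries` is a hypothetical helper, and how each query is then executed is deliberately left out:

```python
from datetime import date, timedelta
from string import Template
from typing import Iterator, Tuple

# Abbreviated stand-in for the full WDQS query; only the placeholders matter here.
QUERY_TEMPLATE = Template(
    "SELECT year, month, day, COUNT(*) AS events\n"
    "FROM wmf.webrequest\n"
    "WHERE webrequest_source = 'text'\n"
    "  AND year = ${year} AND month = ${month} AND day = ${day}\n"
    "  AND uri_host = 'query.wikidata.org'\n"
    "GROUP BY year, month, day"
)


def daily_queries(start: date, end: date) -> Iterator[Tuple[date, str]]:
    """Yield (day, query) pairs: one query per day, start to end inclusive."""
    day = start
    while day <= end:
        yield day, QUERY_TEMPLATE.substitute(year=day.year, month=day.month, day=day.day)
        day += timedelta(days=1)


for day, query in daily_queries(date(2021, 9, 29), date(2021, 9, 30)):
    print(day, "->", query.splitlines()[3].strip())
# 2021-09-29 -> AND year = 2021 AND month = 9 AND day = 29
# 2021-09-30 -> AND year = 2021 AND month = 9 AND day = 30
```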
  
  **I would NOT recommend querying an entire month with 1 query** since it uses 
webrequest data which **should be queried 1 day at a time at most**. Also, the 
query uses a non-standard "automata" determination. At the time (all those years 
ago) I thought it was clever, but these days I would not use those rules, and if I 
had infinite time I would switch to 
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection
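  For checking individual user agents by hand, the query's `is_automata` CASE expression can be mirrored in Python. The sketch below is a translation of those rules only, not the BotDetection approach linked above, and `is_automata` is a hypothetical name. Python's `re.search` stands in for Hive's `RLIKE`, which matches anywhere in the string:

```python
import re
from typing import Mapping

# Same patterns as the CASE expression in the query above.
URL_RE = re.compile(r"https?://")
EMAIL_RE = re.compile(r"[a-z0-9._%-]+@[a-z0-9.-]+\.(com|us|net|org|edu|gov|io|ly|co|uk)")


def is_automata(agent_type: str, user_agent: str, ua_map: Mapping[str, str]) -> bool:
    """Return True where the query's CASE expression would yield 'TRUE'."""
    if agent_type == "spider":
        return True
    if agent_type != "user":
        return False
    all_other = all(
        ua_map.get(key) == "Other"
        for key in ("browser_family", "device_family", "os_family")
    )
    return bool(
        URL_RE.search(user_agent)
        or "www." in user_agent
        or "github" in user_agent  # case-sensitive, like INSTR
        or EMAIL_RE.search(user_agent.lower())
        or all_other
    )


browser = {"browser_family": "Firefox", "device_family": "Other", "os_family": "Windows"}
unknown = {"browser_family": "Other", "device_family": "Other", "os_family": "Other"}
print(is_automata("user", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/91.0", browser))  # False
print(is_automata("user", "python-requests/2.26.0", unknown))  # True
```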

TASK DETAIL
  https://phabricator.wikimedia.org/T292152

To: mpopov
Cc: SWakiyama, MPhamWMF, dcausse, mpopov, Zbyszko, Aklapper, Lydia_Pintscher, 
Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Unassigned] T199016: Count structured data uploads and edits by volunteer-built tools

2020-05-18 Thread mpopov
mpopov removed mpopov as the assignee of this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T199016

To: mpopov
Cc: mpopov, Ramsey-WMF, Abit, CBogen, darthmon_wmde, Nandana, JKSTNK, Lahi, 
PDrouin-WMF, Gq86, E1presidente, Cparle, Anooprao, SandraF_WMF, 
GoranSMilovanovic, QZanden, Tramullas, Acer, V4switch, LawExplorer, Salgo60, 
Silverfish, _jensen, rosalieper, Scott_WUaS, Susannaanas, Wong128hk, Jane023, 
Wikidata-bugs, Base, matthiasmullie, aude, Ricordisamoa, Wesalius, 
Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter, Matanya, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-09 Thread mpopov
mpopov added a comment.


  @Abit: it's still not entirely clear which query from T238878 
<https://phabricator.wikimedia.org/T238878> @Milimetric should productionize in 
this ticket.
  
  From my conversation with Kate, it seems like your team wants to use the 7.8M 
number from the Lua-populated table using the query from T238878#5683048 
<https://phabricator.wikimedia.org/T238878#5683048>, but there's also 
overwhelming support for the query in T238878#5708511 
<https://phabricator.wikimedia.org/T238878#5708511>, which yields a count of 3M? 
I've pointed out the problems with missing data and data quality in general in the 
Lua-populated table, so I'm not sure that's the one you want to go with.
  
  Can you or @matthiasmullie please confirm exactly which query should be used?

TASK DETAIL
  https://phabricator.wikimedia.org/T239565

To: Milimetric, mpopov
Cc: Abit, Ramsey-WMF, kzimmerman, Addshore, matthiasmullie, gsingers, 
Mayakp.wiki, Ladsgroup, nettrom_WMF, Cparle, Nuria, Milimetric, mpopov, 
4748kitoko, darthmon_wmde, DannyS712, Nandana, JKSTNK, Akovalyov, Lahi, 
PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, 
QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, _jensen, 
rosalieper, Scott_WUaS, Susannaanas, JAllemandou, Jane023, terrrydactyl, 
Wikidata-bugs, Base, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, 
Fabrice_Florin, Raymond, Steinsplitter, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-04 Thread mpopov
mpopov added a comment.


  In T239565#5706854 <https://phabricator.wikimedia.org/T239565#5706854>, 
@Milimetric wrote:
  
  > Yay, I get to work with @mpopov :)
  
  Aw, I feel likewise! :D
  
  > - how often should this report be updated?
  
  I think for the intended purpose a monthly granularity is fine since the 
check-ins have in the past been quarterly or every 6mo. Even if the query takes 
like 35 minutes to run on unsqooped data, would it be okay to schedule it to 
run daily or weekly?
  
  > - is it exactly that query?  This task mentions "queries" plural, just 
making sure
  
  It's starting to look like the query in T238878#5708511 
<https://phabricator.wikimedia.org/T238878#5708511> is the one that should be 
used?
  
  > - given the confusion about deletion (T238878#5706835 
<https://phabricator.wikimedia.org/T238878#5706835>), should we also count 
stuff from the archive table?
  
  I don't think deleted files should be counted, no.
  
  
  
  I think the end result should be, ideally, a daily-granularity data source in 
Turnilo/Superset having:
  
  - total count of files on Commons
  - total count of files on Commons having structured data (per query in 
T238878#5708511 <https://phabricator.wikimedia.org/T238878#5708511>)
  
  This would enable @Abit & @Ramsey-WMF to track the progress of SDC over time in a 
Superset dashboard as (1) an absolute count and (2) a relative % (via 
post-aggregation in Superset), especially since Superset also has periodicity 
like YoY built in, which would be useful for them.
  
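  We would have to be careful with the auto aggregation, though. Each daily value 
is a running total, so the metrics would need to be specified as, like, longMax 
instead of longSum; otherwise rolling days up into weeks or months would add the 
totals together.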
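  The longSum vs longMax concern can be illustrated with a few toy rows. The Python sketch below uses made-up daily cumulative counts and a hypothetical `roll_up_by_month` helper. It only imitates what a sum or max metric does when days are rolled up into a month; it does not use Druid itself. Max matches the latest value only as long as the counts never decrease (for example, through file deletions):

```python
from collections import defaultdict

# Hypothetical daily rows: (date, total_files, files_with_structured_data).
# Each row is a cumulative count as of that day, not a daily increment.
daily_rows = [
    ("2019-12-01", 100, 10),
    ("2019-12-02", 102, 12),
    ("2019-12-03", 105, 15),
]


def roll_up_by_month(rows, agg):
    """Aggregate daily rows to months with `agg` (sum or max), like a metric would."""
    grouped = defaultdict(list)
    for day, total, with_sd in rows:
        grouped[day[:7]].append((total, with_sd))  # "2019-12-01" -> "2019-12"
    result = {}
    for month, vals in grouped.items():
        totals = [t for t, _ in vals]
        with_sds = [s for _, s in vals]
        result[month] = (agg(totals), agg(with_sds))
    return result


summed = roll_up_by_month(daily_rows, sum)  # like longSum: adds the running totals up
maxed = roll_up_by_month(daily_rows, max)   # like longMax: keeps the largest value

print(summed["2019-12"])  # (307, 37), which overcounts
print(maxed["2019-12"])   # (105, 15)

# Post-aggregation ratio, as a dashboard would compute it on top of the rollup.
total, with_sd = maxed["2019-12"]
print(f"{100 * with_sd / total:.1f}%")  # 14.3%
```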
  
  @Milimetric: do you have a destination in mind for the reports? I guess the 
MVP is just a CSV in /srv/published-datasets and we can figure out next steps 
later so this task's scope doesn't blow up, or do y'all have an easy 
pipeline/process for running reportupdater and ingesting the output into Druid?

TASK DETAIL
  https://phabricator.wikimedia.org/T239565

To: Milimetric, mpopov
Cc: Abit, Ramsey-WMF, kzimmerman, Addshore, matthiasmullie, gsingers, 
Mayakp.wiki, Ladsgroup, nettrom_WMF, Cparle, Nuria, Milimetric, mpopov, 
4748kitoko, darthmon_wmde, DannyS712, Nandana, JKSTNK, Akovalyov, Lahi, 
PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, 
QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, _jensen, 
rosalieper, Scott_WUaS, Susannaanas, JAllemandou, Jane023, terrrydactyl, 
Wikidata-bugs, Base, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, 
Fabrice_Florin, Raymond, Steinsplitter, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-22 Thread mpopov
mpopov added subscribers: Mayakp.wiki, daniel, Ladsgroup.
mpopov added a comment.


  I was looking at populateEntityUsage.php 
<https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibase/+/814e7a53ab65e6a90f30cb9f066a04b822a76c71/client/maintenance/populateEntityUsage.php>
 (a maintenance script for populating wbc_entity_usage 
<https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage> based on the 
page_props <https://www.mediawiki.org/wiki/Manual:Page_props_table> table). So 
if the entity usage table is populated from the page props table, that partially 
explains why the statements for File:Póvoa de Varzim -i---i- (25379025808).jpg 
<https://commons.wikimedia.org/wiki/File:P%C3%B3voa_de_Varzim_-i---i-_(25379025808).jpg>
 aren't showing up. They're not in the page props table:
  
  F31133752: Screen Shot 2019-11-22 at 4.30.41 PM.png 
<https://phabricator.wikimedia.org/F31133752>
  
SELECT *
FROM page_props AS pp
LEFT JOIN page ON pp.pp_page = page.page_id
WHERE pp_propname = 'wikibase_item'
--  AND page_namespace = 6 -- returns 0 results
LIMIT 100
  
  This shows that basically only ns:0 (mostly pages listing categories) and 
ns:14 pages have the `wikibase_item` page property.
  
  @daniel @Ladsgroup: hi o/ I'm pinging you because you're listed as the 
authors on a bunch of the relevant Wikibase code (including that entity usage 
maintenance script). Can you please help point us at somewhere, anywhere that 
we can use to figure out how many files on Commons have had labels, depicts, 
and other statements added?
  
  A different strategy is to use the revision comments to look for how many 
ns:6 pages have had revisions where the comment included `wbset`, 
for example:
  
SELECT
  page_title, page_namespace, rev_id, IF(rev_comment = '', comment_text, 
rev_comment) AS revision_comment
FROM revision rev
LEFT JOIN page ON rev.rev_page = page.page_id
LEFT JOIN revision_comment_temp rct ON rev.rev_id = rct.revcomment_rev
LEFT JOIN `comment` ON rct.revcomment_comment_id = `comment`.comment_id
WHERE page_namespace = 6
  AND rev_page = 68860692
  AND (comment_text RLIKE 'wbset(claim|label)' OR rev_comment RLIKE 
'wbset(claim|label)')
  
  F31133865: Screen Shot 2019-11-22 at 5.00.30 PM.png 
<https://phabricator.wikimedia.org/F31133865>
  
  This only looks at additions, not changes/removals, but we can fix that. 
Anyway, using this method we can count how many files have had structured data 
added to them as of the end of October 2019 (using Analytics Engineering's 
MediaWiki History in the Data Lake 
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history>):
  
WITH structured_data_additions AS (
  SELECT
    page_id,
    SUM(IF(event_comment RLIKE 'wbsetclaim', 1, 0)) > 0 AS had_claim_added,
    SUM(IF(event_comment RLIKE 'wbsetlabel', 1, 0)) > 0 AS had_label_added
  FROM mediawiki_history
  WHERE snapshot = '2019-10'
    AND wiki_db = 'commonswiki'
    AND event_entity = 'revision'
    AND page_namespace = 6
    AND event_comment RLIKE 'wbset(label|claim)'
    AND NOT revision_is_identity_reverted
  GROUP BY page_id
)
SELECT
  CASE
    WHEN had_claim_added AND had_label_added THEN 'statement(s) and label(s)'
    WHEN had_claim_added AND NOT had_label_added THEN 'just statement(s)'
    WHEN had_label_added AND NOT had_claim_added THEN 'just label(s)'
  END AS structured_data_added,
  COUNT(1) AS n_files
FROM structured_data_additions
GROUP BY
  CASE
    WHEN had_claim_added AND had_label_added THEN 'statement(s) and label(s)'
    WHEN had_claim_added AND NOT had_label_added THEN 'just statement(s)'
    WHEN had_label_added AND NOT had_claim_added THEN 'just label(s)'
  END;
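  The per-file classification in this query can be sketched in Python on toy data, which helps sanity-check the CASE logic. The revision comments and page IDs below are hypothetical examples, not real Commons edits. Reverted revisions are dropped first, mirroring `NOT revision_is_identity_reverted`:

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical revisions per file page: (edit comment, was it reverted?).
revisions: Dict[int, List[Tuple[str, bool]]] = {
    1: [("/* wbsetlabel-add:1|en */ a caption", False)],
    2: [("/* wbsetclaim-create:2||1 */ depicts", False),
        ("/* wbsetlabel-add:1|en */ a caption", False)],
    3: [("/* wbsetclaim-create:2||1 */ depicts", True)],  # reverted, so ignored
    4: [("Uploaded a file", False)],                        # no structured data
}


def classify(comments: List[str]) -> Optional[str]:
    """Mirror the CASE expression: which kinds of structured data were added."""
    had_claim = any(re.search("wbsetclaim", c) for c in comments)
    had_label = any(re.search("wbsetlabel", c) for c in comments)
    if had_claim and had_label:
        return "statement(s) and label(s)"
    if had_claim:
        return "just statement(s)"
    if had_label:
        return "just label(s)"
    return None  # the SQL drops such pages in its WHERE clause instead


counts = Counter()
for revs in revisions.values():
    kept = [comment for comment, reverted in revs if not reverted]
    category = classify(kept)
    if category is not None:
        counts[category] += 1

print(dict(counts))  # {'just label(s)': 1, 'statement(s) and label(s)': 1}
```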
  
  @Abit @Ramsey-WMF @Mayakp.wiki: this will be of interest to you. The total 
number of files which have had structured data //added// to them (and not 
reverted) before November 2019 is… 1,401,757. This doesn't include claim/label 
//removals//, so just a heads up there.
  
  | structured_data_added     | n_files   |
  | ------------------------- | --------- |
  | just label(s)             | 1,112,577 |
  | just statement(s)         | 163,200   |
  | statement(s) and label(s) | 125,980   |
  
  For a more up-to-date count, here's an equivalent query for the MW replica in 
MariaDB, but it doesn't include revert status which is provided in the 
mediawiki_history data:
  
SELECT
  CASE
    WHEN had_claim_added AND had_label_added THEN 'statement(s) and label(s)'
    WHEN had_claim_added AND NOT had_label_added THEN 'just statement(s)'
    WHEN had_label_added AND NOT had_claim_added THEN 'just label(s)'
  END AS structured_data_additions,
  COUNT(1) AS n_files
FROM (
  SELECT
    rev_page,
    SUM(I

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-22 Thread mpopov
mpopov added a comment.


  Here are the missing screenshots:
  
  In T238878#5683048 <https://phabricator.wikimedia.org/T238878#5683048>, 
@Nuria wrote:
  
  > The work done by @mpopov  (if you are so kind @mpopov 
  > please upload your screenshots)
  > The wbc_entity_usage table is supposed to hold info on Wikidata usage for 
the pages. For example, here's a random file I added some structured data to a 
few days ago: 
https://commons.wikimedia.org/wiki/File:P%C3%B3voa_de_Varzim_-i---i-_(25379025808).jpg
  > When you look for it in the commonswiki replica, it has a page ID of 68860692. 
Looking for it in the wbc_entity_usage table we only see that it has a caption 
in English, which I added at basically the same time as several statements:
  
  F31133627: 1.png <https://phabricator.wikimedia.org/F31133627>
  
  The structured data is missing, despite being added //before// the caption.
  
  > eu_aspect column does have other values like "O" (statements) and "D" (not 
documented, but from a brief investigation looks like it's specifically for 
linking categories on Commons to Wikidata Q-items). There are some records of 
files with "O" aspects (as the MW page notes, it can refer to a variety of 
entity usages but typically it's statements) but then it gets weird because the 
language of the label isn't recorded and there's a bunch of seemingly 
unnecessary info? Take for example the MediaWiki DB data for 
https://commons.wikimedia.org/wiki/File:Jodrell_Bank_Mark_II_5.jpg
  
  F31133631: 2.png <https://phabricator.wikimedia.org/F31133631>
  
  F31133634: unnamed.png <https://phabricator.wikimedia.org/F31133634>
  
  > Woof! That's…not great. So, uh, clearly there's something funky going on 
with the Wikibase client extension? Or maybe that's data that was recorded by 
an earlier version of the extension before it knew to append language codes to 
labels? I don't know enough about the nitty-gritty there, so these are just 
vaguely educated guesses.

TASK DETAIL
  https://phabricator.wikimedia.org/T238878

To: mpopov
Cc: matthiasmullie, Addshore, kzimmerman, mpopov, Ramsey-WMF, Abit, Nuria, 
4748kitoko, darthmon_wmde, DannyS712, Nandana, JKSTNK, Akovalyov, Lahi, 
PDrouin-WMF, Gq86, E1presidente, Cparle, Anooprao, SandraF_WMF, 
GoranSMilovanovic, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, 
_jensen, rosalieper, Scott_WUaS, Susannaanas, JAllemandou, Jane023, 
terrrydactyl, Wikidata-bugs, Base, aude, Ricordisamoa, Wesalius, 
Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons

2019-01-23 Thread mpopov
mpopov added a comment.
@Abit @Ramsey-WMF in addition to T213597#4900741, here's the history of that metric with a 7-day rolling average to smooth the daily data a bit:

F28004771: 2019-01_checkin.png

TASK DETAIL
  https://phabricator.wikimedia.org/T213597

To: mpopov
Cc: Neil_P._Quinn_WMF, chelsyx, MNeisler, mpopov, kzimmerman, Ramsey-WMF, Abit, JKSTNK, Lahi, PDrouin-WMF, E1presidente, Cparle, Anooprao, SandraF_WMF, Tramullas, Acer, Silverfish, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter


[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons

2019-01-23 Thread mpopov
mpopov added a comment.

In T213597#4900903, @Neil_P._Quinn_WMF wrote:

> True, but its revisions do have revision_is_deleted set, so you've already filtered them out of your query.


Huh! Yeah, you're right! Haha, okay so I think what happened was I had checked the summarized_revisions table before I had revision_is_deleted in the WHERE clause, and then added both NOT revision_is_deleted AND page_id IS NOT NULL after seeing that example. Sorry for the confusion! You were right this whole time :)

TASK DETAIL
  https://phabricator.wikimedia.org/T213597

To: mpopov
Cc: Neil_P._Quinn_WMF, chelsyx, MNeisler, mpopov, kzimmerman, Ramsey-WMF, Abit, JKSTNK, Lahi, PDrouin-WMF, E1presidente, Cparle, Anooprao, SandraF_WMF, Tramullas, Acer, Silverfish, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter


[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons

2019-01-22 Thread mpopov
mpopov added a comment.

In T213597#4893765, @Neil_P._Quinn_WMF wrote:

> I noticed one big thing: it seems like your counts of file page edits (n_edits_total, n_additions_total, etc.) include the initial edit that creates the pages, so in the end you're getting the proportion of files which have metadata added in the first 2 months, including during the initial upload.
>
> I tried excluding those initial creations (event_timestamp != page_creation_timestamp), and it looks like the proportion goes from 99% to 50%.


Thank you so much, @Neil_P._Quinn_WMF! Really appreciate you catching that and correcting it. I had incorrectly assumed that initial metadata would not be included. I'm currently looking into your suggested method of filtering revisions and comparing it to using revision_parent_id > 0, which should theoretically yield the same result but does not in practice.

Correct numbers coming soon.

> I don't understand the point of this, since the NOT revision_is_deleted should have already removed deleted files. (Also the page_id isn't necessarily null for deleted pages; after all the MediaWiki archive table has ar_page_id.)

https://commons.wikimedia.org/wiki/File:Box-Front.jpg is a deleted file with a null page_id, and it gets included in summarized_revisions otherwise.

TASK DETAIL
  https://phabricator.wikimedia.org/T213597

To: mpopov
Cc: Neil_P._Quinn_WMF, chelsyx, MNeisler, mpopov, kzimmerman, Ramsey-WMF, Abit, JKSTNK, Lahi, PDrouin-WMF, E1presidente, Cparle, Anooprao, SandraF_WMF, Tramullas, Acer, Silverfish, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T213597: [REQUEST] Baselines for structured data on Commons

2019-01-18 Thread mpopov
mpopov added subscribers: chelsyx, Neil_P._Quinn_WMF.
mpopov added a comment.
Okay, here are the numbers which were calculated with the following conditions:


- Using the December 2018 snapshot of MediaWiki History in the Data Lake
- Only files which have not been deleted are counted
- Only revisions to the metadata which were not reverted AND were not themselves reverts AND were not deleted are counted
- "Metadata augmented w/in 1st 2mo" means there was at least 1 byte-adding revision to the file's page within the first 60 days after creation


Assuming my query is correct (pending review), it looks like the baseline for the % of files which have metadata added within the first 2 months is 99.993914% overall.
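The "augmented within the first 2 months" definition above can be sketched in Python for a handful of file pages. This is a minimal illustration, not the query that produced these numbers. The input shape, the function names, the toy files and the `exclude_creation` flag are assumptions. The flag corresponds to the event_timestamp != page_creation_timestamp filter suggested elsewhere in this thread, which the query in the appendix does not apply:

```python
from datetime import datetime
from typing import List, Tuple

Revision = Tuple[datetime, int]  # (timestamp, bytes_diff) for one revision


def augmented_within_60_days(creation: datetime, revisions: List[Revision],
                             exclude_creation: bool = True) -> bool:
    """True if a byte-adding revision lands within 60 days of page creation.

    `revisions` is assumed to be pre-filtered to unreverted, non-revert,
    undeleted revisions of a single file page.
    """
    for ts, bytes_diff in revisions:
        if exclude_creation and ts == creation:
            continue  # skip the revision that created the page (the upload itself)
        # Compare calendar dates, like Hive's DATEDIFF, not exact durations.
        if bytes_diff > 0 and (ts.date() - creation.date()).days <= 60:
            return True
    return False


def proportion_augmented(files, exclude_creation: bool = True) -> float:
    """Share of (creation, revisions) files augmented within 60 days."""
    hits = sum(augmented_within_60_days(c, revs, exclude_creation) for c, revs in files)
    return hits / len(files)


# Hypothetical files, all created at the same moment.
created = datetime(2018, 1, 1, 12)
files = [
    (created, [(created, 5000)]),                                # upload only
    (created, [(created, 5000), (datetime(2018, 2, 15), 120)]),  # added after 45 days
    (created, [(created, 5000), (datetime(2018, 4, 1), 80)]),    # too late (90 days)
    (created, [(created, 5000), (datetime(2018, 1, 5), -30)]),   # removal only
]
print(proportion_augmented(files, exclude_creation=False))  # 1.0
print(proportion_augmented(files))                          # 0.25
```

Counting the upload itself makes every file look augmented, which is the kind of inflation discussed in this thread.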

Yearly stats


Year | Files uploaded that year | Metadata augmented w/in 1st 2mo (60d) | Proportion
2004 | 17,478 | 17,423 | 99.685319%
2005 | 263,218 | 263,053 | 99.937314%
2006 | 644,238 | 644,087 | 99.976561%
2007 | 1,202,209 | 1,202,019 | 99.984196%
2008 | 1,402,061 | 1,401,908 | 99.989087%
2009 | 1,926,019 | 1,925,786 | 99.987903%
2010 | 2,331,837 | 2,331,581 | 99.989022%
2011 | 3,881,441 | 3,881,089 | 99.990931%
2012 | 3,489,435 | 3,489,253 | 99.994784%
2013 | 4,592,177 | 4,592,018 | 99.996538%
2014 | 4,720,657 | 4,720,534 | 99.997394%
2015 | 5,684,463 | 5,684,360 | 99.998188%
2016 | 6,317,906 | 6,317,729 | 99.997198%
2017 | 8,184,732 | 8,184,286 | 99.994551%
2018 | 7,983,451 | 7,982,992 | 99.994251%
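The overall baseline quoted above is a pooled ratio (all augmented files over all uploaded files), not an average of the yearly proportions. As a quick arithmetic check, this sketch re-derives it from the yearly counts in the table above; it assumes every counted file was uploaded between 2004 and 2018, which is the span the table covers.

```python
# (year, files uploaded that year, files with metadata added within 60 days),
# copied from the yearly table above.
YEARLY = [
    (2004, 17_478, 17_423),
    (2005, 263_218, 263_053),
    (2006, 644_238, 644_087),
    (2007, 1_202_209, 1_202_019),
    (2008, 1_402_061, 1_401_908),
    (2009, 1_926_019, 1_925_786),
    (2010, 2_331_837, 2_331_581),
    (2011, 3_881_441, 3_881_089),
    (2012, 3_489_435, 3_489_253),
    (2013, 4_592_177, 4_592_018),
    (2014, 4_720_657, 4_720_534),
    (2015, 5_684_463, 5_684_360),
    (2016, 6_317_906, 6_317_729),
    (2017, 8_184_732, 8_184_286),
    (2018, 7_983_451, 7_982_992),
]

uploaded = sum(u for _, u, _ in YEARLY)
augmented = sum(a for _, _, a in YEARLY)

# Pooled proportion: every file weighs the same, so high-volume years dominate.
pooled_pct = 100 * augmented / uploaded
# Unweighted mean of the yearly proportions: every year weighs the same.
mean_of_years_pct = sum(100 * a / u for _, u, a in YEARLY) / len(YEARLY)

print(f"{uploaded:,} uploaded, {augmented:,} augmented")
print(f"pooled: {pooled_pct:.6f}%")
print(f"mean of yearly proportions: {mean_of_years_pct:.6f}%")
```

This gives 52,641,322 uploaded and 52,638,118 augmented files, a pooled 99.993914% that matches the overall figure. The unweighted mean of the yearly proportions comes out lower (about 99.97%) because 2004's 99.69% counts as much as any later, far larger year.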





Monthly stats for 2018


Month | Files uploaded that month | Metadata augmented w/in 1st 2mo (60d) | Proportion
January 2018 | 653,574 | 653,516 | 99.991126%
February 2018 | 705,934 | 705,869 | 99.990792%
March 2018 | 784,535 | 784,461 | 99.990568%
April 2018 | 609,663 | 609,627 | 99.994095%
May 2018 | 714,618 | 714,523 | 99.986706%
June 2018 | 588,995 | 588,878 | 99.980136%
July 2018 | 651,006 | 651,003 | 99.999539%
August 2018 | 784,168 | 784,166 | 99.999745%
September 2018 | 818,778 | 818,775 | 99.999634%
October 2018 | 564,108 | 564,102 | 99.998936%
November 2018 | 574,174 | 574,174 | 100.00%
December 2018 | 533,898 | 533,898 | 100.00%





Appendix

Here's the query I used, which I would like someone in #product-analytics (e.g. @chelsyx and @Neil_P._Quinn_WMF) to review:

WITH summarized_revisions AS (
  SELECT
    page_id, TO_DATE(page_creation_timestamp) AS creation_date,
    COUNT(1) AS n_edits_total, -- not including reverts or reverted
    SUM(IF(revision_text_bytes_diff > 0, 1, 0)) AS n_additions_total,
    SUM(IF(DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_edits_2mo,
    SUM(IF(revision_text_bytes_diff > 0 AND DATEDIFF(event_timestamp, page_creation_timestamp) <= 60, 1, 0)) AS n_additions_2mo
  FROM wmf.mediawiki_history
  WHERE snapshot = '2018-12'
    AND wiki_db = 'commonswiki'
    AND event_entity = 'revision'
    AND page_namespace = 6
    AND NOT revision_is_identity_revert -- don't count edits that are reverts
    AND NOT revision_is_identity_reverted -- don't count edits that were reverted
    AND NOT revision_is_deleted -- don't count edits moved to the archive table
    AND page_id IS NOT NULL -- don't count deleted files
  GROUP BY page_id, TO_DATE(page_creation_timestamp)
)
SELECT
  creation_date,
  COUNT(1) AS n_total,
  SUM(IF(n_edits_total > 0, 1, 0)) AS n_edited,
  SUM(IF(n_additions_total > 0, 1, 0)) AS n_added_to,
  SUM(IF(n_edits_2mo > 0, 1, 0)) AS n_edited_2mo,
  SUM(IF(n_additions_2mo > 0, 1, 0)) AS n_added_to_2mo
FROM summarized_revisions
GROUP BY creation_date;
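To make the two-stage aggregation easier to review, here is a toy re-run of the same query shape in SQLite, through Python's built-in sqlite3 module, on made-up rows. It is a sketch, not the production query: SQLite has no DATEDIFF or IF, so julianday(date(...)) and CASE stand in for them, only the n_additions_2mo path is kept, and the include_creation switch is a hypothetical addition. It also assumes the upload's own revision has an event_timestamp equal to page_creation_timestamp. In the sketch, counting the upload revision makes every file whose first revision adds bytes count as "augmented"; whether that matches the intended definition is worth settling in review.

```python
import sqlite3

# Made-up rows standing in for wmf.mediawiki_history (not real data):
# (page_id, page_creation_timestamp, event_timestamp, revision_text_bytes_diff)
ROWS = [
    (1, "2018-01-01 12:00:00", "2018-01-01 12:00:00", 500),  # upload
    (1, "2018-01-01 12:00:00", "2018-01-10 08:00:00", 20),   # addition on day 9
    (2, "2018-01-01 09:00:00", "2018-01-01 09:00:00", 300),  # upload
    (2, "2018-01-01 09:00:00", "2018-06-01 10:00:00", 10),   # addition after 60 days
    (3, "2018-01-02 15:00:00", "2018-01-02 15:00:00", 400),  # upload
    (3, "2018-01-02 15:00:00", "2018-01-05 11:00:00", -50),  # removal only
]

QUERY = """
WITH summarized_revisions AS (
  SELECT
    page_id,
    date(page_creation_timestamp) AS creation_date,
    SUM(CASE WHEN revision_text_bytes_diff > 0
              AND julianday(date(event_timestamp))
                  - julianday(date(page_creation_timestamp)) <= 60
              -- hypothetical switch: optionally skip the upload revision itself
              AND (:include_creation OR event_timestamp > page_creation_timestamp)
             THEN 1 ELSE 0 END) AS n_additions_2mo
  FROM revisions
  GROUP BY page_id, date(page_creation_timestamp)
)
SELECT
  COUNT(1) AS n_total,
  SUM(CASE WHEN n_additions_2mo > 0 THEN 1 ELSE 0 END) AS n_added_to_2mo
FROM summarized_revisions
"""


def augmented_counts(include_creation):
    """Return (n_total, n_added_to_2mo) for the toy rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE revisions (page_id INTEGER, page_creation_timestamp TEXT,"
            " event_timestamp TEXT, revision_text_bytes_diff INTEGER)"
        )
        con.executemany("INSERT INTO revisions VALUES (?, ?, ?, ?)", ROWS)
        return con.execute(QUERY, {"include_creation": include_creation}).fetchone()
    finally:
        con.close()


print(augmented_counts(include_creation=True))   # (3, 3)
print(augmented_counts(include_creation=False))  # (3, 1)
```

The condition sits inside the CASE rather than in a WHERE clause so that files with no qualifying revisions still appear in n_total.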


[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons

2019-01-17 Thread mpopov
mpopov added a comment.
Thanks for clarifying! Okay, one more question for @Abit & @Ramsey-WMF just so everyone is on the same page. The statistic you want is: the % of all uploaded files which have had additions to their pages in the first 2 months after upload.

No breakdown by file type or over time, just a count X and a total Y and the proportion X/Y, correct?


[Wikidata-bugs] [Maniphest] [Commented On] T213597: [REQUEST] Baselines for structured data on Commons

2019-01-16 Thread mpopov
mpopov added a comment.
@Ramsey-WMF: hi, I would like to clarify what "metadata" includes. Here's my initial list:


- every field in the Information template
- Licensing
- Categories


Or are you referring to the entire page as the metadata? i.e. the whole shebang:

F27911262: Screen Shot 2019-01-16 at 10.12.32 AM.png

And then any revisions that add bytes (including the newly released captions):

F27911283: Screen Shot 2019-01-16 at 10.15.35 AM.png

would make the file count towards the statistic? In that case, if a revision removes metadata and then another revision undoes it, does THAT count?

Furthermore, for clarification, are you specifically interested in:


- when a file's metadata is augmented, which is to say when additional metadata is added to a file after it's uploaded and some metadata is there from the outset
- OR in addition to metadata getting added in the first 2 months after upload, also when the initial upload includes metadata beyond the essential fields (description, date) that are required for the upload


Like, if someone is very thorough in their initial upload, does that file get included in the count? Or is it specifically revisions after the initial upload?

Also, I assume it does not matter who (or what) adds the metadata in the 2 months after the upload. Whether it's a bot adding a category or another person adding some other metadata, all that matters is that metadata is added.

And specifically added, not removed, right?


[Wikidata-bugs] [Maniphest] [Closed] T204415: Query stats dashboard not updating

2018-10-02 Thread mpopov
mpopov closed this task as "Resolved".
mpopov added a comment.
All good now :)


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T204415: Query stats dashboard not updating

2018-09-28 Thread mpopov
mpopov removed subscribers: mforns, Ottomata, elukey, Nuria.
mpopov added a comment.
Alright, I wiped all the request counts starting with August 10th (after making a backup) so Golden/Reportupdater is going to start a re-count using the webrequests in the 'text' partition. WDQS stats re-count should be done by Monday. Thanks for your patience, folks!
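The back-up-then-wipe step can be sketched as follows. This is an illustration, not the script that was actually run: it assumes a tab-separated report with a header row and an ISO date (YYYY-MM-DD) in the first column, and the helper name wipe_from is hypothetical. Rows dated on or after the cutoff are dropped after a backup copy is written, which leaves those dates missing so Reportupdater can backfill them.

```python
import csv
import os
import shutil
import tempfile
from datetime import date


def wipe_from(path, cutoff):
    """Back up `path`, then drop every row dated on or after `cutoff`.

    Returns the number of rows removed. Assumes a TSV with a header row and
    an ISO date (YYYY-MM-DD) in the first column.
    """
    shutil.copy2(path, path + ".bak")  # keep a backup before editing in place
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        rows = list(reader)
    kept = [row for row in rows if date.fromisoformat(row[0]) < cutoff]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(kept)
    return len(rows) - len(kept)


if __name__ == "__main__":
    # Demo on a throwaway file with made-up counts.
    with tempfile.TemporaryDirectory() as d:
        report = os.path.join(d, "basic_usage.tsv")
        with open(report, "w") as f:
            f.write("date\tcount\n2018-08-09\t10\n2018-08-10\t11\n2018-08-11\t12\n")
        print(wipe_from(report, date(2018, 8, 10)))  # 2
```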


[Wikidata-bugs] [Maniphest] [Unblock] T204415: Query stats dashboard not updating

2018-09-27 Thread mpopov
mpopov closed subtask T205441: 'group' parameter in Reportupdater for automatic chgrp of generated reports as "Resolved".


[Wikidata-bugs] [Maniphest] [Updated] T204415: Query stats dashboard not updating

2018-09-25 Thread mpopov
mpopov added a subscriber: mforns.
mpopov added a comment.

In T204415#4612751, @Ottomata wrote:
Ok, I've added the analytics-search system user to the analytics-search-users group. You should make your script chgrp analytics-search-users after it creates it.


Thank you very much, Andrew! That's gonna need to be done with T205441, which I've started on. That's step 1, for which I'll need @mforns's help with code review and with enabling the parameter to be specified in the defaults section of the YAML config.

Step 2 is Chelsy/me updating the configs to specify the analytics-search-users group and updating the Reportupdater submodule in golden to the patched version.

Step 3 is letting Reportupdater run once so it changes the file permissions.

Step 4 is clearing out dates in the WDQS report which will need to be recounted.

Step 5 is Reportupdater backfilling the missing dates using the patched query.
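For reference, the chgrp-after-write idea behind T205441 can be sketched with the Python standard library's shutil.chown. This is not Reportupdater's code: publish_report is a hypothetical helper, and in production the group would be the configured one (analytics-search-users), whereas the demo uses the current process's group so it runs without special privileges.

```python
import os
import shutil
import stat
import tempfile


def publish_report(path, text, group):
    """Write `text` to `path`, hand the file to `group`, make it rw for user and group.

    `group` may be a group name (e.g. "analytics-search-users") or a numeric gid.
    """
    with open(path, "w") as f:
        f.write(text)
    shutil.chown(path, group=group)  # same effect as `chgrp <group> <path>`
    os.chmod(path, 0o664)            # rw-rw-r--, the file mode `umask 002` yields


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        report = os.path.join(d, "basic_usage.tsv")
        # Demo with the current effective group instead of the production group.
        publish_report(report, "date\tcount\n", os.getegid())
        st = os.stat(report)
        print(st.st_gid == os.getegid(), oct(stat.S_IMODE(st.st_mode)))  # True 0o664
```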

@Addshore hopefully step 5 will be done by end of the week! :)


[Wikidata-bugs] [Maniphest] [Commented On] T204415: Query stats dashboard not updating

2018-09-24 Thread mpopov
mpopov added a comment.
@Ottomata @Gehel: I tried editing stat1005:/srv/published-datasets/discovery/metrics/wdqs/basic_usage.tsv but couldn't, because the file belongs to group analytics-search, not analytics-search-users. That sort of makes sense given how we have it configured right now in statistics::discovery:

$user  = 'analytics-search'
$group = 'analytics-privatedata-users'

...

cron { 'wikimedia-discovery-golden':
    ensure  => present,
    command => "cd ${dir}/golden && sh main.sh >> ${log_dir}/golden-daily.log 2>&1",
    hour    => '5',
    minute  => '0',
    require => [
        Class['::statistics::compute'],
        Git::Clone['wikimedia/discovery/golden'],
        Mysql::Config::Client['discovery-stats']
    ],
    user    => $user,
}

and in main.sh in the wikimedia/discovery/golden repo, which generates these datasets:

# files created / touched by report updater need to be rw for user and group
umask 002
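What `umask 002` buys here: the umask clears permission bits from the mode a process requests when it creates a file, so with 002 only "write for others" is removed and the group keeps write access. That only helps members of the file's group, which is why the group ownership question matters. A small Python check of the resulting modes (Python's open() requests 0o666 for new files and os.mkdir requests 0o777 by default):

```python
import os
import stat
import tempfile


def modes_under_umask(mask):
    """Create a file and a directory under `mask`; return their permission bits."""
    previous = os.umask(mask)  # os.umask installs the new mask and returns the old one
    try:
        with tempfile.TemporaryDirectory() as d:
            report = os.path.join(d, "report.tsv")
            with open(report, "w"):  # open() asks for 0o666
                pass
            subdir = os.path.join(d, "reports")
            os.mkdir(subdir)  # os.mkdir asks for 0o777 by default
            return (stat.S_IMODE(os.stat(report).st_mode),
                    stat.S_IMODE(os.stat(subdir).st_mode))
    finally:
        os.umask(previous)  # restore the caller's umask


for mask in (0o022, 0o002):
    file_mode, dir_mode = modes_under_umask(mask)
    print(f"umask {mask:03o}: file {file_mode:o}, dir {dir_mode:o}")
# umask 022: file 644, dir 755
# umask 002: file 664, dir 775
```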

From the Puppet 3.8 documentation for cron, it's not clear whether we can…somehow set a group. (Would that even make sense?)

I need to edit that file to erase all request counts affected by the 'misc' partition drop that we can recount from the 'text' partition.


[Wikidata-bugs] [Maniphest] [Commented On] T204415: Query stats dashboard not updating

2018-09-24 Thread mpopov
mpopov added a comment.

In T204415#4611729, @Nuria wrote:
Assigned to @mpopov. Again, our apologies that the data sources are hardcoded like this. As I mentioned in our meeting, a better path forward would be using the tags for wdqs to identify the requests: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/WDQSTagger.java


BTW, the query has to filter by path anyway because it also counts WDQS homepage visits, so we're not switching to tags in this case.
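A toy version of filtering by path rather than by tag, to show why one path-based pass can count both homepage visits and SPARQL requests. The host and paths are assumptions for illustration (query.wikidata.org, "/" as the homepage, /sparql and /bigdata/namespace/wdq/sparql as the SPARQL endpoint); the production query's exact conditions are not shown in this thread, and classify_wdqs is a hypothetical helper.

```python
from collections import Counter

WDQS_HOST = "query.wikidata.org"                             # assumption
HOMEPAGE_PATHS = {"/"}                                       # assumption
SPARQL_PATHS = {"/sparql", "/bigdata/namespace/wdq/sparql"}  # assumption


def classify_wdqs(uri_host, uri_path):
    """Bucket one request by path; return None if it isn't a WDQS request."""
    if uri_host != WDQS_HOST:
        return None
    if uri_path in HOMEPAGE_PATHS:
        return "homepage"
    if uri_path in SPARQL_PATHS:
        return "sparql"
    return "other"


def count_requests(requests):
    """Count (uri_host, uri_path) pairs per bucket, skipping non-WDQS traffic."""
    buckets = (classify_wdqs(host, path) for host, path in requests)
    return Counter(b for b in buckets if b is not None)


# Made-up requests, for illustration only.
sample = [
    ("query.wikidata.org", "/"),
    ("query.wikidata.org", "/sparql"),
    ("query.wikidata.org", "/bigdata/namespace/wdq/sparql"),
    ("query.wikidata.org", "/favicon.ico"),
    ("www.wikidata.org", "/wiki/Q42"),
]
print(count_requests(sample))
```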

F26189240: Screen Shot 2018-09-24 at 3.58.12 PM.png


[Wikidata-bugs] [Maniphest] [Updated] T204415: Query stats dashboard not updating

2018-09-24 Thread mpopov
mpopov added a subscriber: Gehel.
mpopov added a comment.
Thanks for looking into it, @Nuria! And for confirming, @elukey @Ottomata! :)

A note for #operations: this is not the first time we've encountered an issue like this. Last year our query for Maps usage stopped working because of partition changes that we weren't told about (T167083), and this is exactly like that. Nobody on #product-analytics is subscribed to ops@lists.wikimedia (because 99.999% of those threads would be irrelevant to us), so I just want to point out that decisions made by Ops that affect data sources like the wmf.webrequest table need to be communicated to the analysts who rely on those data sources.

I don't think it's reasonable to expect, say, @Gehel to notice those emails in his mailbox and notify us, so when authoring emails announcing big, data-source-related changes like partition drops and renames, please cc product-analyt...@wikimedia.org, since we have scripts and queries that operate on those data sources under certain hard-coded assumptions.


[Wikidata-bugs] [Maniphest] [Closed] T177358: Metrics for SDoC: translations

2018-04-23 Thread mpopov
mpopov closed this task as "Resolved".


[Wikidata-bugs] [Maniphest] [Unblock] T174519: [epic] SDoC: Determine baseline for metrics

2018-04-23 Thread mpopov
mpopov closed subtask T177358: Metrics for SDoC: translations as "Resolved".
Herald added a project: Product-Analytics.


[Wikidata-bugs] [Maniphest] [Changed Project Column] T177358: Metrics for SDoC: translations

2017-12-13 Thread mpopov
mpopov moved this task from In progress to Needs review on the Discovery-Analysis (Current work) board.
mpopov added a comment.
Search query language breakdown note & results at https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177358-2


[Wikidata-bugs] [Maniphest] [Edited] T177358: Metrics for SDoC: translations

2017-12-13 Thread mpopov
mpopov updated the task description.
CHANGES TO TASK DESCRIPTION...** [x] How many search queries happen in what languages?...


[Wikidata-bugs] [Maniphest] [Claimed] T177358: Metrics for SDoC: translations

2017-12-07 Thread mpopov
mpopov claimed this task.
mpopov set the point value for this task to "8".


[Wikidata-bugs] [Maniphest] [Changed Project Column] T177357: Metrics for SDoC: future work of interest (templates and licensing)

2017-11-21 Thread mpopov
mpopov moved this task from Current work to Up Next on the Discovery-Analysis board.
mpopov edited projects, added Discovery-Analysis; removed Discovery-Analysis (Current work).


[Wikidata-bugs] [Maniphest] [Changed Project Column] T177357: Metrics for SDoC: future work of interest (templates and licensing)

2017-11-14 Thread mpopov
mpopov moved this task from Needs triage to Current work on the Discovery-Analysis board.
mpopov edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.


[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions

2017-10-13 Thread mpopov
mpopov added a comment.
@chelsyx do you wanna add your stuff to https://github.com/wikimedia-research/SDoC-Initial-Metrics ?


[Wikidata-bugs] [Maniphest] [Changed Project Column] T177356: Metrics for SDoC: look at querying databases

2017-10-13 Thread mpopov
mpopov moved this task from In progress to Done on the Discovery-Analysis (Current work) board.
mpopov added a comment.
Queries & data uploaded to https://github.com/wikimedia-research/SDoC-Initial-Metrics

Moving this into 'Done' as I don't think there's anything left to do on this one.


[Wikidata-bugs] [Maniphest] [Edited] T177356: Metrics for SDoC: look at querying databases

2017-10-13 Thread mpopov
mpopov updated the task description.
CHANGES TO TASK DESCRIPTION...** [x] How many people are involved in flagging for deletion/deleting files


[Wikidata-bugs] [Maniphest] [Commented On] T177356: Metrics for SDoC: look at querying databases

2017-10-13 Thread mpopov
mpopov added a comment.
Growth in the number of deleters over time:

F10188497: cumulative_deleters.png

How many users deleted N-many files:

F10188503: deleter_activity.png
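Both charts reduce to simple aggregations over a deletion log. A minimal sketch, assuming a made-up log in which each entry is one deleted file recorded as (deleting user, date); the actual queries behind the charts are not shown in this thread, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date

# Made-up deletion log: one entry per deleted file, as (deleting user, date).
LOG = [
    ("UserA", date(2017, 1, 5)),
    ("UserB", date(2017, 1, 9)),
    ("UserA", date(2017, 2, 1)),
    ("UserC", date(2017, 3, 3)),
    ("UserA", date(2017, 3, 4)),
]


def cumulative_deleters(log):
    """Running count of distinct deleters, dated by each user's first deletion."""
    first_seen = {}
    for user, day in sorted(log, key=lambda entry: entry[1]):
        first_seen.setdefault(user, day)  # keep only the earliest date per user
    running, series = 0, []
    for day, n_new in sorted(Counter(first_seen.values()).items()):
        running += n_new
        series.append((day, running))
    return series


def deleter_activity(log):
    """Map N to the number of users who deleted exactly N files."""
    per_user = Counter(user for user, _ in log)
    return dict(sorted(Counter(per_user.values()).items()))


print(cumulative_deleters(LOG))
print(deleter_activity(LOG))  # {1: 2, 3: 1}
```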


[Wikidata-bugs] [Maniphest] [Commented On] T177356: Metrics for SDoC: look at querying databases

2017-10-13 Thread mpopov
mpopov added a comment.
Total files uploaded to Commons (as of right now) by extension:


media | extension | uploads
audio | ogg | 773305
audio | oga | 6180
audio | flac | 6140
audio | mid | 4993
audio | wav | 3512
audio | opus | 410
docs | pdf | 354765
docs | djvu | 60524
image | jpg/jpeg | 36918799
image | png | 2268026
image | svg | 1176530
image | tif/tiff | 807921
image | gif | 153959
image | xcf | 1008
image | webp | 95
video | ogv | 66610
video | webm | 41161



Historical trends:

F10187336: monthly_uploads.png

F10187339: cumulative_uploads.png

Treemap (sans jpg/jpegs because holy moley there's 37M of those and that's more than all the others combined):

F10187334: treemap_uploads.png
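To put the "more than all the others combined" remark in numbers, a short sketch that totals the per-extension upload counts by media type (counts copied from the extension table above):

```python
from collections import defaultdict

# (media, extension, uploads), copied from the extension table above
UPLOADS = [
    ("audio", "ogg", 773_305), ("audio", "oga", 6_180), ("audio", "flac", 6_140),
    ("audio", "mid", 4_993), ("audio", "wav", 3_512), ("audio", "opus", 410),
    ("docs", "pdf", 354_765), ("docs", "djvu", 60_524),
    ("image", "jpg/jpeg", 36_918_799), ("image", "png", 2_268_026),
    ("image", "svg", 1_176_530), ("image", "tif/tiff", 807_921),
    ("image", "gif", 153_959), ("image", "xcf", 1_008), ("image", "webp", 95),
    ("video", "ogv", 66_610), ("video", "webm", 41_161),
]

by_media = defaultdict(int)
for media, _extension, n in UPLOADS:
    by_media[media] += n

jpeg = sum(n for _media, ext, n in UPLOADS if ext == "jpg/jpeg")
everything_else = sum(n for _media, ext, n in UPLOADS if ext != "jpg/jpeg")
total = jpeg + everything_else

print(dict(by_media))
print(f"jpg/jpeg: {jpeg:,} of {total:,} ({100 * jpeg / total:.1f}%)")
print(f"all other extensions combined: {everything_else:,}")
```

With these counts, jpg/jpeg accounts for 36,918,799 of 42,643,938 files (86.6%), against 5,725,139 for every other extension combined.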


[Wikidata-bugs] [Maniphest] [Edited] T177356: Metrics for SDoC: look at querying databases

2017-10-13 Thread mpopov
mpopov updated the task description.
CHANGES TO TASK DESCRIPTION...* [x] How many: mpegs, pngs, ogg, etc...** [x] Track organic growth rate of uploads (historical trends)...


[Wikidata-bugs] [Maniphest] [Edited] T177356: Metrics for SDoC: look at querying databases

2017-10-11 Thread mpopov
mpopov updated the task description.
CHANGES TO TASK DESCRIPTION...** [x] Average time to deletion?
* [] How many people are involved in flagging for deletion/deleting files


[Wikidata-bugs] [Maniphest] [Commented On] T177356: Metrics for SDoC: look at querying databases

2017-10-11 Thread mpopov
mpopov added a comment.
Time-to-deletion:

F10150716: time-to-deletion.png


- Most copyright-related deletions happen within 1 day of upload across almost all media types, with the exception of 'drawing' (SVGs)
- A lot of audio files are deleted within 1 minute or within 1 week of upload
- Among images and PDFs deleted for non-copyright reasons, half were deleted within 1 month of upload
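The binning in the chart and the "half within 1 month" reading can be expressed directly: saying half of the deletions happened within a month amounts to saying the median time-to-deletion is at most a month. A sketch with made-up delays; the bin edges (1 minute, 1 day, 1 week, and 30 days standing in for a month) are assumptions, not necessarily the chart's exact breaks.

```python
from collections import Counter
from datetime import timedelta
from statistics import median

# Upper edge of each bin; a "month" is approximated as 30 days (assumption).
BINS = [
    ("within 1 minute", timedelta(minutes=1)),
    ("within 1 day", timedelta(days=1)),
    ("within 1 week", timedelta(weeks=1)),
    ("within 1 month", timedelta(days=30)),
]


def bin_of(delay):
    """Label a time-to-deletion with the first bin whose upper edge it doesn't exceed."""
    for label, edge in BINS:
        if delay <= edge:
            return label
    return "after 1 month"


def summarize(delays):
    """Return (count per bin, median delay, whether half were deleted within a month)."""
    counts = Counter(bin_of(d) for d in delays)
    mid = median(delays)
    return counts, mid, mid <= timedelta(days=30)


# Made-up delays between upload and deletion, for illustration only.
delays = [
    timedelta(seconds=30),
    timedelta(hours=2),
    timedelta(days=3),
    timedelta(days=20),
    timedelta(days=90),
]
counts, mid, half_within_month = summarize(delays)
print(counts)
print(mid, half_within_month)  # 3 days, 0:00:00 True
```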


[Wikidata-bugs] [Maniphest] [Edited] T177356: Metrics for SDoC: look at querying databases

2017-10-11 Thread mpopov
mpopov updated the task description.
CHANGES TO TASK DESCRIPTION...*** copyright violations (Use case: creation of auto-copyright violation tools)
 Use case: creation of auto-copyright violation tools*** [[ https://commons.wikimedia.org/wiki/Commons:OTRS | OTRS ]]
** [] ores 
** [] Average time to deletion?...


[Wikidata-bugs] [Maniphest] [Commented On] T177356: Metrics for SDoC: look at querying databases

2017-10-11 Thread mpopov
mpopov added a comment.
Reasons for files deleted in 2017:

F10148687: deletion_reasons.png


[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions

2017-10-11 Thread mpopov
mpopov added a comment.

In T177354#3676545, @chelsyx wrote:
Unfortunately, the mediawiki snapshot doesn't have the image table, which describes images and other uploaded files.


Ah, yeah. I missed the reference to image in your query. But it looks like we can use img_timestamp, although those queries will take some time.

Also something to note is that img_major_mime shows up as "application" for .ogg files (which are audio files) and .pdf files:

SELECT DISTINCT img_major_mime, img_minor_mime
FROM commonswiki.image;




img_major_mime  img_minor_mime
image           gif
image           jpeg
image           png
image           tiff
image           vnd.djvu
image           webp
image           x-xcf
image           svg+xml
application     ogg
audio           midi
audio           wav
audio           webm
audio           x-flac
video           webm
application     pdf


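A minimal sketch of the kind of CASE being recommended for these rows. It assumes a stand-in image table with only the two MIME columns and runs the query against an in-memory SQLite database from Python's standard library, so the sample rows and the media_type column name are illustrative assumptions, not the real Commons setup. The ogg-to-audio and pdf-to-document mapping follows the recommendation in this comment; every other row keeps its major MIME type.

```python
import sqlite3

# Hypothetical sample of (img_major_mime, img_minor_mime) pairs taken from the
# DISTINCT listing above; these are not real rows or counts from Commons.
SAMPLE_ROWS = [
    ("image", "jpeg"),
    ("image", "svg+xml"),
    ("application", "ogg"),
    ("audio", "x-flac"),
    ("video", "webm"),
    ("application", "pdf"),
]

# Standard SQL CASE: relabel application/ogg as audio and application/pdf as
# document; all other rows fall through to their major MIME type.
MEDIA_TYPE_QUERY = """
SELECT img_major_mime,
       img_minor_mime,
       CASE
         WHEN img_major_mime = 'application' AND img_minor_mime = 'ogg' THEN 'audio'
         WHEN img_major_mime = 'application' AND img_minor_mime = 'pdf' THEN 'document'
         ELSE img_major_mime
       END AS media_type
FROM image
"""


def classify(rows):
    """Run the CASE query on an in-memory stand-in for the image table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE image (img_major_mime TEXT, img_minor_mime TEXT)")
        conn.executemany("INSERT INTO image VALUES (?, ?)", rows)
        return {(major, minor): media for major, minor, media in conn.execute(MEDIA_TYPE_QUERY)}
    finally:
        conn.close()


if __name__ == "__main__":
    for (major, minor), media in sorted(classify(SAMPLE_ROWS).items()):
        print(f"{major}/{minor} -> {media}")
```

The same CASE expression should carry over to the real query, since it uses only standard SQL.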

I recommend adding a CASE that returns "audio" for ogg files and "document" (for example) for PDFs.

TASK DETAIL
  https://phabricator.wikimedia.org/T177354

To: chelsyx, mpopov
Cc: Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, E1presidente, Jmmuguerza, GoranSMilovanovic, QZanden, EBjune, Acer, Avner, Gehel, FloNight, Susannaanas, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Fabrice_Florin, Raymond, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions

2017-10-11 Thread mpopov
mpopov added a comment.

In T177354#3675988, @debt wrote:
> Hey @chelsyx - what time frame does this cover?


Jumping in to say this looks like it covers the period from the launch of Commons to now.

Can we also get a count of how this has changed over the last week and compare that to the last 30 days? It'd be interesting to see if the numbers are fairly consistent (individual vs institution) or if they have changed quite a bit when extending the time scope.
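A rough sketch of the last-week vs last-30-days comparison, assuming each upload has already been reduced to a (date, group) pair. The "individual"/"institution" labels, the sample data and the window_shares helper are hypothetical; deciding which group an uploader belongs to is the hard part, and this sketch skips it.

```python
from collections import Counter
from datetime import date, timedelta


def window_shares(uploads, end, days):
    """Share of uploads per group over the `days` days ending on `end` (inclusive)."""
    start = end - timedelta(days=days - 1)
    counts = Counter(group for day, group in uploads if start <= day <= end)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


# Hypothetical (upload date, uploader group) pairs; the group labels are
# placeholders for whatever individual/institution classification is used.
UPLOADS = [
    (date(2017, 10, 10), "individual"),
    (date(2017, 10, 9), "institution"),
    (date(2017, 10, 5), "individual"),
    (date(2017, 9, 20), "institution"),
    (date(2017, 9, 15), "institution"),
    (date(2017, 8, 1), "individual"),  # outside both windows
]

if __name__ == "__main__":
    end = date(2017, 10, 10)
    last_week = window_shares(UPLOADS, end, 7)
    last_month = window_shares(UPLOADS, end, 30)
    for group in sorted(set(last_week) | set(last_month)):
        print(group, round(last_week.get(group, 0), 2), round(last_month.get(group, 0), 2))
```

Comparing shares rather than raw counts keeps the two windows on the same scale, since a 30-day window will always contain more uploads than a 7-day one.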

@chelsyx this may be useful: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits as it contains monthly snapshots of the page & user tables as of April 2017

TASK DETAIL
  https://phabricator.wikimedia.org/T177354

To: chelsyx, mpopov
Cc: Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, E1presidente, Jmmuguerza, GoranSMilovanovic, QZanden, EBjune, Acer, Avner, Gehel, FloNight, Susannaanas, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Fabrice_Florin, Raymond, Mbch331


[Wikidata-bugs] [Maniphest] [Claimed] T177356: Metrics for SDoC: look at querying databases

2017-10-11 Thread mpopov
mpopov moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.
mpopov set the point value for this task to "6".
mpopov claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T177356

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, E1presidente, Jmmuguerza, GoranSMilovanovic, QZanden, EBjune, Acer, Avner, Gehel, FloNight, Susannaanas, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Fabrice_Florin, Raymond, Mbch331


[Wikidata-bugs] [Maniphest] [Changed Project Column] T177356: Metrics for SDoC: look at querying databases

2017-10-11 Thread mpopov
mpopov moved this task from Needs triage to Current work on the Discovery-Analysis board.
mpopov edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T177356

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1850/

To: mpopov
Cc: Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, E1presidente, Jmmuguerza, GoranSMilovanovic, QZanden, EBjune, Acer, Avner, Gehel, FloNight, Susannaanas, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Fabrice_Florin, Raymond, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T149963: Analyze WDQS traffic data to find parallel connection patterns

2016-11-30 Thread mpopov
mpopov added a comment.

How many IPs use parallel connections to the WDQS servers? Out of the IPs that do the above, how many have the same/different user agents (hinting at one tool or proxy serving multiple clients)?
Of the 14K unique IPs observed between Nov 1st and 28th, 1.9K (13.6%) made more than 1 request to the SPARQL endpoint within the same second at least once.
Of those, 1,360 (71.1%) had only 1 UA and 553 (28.9%) had 2 or more UAs; 2 IP addresses were observed with 30-33 UAs.


How many parallel connections are typically used, how frequently are more than 3 used, what is the max, etc.?
726 IPs (5.17%) were seen making 3 or more requests per second.
Of those, 458 (63.1%) only had 1 UA; 268 (36.9%) had 2 or more UAs.

537 IPs (3.82%) were seen making more than 3 requests per second.
Of those, 331 (61.64%) only had 1 UA; the rest had 2 or more UAs.

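The per-second figures above can be reproduced in spirit with something like the following sketch. It assumes requests have already been extracted as (IP, timestamp, user agent) tuples; the sample rows use RFC 5737 documentation addresses, and per_ip_summary and ua_breakdown are hypothetical names, not the code behind these numbers. "More than 1 request" corresponds to min_peak=2, "3 or more" to min_peak=3 and "more than 3" to min_peak=4.

```python
from collections import Counter, defaultdict

# Hypothetical request log: (client_ip, timestamp, user_agent). The IPs are
# documentation addresses (RFC 5737), not real clients.
REQUESTS = [
    ("192.0.2.1", "2016-11-01T10:00:00.120", "bot/1.0"),
    ("192.0.2.1", "2016-11-01T10:00:00.870", "bot/1.0"),
    ("192.0.2.1", "2016-11-01T10:00:05.000", "bot/1.0"),
    ("198.51.100.7", "2016-11-01T10:00:01.000", "Mozilla/5.0"),
    ("198.51.100.7", "2016-11-01T10:00:01.500", "python-requests/2.11"),
    ("198.51.100.7", "2016-11-01T10:00:01.900", "Mozilla/5.0"),
    ("203.0.113.9", "2016-11-01T10:00:02.000", "curl/7.50"),
]


def per_ip_summary(requests):
    """Map each IP to (peak requests within one second, distinct user agents)."""
    per_second = Counter()
    user_agents = defaultdict(set)
    for ip, ts, ua in requests:
        # Truncating "YYYY-MM-DDTHH:MM:SS.fff" to 19 characters buckets by whole second.
        per_second[(ip, ts[:19])] += 1
        user_agents[ip].add(ua)
    peak = defaultdict(int)
    for (ip, _second), n in per_second.items():
        peak[ip] = max(peak[ip], n)
    return {ip: (peak[ip], len(uas)) for ip, uas in user_agents.items()}


def ua_breakdown(summary, min_peak):
    """Among IPs whose per-second peak is >= min_peak: (count, single-UA, multi-UA)."""
    selected = [n_uas for peak, n_uas in summary.values() if peak >= min_peak]
    single = sum(1 for n in selected if n == 1)
    return len(selected), single, len(selected) - single


if __name__ == "__main__":
    summary = per_ip_summary(REQUESTS)
    for min_peak in (2, 3, 4):
        print(min_peak, ua_breakdown(summary, min_peak))
```

Percentages like the ones above then follow by dividing these counts by the number of unique IPs (len(summary)) or by the size of the selected group.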

In general, how many user agents per IP do we have? Do we have some IPs that have a lot of different agents (indicating a proxy), and what does the traffic from those IPs look like (e.g. how many parallel requests, how often there's more than one, more than three)?
One particular Digital Ocean IP was especially active, using the axios promise-based HTTP client:
  300+ requests made per second 7 different times
  200-300 requests made per second 306 different times
  100-300 requests made per second 735 different times

2 Universidad Politecnica de Madrid IPs made 100-200 requests per second 2,200 different times:
  Some were made using a browser on a computer (according to the UA)
  Some were made using the Requests library for Python

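To tally how many different times an IP hit each request-rate band, one option is to bucket its per-second request counts. This sketch assumes half-open bins [100, 200), [200, 300) and [300, inf); the comment doesn't say how boundary values were assigned, so treat the bins and the burst_histogram name as assumptions.

```python
from bisect import bisect_right


def burst_histogram(per_second_counts, edges=(100, 200, 300)):
    """Count seconds whose request count falls in [100, 200), [200, 300) or [300, inf)."""
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])] + [f"{edges[-1]}+"]
    hist = {label: 0 for label in labels}
    for n in per_second_counts:
        idx = bisect_right(edges, n) - 1
        if idx >= 0:  # seconds below the lowest edge are ignored
            hist[labels[idx]] += 1
    return hist


if __name__ == "__main__":
    # Hypothetical per-second request counts for a single IP.
    print(burst_histogram([50, 100, 150, 199, 200, 250, 300, 450]))
```

The input would be the per-second counts for one IP, e.g. the values of a Counter keyed by (ip, second) as in a per-IP peak calculation.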



@Smalyshev: Let me know if you have any additional questions and/or if I missed anything. Hope this helps!

TASK DETAIL
  https://phabricator.wikimedia.org/T149963

To: mpopov
Cc: debt, Deskana, chelsyx, mpopov, Gehel, Aklapper, Smalyshev, EBjune, mschwarzer, Avner, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T149963: Analyze WDQS traffic data to find parallel connection patterns

2016-11-30 Thread mpopov
mpopov added a comment.
@Smalyshev: I'm still figuring out the parallel connection aspect, but here are some minute-by-minute graphs/stats (over 24 hours) that I made while playing with the data, in case you're interested:

F4911654: sparql_median_2.png

F4911656: sparql_median.png

F4911660: sparql_users.png

F4911663: cumulative_sparql.png

TASK DETAIL
  https://phabricator.wikimedia.org/T149963

To: mpopov
Cc: debt, Deskana, chelsyx, mpopov, Gehel, Aklapper, Smalyshev, EBjune, mschwarzer, Avner, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Changed Project Column] T149963: Analyze WDQS traffic data to find parallel connection patterns

2016-11-28 Thread mpopov
mpopov moved this task from Up Next to Current work on the Discovery-Analysis board.
mpopov edited projects, added Discovery-Analysis (Current work); removed Discovery-Analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T149963

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1850/

To: mpopov
Cc: debt, Deskana, chelsyx, mpopov, Gehel, Aklapper, Smalyshev, EBjune, mschwarzer, Avner, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Claimed] T149963: Analyze WDQS traffic data to find parallel connection patterns

2016-11-28 Thread mpopov
mpopov claimed this task.
TASK DETAIL
  https://phabricator.wikimedia.org/T149963

To: mpopov
Cc: debt, Deskana, chelsyx, mpopov, Gehel, Aklapper, Smalyshev, EBjune, mschwarzer, Avner, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T143762: WDQS: Geographic breakdown of SPARQL queries

2016-09-27 Thread mpopov
mpopov added a comment.
Great job! Let's put it up on Commons! :)

Use the following licensing & categorization:

=={{int:license-header}}==
{{WMF-staff-upload|license=cc-by-sa-4.0}}
{{Wikimedia trademark}}

[[Category:Wikimedia Discovery]]
[[Category:Wiki Research]]

TASK DETAIL
  https://phabricator.wikimedia.org/T143762

To: chelsyx, mpopov
Cc: Addshore, Aklapper, mpopov, Smalyshev, debt, mschwarzer, Avner, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T143762: WDQS: Geographic breakdown of SPARQL queries

2016-09-13 Thread mpopov
mpopov added a comment.
Reviewed copy with minor corrections & suggestions sent back to Chelsy.

TASK DETAIL
  https://phabricator.wikimedia.org/T143762

To: chelsyx, mpopov
Cc: Addshore, Aklapper, mpopov, Smalyshev, debt, mschwarzer, Avner, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T143762: WDQS: Geographic breakdown of SPARQL queries

2016-09-01 Thread mpopov
mpopov added a comment.
Reviewed; marked-up copy of the 1st draft sent back to Chelsy. Looking forward to the 2nd draft :P

TASK DETAIL
  https://phabricator.wikimedia.org/T143762

To: chelsyx, mpopov
Cc: Aklapper, mpopov, Smalyshev, debt, mschwarzer, MelodyKramer, Avner, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T143762: WDQS: Geographic breakdown of SPARQL queries

2016-08-31 Thread mpopov
mpopov added a comment.
First draft looks good! I will try to review this as soon as I can :)

TASK DETAIL
  https://phabricator.wikimedia.org/T143762

To: chelsyx, mpopov
Cc: Aklapper, mpopov, Smalyshev, debt, mschwarzer, MelodyKramer, Avner, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Edited] T143762: WDQS: Geographic breakdown of SPARQL queries

2016-08-23 Thread mpopov
mpopov edited the task description. (Show Details)
EDIT DETAILS
...
* These articles on [[ https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive | Hive ]] and [[ https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries | Hive queries ]] are good resources. That second one uses [[ https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline | Beeline ]] interface which we've tried to migrate to once but it didn't work out, so [[ https://github.com/wikimedia/wikimedia-discovery-wmf/blob/master/R/hive.R | wmf::query_hive() ]] still uses Hive. And [[ https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF | here's a good reference ]] of functions and operations built into HiveQL

TASK DETAIL
  https://phabricator.wikimedia.org/T143762

To: chelsyx, mpopov
Cc: Aklapper, mpopov, Smalyshev, debt, mschwarzer, Avner, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Created] T143762: WDQS: Geographic breakdown of SPARQL queries

2016-08-23 Thread mpopov
mpopov created this task.
mpopov added projects: Discovery-Analysis (Current work), Epic, Wikidata-Query-Service.
Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION

Background

In T112605, we performed a broad analysis of Wikidata Query Service users and queries. This was almost a year ago, and we're coming up on the first anniversary of WDQS' public launch (announced on Monday, 7 September 2015). The WDQS dashboard only tracks basic metrics like SPARQL usage, so we don't currently have an up-to-date picture of who WDQS users are and where they're from. But it would be nice to know how that picture looks these days! :)

Objective

In this task, you will perform an original analysis of web requests, focusing specifically on successful (HTTP status codes 200 & 304) web requests to the SPARQL endpoint (see golden/wdqs/basic_usage.R and lines 45-52 from that old report's analysis codebase for references). Your analysis should focus on the geographic and agent type breakdown of those queries. Which countries have users who use WDQS? What are the top countries by SPARQL queries? How does that breakdown look when you compare known automata vs not known automata? Are the patterns consistent day-to-day over the course of a week?
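A minimal sketch of the filtering and breakdown described above, on hypothetical, already-aggregated rows that contain no IPs. The SPARQL endpoint paths, the field names and the boolean automata flag are assumptions for illustration; check them against the refined webrequest schema and golden/wdqs/basic_usage.R before reusing anything like this.

```python
from collections import Counter

# Assumptions for illustration: the endpoint paths, field names and the boolean
# "automata" flag are placeholders, not the actual webrequest schema.
SPARQL_PATHS = {"/sparql", "/bigdata/namespace/wdq/sparql"}
SUCCESS_CODES = {"200", "304"}

# Hypothetical sample rows (no IP addresses or other PII).
SAMPLE = [
    {"date": "2016-08-15", "uri_path": "/sparql", "http_status": "200", "country": "DE", "automata": False},
    {"date": "2016-08-15", "uri_path": "/bigdata/namespace/wdq/sparql", "http_status": "304", "country": "DE", "automata": True},
    {"date": "2016-08-15", "uri_path": "/sparql", "http_status": "500", "country": "US", "automata": False},
    {"date": "2016-08-15", "uri_path": "/", "http_status": "200", "country": "US", "automata": False},
    {"date": "2016-08-16", "uri_path": "/sparql", "http_status": "200", "country": "US", "automata": False},
    {"date": "2016-08-16", "uri_path": "/sparql", "http_status": "200", "country": "FR", "automata": True},
    {"date": "2016-08-16", "uri_path": "/sparql", "http_status": "200", "country": "DE", "automata": False},
]


def is_successful_sparql(row):
    """Keep only 200/304 responses from the (assumed) SPARQL endpoint paths."""
    return row["uri_path"] in SPARQL_PATHS and row["http_status"] in SUCCESS_CODES


def country_counts(rows, automata=None):
    """Successful SPARQL requests per country, optionally split by agent type."""
    return Counter(
        row["country"]
        for row in rows
        if is_successful_sparql(row) and (automata is None or row["automata"] == automata)
    )


def daily_country_counts(rows):
    """Successful SPARQL requests per (date, country), for day-to-day comparisons."""
    return Counter((row["date"], row["country"]) for row in rows if is_successful_sparql(row))


if __name__ == "__main__":
    print("all:", country_counts(SAMPLE).most_common(10))
    print("known automata:", country_counts(SAMPLE, automata=True).most_common(10))
    print("not known automata:", country_counts(SAMPLE, automata=False).most_common(10))
```

In practice this filtering and grouping would more likely happen in the Hive query itself, with only the aggregated counts pulled into R or Python.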

Produce a 1-2 page report of your findings. Once the report has been reviewed & OK'd by me, @debt, and @Smalyshev, please upload the PDF to Commons.

Tips & Links


You shouldn't need to import/use any refinery UDFs for this analysis; you'll do this in the next task :P
Study the refined webrequest schema
These articles on Hive and Hive queries are good resources. That second one uses the Beeline interface, which we tried to migrate to once, but it didn't work out, so wmf::query_hive() still uses Hive.
Remember not to include any PII, like IP addresses, in your report, and do not upload the data if you end up making a GitHub repo like this one
After uploading the report to Commons, you'll need to copy over some of the licensing info from this report to yours
As always, don't hesitate to ask questions or to ask for help/clarification :D
TASK DETAIL
  https://phabricator.wikimedia.org/T143762

To: chelsyx, mpopov
Cc: Aklapper, mpopov, Smalyshev, debt, mschwarzer, Avner, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Claimed] T141135: "median" not working on WDQS dashboards

2016-08-08 Thread mpopov
mpopov claimed this task.
TASK DETAIL
  https://phabricator.wikimedia.org/T141135

To: mpopov
Cc: mpopov, Aklapper, Smalyshev, Avner, debt, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T141135: "median" not working on WDQS dashboards

2016-08-08 Thread mpopov
mpopov edited projects, added Discovery-Analysis-Sprint; removed Discovery-Analysis-Backlog.
TASK DETAIL
  https://phabricator.wikimedia.org/T141135

To: mpopov
Cc: mpopov, Aklapper, Smalyshev, Avner, debt, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T141135: "median" not working on WDQS dashboards

2016-08-08 Thread mpopov
mpopov added a comment.
Done: http://discovery.wmflabs.org/wdqs/

TASK DETAIL
  https://phabricator.wikimedia.org/T141135

To: mpopov
Cc: mpopov, Aklapper, Smalyshev, Avner, debt, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T141135: "median" not working on WDQS dashboards

2016-08-08 Thread mpopov
mpopov added a comment.
Forgot to tag this in https://gerrit.wikimedia.org/r/#/c/303582/

TASK DETAIL
  https://phabricator.wikimedia.org/T141135

To: mpopov
Cc: mpopov, Aklapper, Smalyshev, Avner, debt, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T111790: Improve Phabricator link on Wikidata Query Service dashboard

2015-09-08 Thread mpopov
mpopov added a comment.

They do not. Wikimedia repos on GitHub are simple mirrors of Gerrit, to the
point where the version on GitHub says that OliverKeyes committed to it even
though there's no such user. The patch needs to be submitted to Gerrit.


TASK DETAIL
  https://phabricator.wikimedia.org/T111790

To: He7d3r, mpopov
Cc: Deskana, Ironholds, mpopov, Smalyshev, He7d3r, Aklapper, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, 01tonythomas



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard

2015-09-02 Thread mpopov
mpopov moved this task to Stalled/Waiting on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109361

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: EBernhardson, mpopov, Ironholds, Aklapper, Smalyshev, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard

2015-09-02 Thread mpopov
mpopov added a comment.

First version is live at http://searchdata.wmflabs.org/wdqs/

Will chat with Stas soon to clarify/fix any issues with the data/queries.

P.S. Also gave the Discovery Dashboards page a bit of a facelift 
http://searchdata.wmflabs.org/ :D


TASK DETAIL
  https://phabricator.wikimedia.org/T109361

To: mpopov
Cc: EBernhardson, mpopov, Ironholds, Aklapper, Smalyshev, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard

2015-09-02 Thread mpopov
mpopov moved this task to Done on the Discovery-Analysis-Sprint workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109361

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: EBernhardson, mpopov, Ironholds, Aklapper, Smalyshev, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Commented On] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-09-01 Thread mpopov
mpopov added a comment.

Script: https://gerrit.wikimedia.org/r/#/c/235137/1/data_retrieval/wdqs.R

Oliver added it to the scheduler, and I ran it over the past 40 days to
backfill the aggregate dataset, which will be kept up-to-date going forward.
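Backfilling a daily aggregate usually amounts to running the daily job once per missing day. A sketch under that assumption, where run_day stands in for the real per-day script (wdqs.R here) and done is whatever record of already-aggregated days exists; both names, and the idea of skipping finished days, are hypothetical rather than how the scheduled script actually works.

```python
from datetime import date, timedelta


def backfill_dates(today, n_days):
    """The n_days calendar days before `today`, oldest first (`today` excluded)."""
    return [today - timedelta(days=k) for k in range(n_days, 0, -1)]


def backfill(today, n_days, done, run_day):
    """Call run_day(day) for each missing day; `done` is the set of finished days."""
    ran = []
    for day in backfill_dates(today, n_days):
        if day in done:  # skip days that already have aggregates
            continue
        run_day(day)
        done.add(day)
        ran.append(day)
    return ran


if __name__ == "__main__":
    # Hypothetical job: just print which day would be aggregated.
    backfill(date(2015, 9, 1), 40, set(), lambda day: print("aggregating", day))
```

Processing oldest first and skipping finished days makes the backfill safe to re-run if it is interrupted partway through.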


TASK DETAIL
  https://phabricator.wikimedia.org/T109360

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard

2015-09-01 Thread mpopov
mpopov added a comment.

Waiting for code review: https://gerrit.wikimedia.org/r/#/c/235365/


TASK DETAIL
  https://phabricator.wikimedia.org/T109361

To: mpopov
Cc: EBernhardson, mpopov, Ironholds, Aklapper, Smalyshev, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard

2015-09-01 Thread mpopov
mpopov moved this task to Stalled/Waiting on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109361

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: EBernhardson, mpopov, Ironholds, Aklapper, Smalyshev, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard

2015-09-01 Thread mpopov
mpopov added a comment.

Awesome, thank you @EBernhardson :D


TASK DETAIL
  https://phabricator.wikimedia.org/T109361

To: mpopov
Cc: EBernhardson, mpopov, Ironholds, Aklapper, Smalyshev, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T108732: [Task] Train Wikidata people on how to add data/metrics to a Shiny dashboard for Wikidata

2015-08-31 Thread mpopov
mpopov moved this task to Done on the Discovery-Analysis-Sprint workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T108732

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Ironholds_backup, Abraham, Christopher, Lydia_Pintscher, Ironholds, 
JanZerebecki, Deskana, Aklapper, Wikidata-bugs, aude, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-31 Thread mpopov
mpopov moved this task to Done on the Discovery-Analysis-Sprint workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109360

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard

2015-08-31 Thread mpopov
mpopov added a comment.

Erik is working on an issue with installing new R packages, especially ones
that require a newer version of R (e.g. 3.1.2) than the one currently
installed (3.0.2).

The dashboard is live at http://searchdata.wmflabs.org/wdqs/ but is currently
broken because dplyr is not installed.

Furthermore, the wdqs.R script for acquiring aggregated data needs to be
scheduled for daily execution. @Ironholds, talk to me when you're ready to
schedule it and I'll run the script on the backlog to get the most up-to-date
dataset.


TASK DETAIL
  https://phabricator.wikimedia.org/T109361

To: mpopov
Cc: mpopov, Ironholds, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, 
aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Commented On] T109361: Create a Wikidata query service usage dashboard

2015-08-28 Thread mpopov
mpopov added a comment.

The dedicated WDQS dashboard is sitting locally on my computer. I'm waiting for
my project request to be completed so I can push the code out to Gerrit.


TASK DETAIL
  https://phabricator.wikimedia.org/T109361

To: mpopov
Cc: mpopov, Ironholds, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, 
aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard

2015-08-28 Thread mpopov
mpopov moved this task to Stalled/Waiting on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109361

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: mpopov, Ironholds, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, 
aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard

2015-08-28 Thread mpopov
mpopov moved this task to In progress on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109361

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: mpopov, Ironholds, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, 
aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Claimed] T108732: [Task] Train Wikidata people on how to add data/metrics to a Shiny dashboard for Wikidata

2015-08-28 Thread mpopov
mpopov claimed this task.
mpopov set Story Points to 2.

TASK DETAIL
  https://phabricator.wikimedia.org/T108732

To: mpopov
Cc: Ironholds_backup, Abraham, Christopher, Lydia_Pintscher, Ironholds, 
JanZerebecki, Deskana, Aklapper, Wikidata-bugs, aude, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T108732: [Task] Train Wikidata people on how to add data/metrics to a Shiny dashboard for Wikidata

2015-08-28 Thread mpopov
mpopov moved this task to In progress on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T108732

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Ironholds_backup, Abraham, Christopher, Lydia_Pintscher, Ironholds, 
JanZerebecki, Deskana, Aklapper, Wikidata-bugs, aude, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-27 Thread mpopov
mpopov moved this task to In progress on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109360

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard

2015-08-27 Thread mpopov
mpopov moved this task to In progress on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109361

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: mpopov, Ironholds, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, 
aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-27 Thread mpopov
mpopov moved this task to Needs review on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109360

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109361: Create a Wikidata query service usage dashboard

2015-08-27 Thread mpopov
mpopov moved this task to Needs review on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109361

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: mpopov, Ironholds, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, 
aude, Manybubbles, JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-26 Thread mpopov
mpopov moved this task to Backlog on the Discovery-Analysis-Sprint workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109360

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-25 Thread mpopov
mpopov moved this task to In progress on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109360

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-25 Thread mpopov
mpopov moved this task to Backlog on the Discovery-Analysis-Sprint workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109360

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-21 Thread mpopov
mpopov moved this task to Stalled/Waiting on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109360

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Commented On] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-21 Thread mpopov
mpopov added a comment.

Need to transfer the logic from the HiveQL query to a UDF, then run the script 
on previous days to fill in the backlog.


TASK DETAIL
  https://phabricator.wikimedia.org/T109360

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Changed Project Column] T109360: Create a script to extract request logs for query.wikidata.org for dashboards

2015-08-19 Thread mpopov
mpopov moved this task to In progress on the Discovery-Analysis-Sprint 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T109360

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1241/

To: mpopov
Cc: Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, 
JanZerebecki, Malyacko



[Wikidata-bugs] [Maniphest] [Updated] T108732: Train Jan Zerebecki of Wikimedia Germany on how to set up a Shiny dashboard for Wikidata

2015-08-12 Thread mpopov
mpopov added a blocking task: T108094: As a project lead, I'd like 
documentation on how to set up a Shiny dashboard so that I can visualise the 
project's key performance indicators.

TASK DETAIL
  https://phabricator.wikimedia.org/T108732

To: mpopov
Cc: Lydia_Pintscher, Ironholds, JanZerebecki, Deskana, Aklapper, Wikidata-bugs, 
aude, Malyacko


