[Wikidata-bugs] [Maniphest] T334558: [Analytics] Unique user-agents accessing Wikidata's REST API

2023-06-28 Thread mforns
mforns added a comment.


  @AndrewTavis_WMDE Hi! I think you could simply go with `wmde`. The `analytics` 
prefix in `product_analytics` exists because that is the team's name. In your 
case, plain `wmde` should be fine.
  BTW this is the task to create the WMDE Airflow instance: T340648 
<https://phabricator.wikimedia.org/T340648>

TASK DETAIL
  https://phabricator.wikimedia.org/T334558

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE, mforns
Cc: mforns, xcollazo, Ottomata, lbowmaker, WMDE-leszek, AndrewTavis_WMDE, 
Michael, Manuel, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T314131: Some reliability metrics missing since June 20th '22

2022-10-28 Thread mforns
mforns added a comment.


  Yes, if we had implemented the DAG differently, re-running would be a task 
that Airflow users could easily do!
  However, this particular DAG (and a couple others) follow a pattern that 
makes it difficult to re-run partially.
  We plan to change those DAGs to a better structure and add the documentation 
to our Airflow developer guide 
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow/Developer_guide>.

TASK DETAIL
  https://phabricator.wikimedia.org/T314131

To: mforns
Cc: Aklapper, Michael, Astuthiodit_1, EChetty, BTullis, karapayneWMDE, 
Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T314131: Some reliability metrics missing since June 20th '22

2022-10-27 Thread mforns
mforns added a comment.


  I've created a task to specifically tackle the back-filling: 
https://phabricator.wikimedia.org/T321838

TASK DETAIL
  https://phabricator.wikimedia.org/T314131

To: mforns
Cc: Aklapper, Michael, Astuthiodit_1, EChetty, BTullis, karapayneWMDE, 
Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T314131: Some reliability metrics missing since June 20th '22

2022-10-26 Thread mforns
mforns added a comment.


  Hi @Michael! Yes, we will back-fill as much as we can.
  I have to talk to the team tomorrow to see how we want to approach that, 
since that particular Airflow DAG is not easy to re-run partially...
  I'll keep you posted!

TASK DETAIL
  https://phabricator.wikimedia.org/T314131

To: mforns
Cc: Aklapper, Michael, Astuthiodit_1, EChetty, BTullis, karapayneWMDE, 
Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T314131: Some reliability metrics missing since June 20th '22

2022-09-07 Thread mforns
mforns added a comment.


  I've looked into this a bit and I think I found what's happening.
  The metrics query does gather the data correctly, but the metrics never 
reach Graphite.
  The reason is that the HiveToGraphite Spark job fails when sending the 
metrics to Graphite, because the metric values are doubles.
  
22/09/07 00:33:14 ERROR HiveToGraphite: java.lang.Double cannot be cast to java.lang.Long. Failed to send message to Graphite.
  
  HiveToGraphite expects the metric values to be longs, not doubles.
  Although the queries do not explicitly specify the double type, I suspect 
that the `percentile_approx` calculation in some metrics outputs a double,
  
percentile_approx(time_firstbyte, 0.5) as metric_count,
  
  which after the `UNION` statement affects all the results (all the metric 
values become doubles since they share the same column). But maybe I'm wrong!
  In any case, we have to modify the query file to make sure the output values 
are compatible with the type `long`.
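  A minimal Python sketch of the type guard that this expectation implies: 
before a metric is written in Graphite's plaintext format 
(`<path> <value> <timestamp>`), a double is checked and converted to an integer 
instead of failing a cast. The helper name `to_graphite_line` and the rounding 
policy are assumptions for illustration, not HiveToGraphite's actual code. The 
fix proposed here is in the query itself; casting the `percentile_approx` 
result to `BIGINT` in Spark SQL would be one way to do that.

```python
import math


def to_graphite_line(path: str, value, timestamp: int) -> str:
    """Format one metric as a Graphite plaintext line with an integer value.

    Hypothetical helper: it mirrors the "values must be longs" expectation.
    Doubles (such as 42.0, which a UNION with a double column can produce)
    are rounded to the nearest integer (ties to even, as Python's round()
    does) instead of failing with a cast error.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"metric {path!r} has non-numeric value {value!r}")
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError(f"metric {path!r} has non-finite value {value!r}")
        value = round(value)  # round() on a float with no ndigits returns an int
    return f"{path} {value} {timestamp}\n"
```

  For example, `to_graphite_line("test.metric", 137.0, 1662510794)` returns 
`"test.metric 137 1662510794\n"`.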
  
  ---
  
  PS: Why did Spark finish with final app status `SUCCEEDED` when it recorded 
a `HiveToGraphite ERROR`? We in Data Engineering should look into that as 
well.
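  One possible explanation, which is only an assumption and has not been 
checked against the HiveToGraphite source, is that the send error is caught 
and logged rather than propagated, so the driver finishes normally. A minimal 
Python sketch of the usual remedy: keep sending so that every failure gets 
logged, then raise at the end if anything failed, so the process exits with a 
non-zero status. `send_all` and the `send` callback are hypothetical names.

```python
import logging
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("metrics_sender")


def send_all(metrics: Iterable[Tuple[str, object]],
             send: Callable[[str, object], None]) -> int:
    """Send every metric, log each failure, then fail loudly if any occurred.

    Returns the number of metrics sent. Raises RuntimeError if at least one
    send failed; left uncaught, that makes the process exit non-zero
    instead of reporting success.
    """
    failures: List[str] = []
    sent = 0
    for name, value in metrics:
        try:
            send(name, value)
            sent += 1
        except Exception as exc:  # keep going so all failures are logged
            log.error("Failed to send %s=%r: %s", name, value, exc)
            failures.append(name)
    if failures:
        raise RuntimeError(f"{len(failures)} metric(s) failed: {failures}")
    return sent
```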

TASK DETAIL
  https://phabricator.wikimedia.org/T314131

To: mforns
Cc: Aklapper, Michael, Astuthiodit_1, EChetty, BTullis, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T314131: Some reliability metrics missing since June 20th '22

2022-09-07 Thread mforns
mforns claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T314131

To: mforns
Cc: Aklapper, Michael, Astuthiodit_1, EChetty, BTullis, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2022-08-17 Thread mforns
mforns added a comment.


    

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: phuedx, mforns
Cc: mforns, phuedx, Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, 
Aklapper, Michael, Hellket777, LisafBia6531, Astuthiodit_1, ntsako, 786, 
Biggs657, karapayneWMDE, Invadibot, Universal_Omega, maantietaja, Juan90264, 
Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, darthmon_wmde, 
Kent7301, holger.knust, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, 
Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Wikidata-bugs, aude, GWicke, Mbch331


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2022-08-11 Thread mforns
mforns added a comment.


  Thanks a lot @phuedx!

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: mforns
Cc: phuedx, Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, Aklapper, 
Michael, Hellket777, LisafBia6531, Astuthiodit_1, ntsako, 786, BTullis, 
Biggs657, karapayneWMDE, Invadibot, Universal_Omega, maantietaja, Juan90264, 
Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, darthmon_wmde, 
Kent7301, holger.knust, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, 
Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Wikidata-bugs, aude, GWicke, Mbch331


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2022-07-15 Thread mforns
mforns updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: mforns
Cc: phuedx, Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, Aklapper, 
Michael, Hellket777, Astuthiodit_1, ntsako, 786, BTullis, Biggs657, 
karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, holger.knust, 
joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, 
GWicke, Mbch331


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2022-07-15 Thread mforns
mforns updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: mforns
Cc: phuedx, Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, Aklapper, 
Michael, Hellket777, Astuthiodit_1, ntsako, 786, BTullis, Biggs657, 
karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, holger.knust, 
joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, 
GWicke, Mbch331


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2022-07-11 Thread mforns
mforns updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: mforns
Cc: phuedx, Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, Aklapper, 
Michael, Hellket777, Astuthiodit_1, ntsako, 786, BTullis, Biggs657, 
karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, holger.knust, 
joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, 
GWicke, Mbch331


[Wikidata-bugs] [Maniphest] T299059: Write an Airflow job converting commons structured data dump to Hive

2022-02-11 Thread mforns
mforns added a project: Airflow.

TASK DETAIL
  https://phabricator.wikimedia.org/T299059

To: Snwachukwu, mforns
Cc: Cparle, nettrom_WMF, Miriam, Nuria, cchen, AKhatun_WMF, JAllemandou, 
ntsako, EChetty, toberto, ldelench_wmf, Invadibot, MPhamWMF, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, Base, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2021-10-22 Thread mforns
mforns added projects: Data-Engineering, Data-Engineering-Kanban.

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: mforns
Cc: Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, Aklapper, Michael, 
EChetty, Invadibot, maantietaja, Akuckartz, 4748kitoko, darthmon_wmde, 
holger.knust, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, JAllemandou, terrrydactyl, 
Wikidata-bugs, aude, GWicke, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2021-09-14 Thread mforns
mforns updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: mforns
Cc: Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, Aklapper, Michael, 
Suran38, Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, 
Beast1978, Un1tY, Akuckartz, 4748kitoko, Hook696, darthmon_wmde, Kent7301, 
holger.knust, joker88john, CucyNoiD, Nandana, Akovalyov, Gaboe420, Giuliamocci, 
Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
JAllemandou, terrrydactyl, Wikidata-bugs, aude, GWicke, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2021-09-14 Thread mforns
mforns updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: mforns
Cc: Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, Aklapper, Michael, 
Suran38, Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, 
Beast1978, Un1tY, Akuckartz, 4748kitoko, Hook696, darthmon_wmde, Kent7301, 
holger.knust, joker88john, CucyNoiD, Nandana, Akovalyov, Gaboe420, Giuliamocci, 
Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
JAllemandou, terrrydactyl, Wikidata-bugs, aude, GWicke, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy

2021-09-13 Thread mforns
mforns added a comment.


  @Michael Hi!
  
  I'm going to migrate this schema over the next couple of weeks.
  I need to ask you a couple of questions about it.
  
  1. Do you need to collect IP or geocode information together with this 
schema? The legacy EventLogging system collects them by default, but in the new 
system we only collect them if necessary. Please let me know!
  2. Is the instrumentation that generates this data in the front end (JS), or 
is it in the back end (PHP)?
  
  Cheers!

TASK DETAIL
  https://phabricator.wikimedia.org/T290303

To: mforns
Cc: Manuel, Addshore, Ottomata, awight, Lydia_Pintscher, Aklapper, Michael, 
Invadibot, maantietaja, Akuckartz, 4748kitoko, darthmon_wmde, holger.knust, 
Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, JAllemandou, terrrydactyl, Wikidata-bugs, 
aude, GWicke, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T262942: PoC on anomaly detection with Flink

2020-09-17 Thread mforns
mforns edited projects, added Analytics-Radar; removed Analytics.

TASK DETAIL
  https://phabricator.wikimedia.org/T262942

To: mforns
Cc: Aklapper, Ottomata, Gehel, dcausse, CDanis, Zbyszko, CBogen, Akuckartz, 
4748kitoko, darthmon_wmde, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, JAllemandou, 
terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T236895: ArticlePlaceholder dashboard stopped tracking page views

2020-03-16 Thread mforns
mforns added a comment.


  We've now deployed the patch.
  It has already started crunching data from 2020-01-01 onward.
  It will take a couple of hours to backfill up to today.

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

To: Ladsgroup, mforns
Cc: mforns, Milimetric, Ladsgroup, Nuria, JAllemandou, elukey, Addshore, 
Aklapper, Lydia_Pintscher, Alter-paule, Hazizibinmahdi, Beast1978, Un1tY, 
4748kitoko, Hook696, Daryl-TTMG, RomaAmorRoma, E.S.A-Sheild, Iflorez, 
darthmon_wmde, alaa_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, cmadeo, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Unblock] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-12-05 Thread mforns
mforns closed subtask T239127: Import slots/slots_roles  and  
wikibase.wbc_entity_usage through scoop  as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T238878

To: mforns
Cc: Milimetric, Cparle, nettrom_WMF, Ladsgroup, daniel, Mayakp.wiki, gsingers, 
matthiasmullie, Addshore, kzimmerman, mpopov, Ramsey-WMF, Abit, Nuria, 
4748kitoko, darthmon_wmde, DannyS712, Nandana, JKSTNK, Akovalyov, Lahi, 
PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, 
QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, _jensen, 
rosalieper, Scott_WUaS, Susannaanas, JAllemandou, Jane023, terrrydactyl, 
Wikidata-bugs, Base, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, 
Fabrice_Florin, Raymond, Steinsplitter, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Triaged] T239565: Create reportupdater reports that execute SDC requests

2019-12-05 Thread mforns
mforns triaged this task as "High" priority.

TASK DETAIL
  https://phabricator.wikimedia.org/T239565

WORKBOARD
  https://phabricator.wikimedia.org/project/board/11/

To: Milimetric, mforns
Cc: Abit, Ramsey-WMF, kzimmerman, Addshore, matthiasmullie, gsingers, 
Mayakp.wiki, Ladsgroup, nettrom_WMF, Cparle, Nuria, Milimetric, mpopov, 
4748kitoko, darthmon_wmde, DannyS712, Nandana, JKSTNK, Akovalyov, Lahi, 
PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, 
QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, _jensen, 
rosalieper, Scott_WUaS, Susannaanas, JAllemandou, Jane023, terrrydactyl, 
Wikidata-bugs, Base, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, 
Fabrice_Florin, Raymond, Steinsplitter, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-04-22 Thread mforns
mforns added a comment.


  @diego Hi! Is there anything else for us in Analytics here? Thanks

TASK DETAIL
  https://phabricator.wikimedia.org/T215616

To: mforns
Cc: mforns, Marostegui, Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, 
JAllemandou, diego, alaa_wmde, Nandana, Akovalyov, Banyek, Rayssa-, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, Avner, _jensen, rosalieper, 
Wikidata-bugs, aude, Capt_Swing, Dinoguy1000, Mbch331, Jay8g, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T127467: Finding items on Wikidata that should be merged

2018-05-23 Thread mforns
mforns added a comment.
@MichaelSchoenitzer_WMDE
Oh, cool. Yea, definitely useful. Thanks!

TASK DETAIL
  https://phabricator.wikimedia.org/T127467

To: mforns
Cc: mforns, Lahi, MichaelSchoenitzer_WMDE, Ladsgroup, Esc3300, Liuxinyu970226, 
matej_suchanek, Bugreporter, Ricordisamoa, Aklapper, StudiesWorld, 
Lydia_Pintscher, samuwmde, Gq86, Vacio, GoranSMilovanovic, QZanden, 
LawExplorer, Culex, Wikidata-bugs, aude, Alchimista, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T127467: Finding items on Wikidata that should be merged

2018-05-23 Thread mforns
mforns added a comment.
Hey!

As @Ladsgroup knows, I worked on this task during the BCN Hackathon.
It was super-interesting and I learned a lot about Wikidata :]
Thanks for the opportunity!
Here's a summary of what I did, the issues I had, and next steps:


After spending a while reading docs and understanding the basics, I wrote a small bash script to extract Wikidata items from the dump at /mnt/data/xmldatadumps/public/wikidatawiki/entities/20180514/wikidata-20180514-all.json.gz, abridge their contents to id, type, labels and sitelinks, and finally split them into files of 1M lines each, to be processed in hdfs/hadoop in a distributed way. The script is:


nice -n19 ionice -c2 -n7 sh -c "zcat /mnt/data/xmldatadumps/public/wikidatawiki/entities/20180514/wikidata-20180514-all.json.gz | head -n -1 | tail -n +2 | sed 's/,$//' | jq -c 'select(.type == \"item\") | {id, type, labels: .labels | [keys[] as \$k | [\$k, .[\$k].value]], sitelinks: .sitelinks | [keys[] as \$k | [\$k, .[\$k].title]]}' | split -l 1000000 - ~/wikidata_items_abridged_20180514/part_"
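For readers less familiar with jq, the per-line transformation in the pipeline above can be sketched in Python. This is an illustrative equivalent, not the script that was actually run: it drops the trailing comma of a dump line, skips non-item entities, keeps id and type (the field list described above), and turns labels and sitelinks into [key, value] pairs sorted by key, as jq's `keys` does. The sample entity in the test is trimmed and made up for illustration.

```python
import json
from typing import Optional


def abridge_line(line: str) -> Optional[str]:
    """Abridge one line of the Wikidata JSON dump (one entity per line).

    Returns compact JSON with id, type, labels and sitelinks, where labels
    and sitelinks are [key, value] pairs sorted by key. Returns None for the
    array brackets and for entities that are not items (e.g. properties).
    """
    line = line.strip()
    if line.endswith(","):
        line = line[:-1]  # entities inside the dump's JSON array end with a comma
    if not line or line in ("[", "]"):
        return None
    entity = json.loads(line)
    if entity.get("type") != "item":
        return None
    labels = entity.get("labels", {})
    sitelinks = entity.get("sitelinks", {})
    abridged = {
        "id": entity["id"],
        "type": entity["type"],
        "labels": [[k, labels[k]["value"]] for k in sorted(labels)],
        "sitelinks": [[k, sitelinks[k]["title"]] for k in sorted(sitelinks)],
    }
    # Compact separators and raw UTF-8, like `jq -c`
    return json.dumps(abridged, separators=(",", ":"), ensure_ascii=False)
```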


Then, I compressed each file separately (hadoop can only distribute computation over compressed files if they are compressed separately) and moved them to hdfs: /user/mforns/wikidata_items_abridged_20180514. Actually, I only moved 5 of the 49 files, to avoid processing the whole data set while developing. The rest are ready in stat1005:/home/mforns/wikidata_items_abridged_20180514 and can be copied over at any time.
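Compressing each part on its own means each compressed file can be read by a different task: a single large gzip file cannot be split by Hadoop, so it would be processed by one task only. A minimal Python sketch of that step, assuming gzip as the codec (the message does not say which one was used) and the `part_` prefix produced by `split` above; `gzip_parts_separately` is a hypothetical name.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def gzip_parts_separately(directory: str, prefix: str = "part_") -> List[Path]:
    """Compress each split file into its own .gz file and delete the original.

    One .gz per part keeps the data set splittable at the file level, so each
    part can be handed to a separate task.
    """
    outputs = []
    # sorted() materializes the listing, so new .gz files are not revisited
    for part in sorted(Path(directory).glob(prefix + "*")):
        if part.suffix == ".gz":
            continue  # already compressed
        target = part.with_name(part.name + ".gz")
        with open(part, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        part.unlink()
        outputs.append(target)
    return outputs
```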
I also wrote a spark/scala script that reads the item files in hdfs and processes them to find duplicate candidates. The logic identifies items that have identical labels in at least one language, or identical sitelinks for at least one site. Labels or sitelinks of different languages/sites are not compared, and texts of 2 characters or fewer are ignored. As this is executed in the cluster using Spark RDDs (resilient distributed datasets), the algorithm can compare all Wikidata items against each other and output a graph, where the vertices are item IDs (Q12345) and an edge means the two items share at least one identical label or sitelink. The weight of an edge is the number of matching labels/sitelinks between the two items (vertices), and only edges with a weight greater than 1 are kept. Here's the code:


// Meant for spark-shell, where `spark` (the SparkSession) is already defined.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql.SparkSession

// (item id, label by language, sitelink title by site)
type Item = (String, Map[String, String], Map[String, String])

// Reads the abridged item files; labels and sitelinks arrive as [key, value] pairs.
def parseItems(
    sourceDirectory: String,
    spark: SparkSession
): RDD[Item] = {
  val schema = StructType(Seq(
    StructField("id", StringType, nullable = false),
    StructField("type", StringType, nullable = false),
    StructField("labels", ArrayType(ArrayType(StringType)), nullable = false),
    StructField("sitelinks", ArrayType(ArrayType(StringType)), nullable = false)
  ))
  val items = spark.read.schema(schema).json(sourceDirectory + "/*").rdd
  items.map(r => (
    r.getString(0),
    r.getSeq[Seq[String]](2).map(e => e(0) -> e(1)).toMap,
    r.getSeq[Seq[String]](3).map(e => e(0) -> e(1)).toMap
  ))
}

val items = parseItems("/user/mforns/wikidata_items_abridged_20180514", spark)

// One (language or site, text, item id) triple per label/sitelink longer than 2 chars.
val expressions = items.flatMap { item =>
  (
    item._2.map(label => (label._1, label._2, item._1)) ++
    item._3.map(sitelink => (sitelink._1, sitelink._2, item._1))
  ).filter(e => e._2.size > 2)
}

// For each (language or site, text), the sorted IDs of the items that use it.
val expressionGroups = (expressions
  .keyBy(e => (e._1, e._2))
  .groupByKey
  .map(g => (g._1, g._2.map(_._3).toSeq.sortBy(id => id)))
  .filter(g => g._2.size > 1))

// Every pair of items within a group is a candidate edge.
val explodedEdges = expressionGroups.flatMap(g => g._2.combinations(2))

// Edge weight = number of shared (language or site, text) keys for the pair.
val weightedEdges = explodedEdges.map(e => (e, 1)).reduceByKey(_ + _)

// Keep only pairs that match more than once.
val edges = weightedEdges.filter(e => e._2 > 1)

edges.map(e => e._1(0) + "\t" + e._1(1) + "\t" + e._2).saveAsTextFile("/user/mforns/duplicate_candidates")

The output looks like this (you can access it in hdfs under /user/mforns/duplicate_candidates):

Q7545947	Q7545948	4
Q2581746	Q3779054	2
Q32850943	Q32851055	2
Q32498252	Q804060	2
Q4451724	Q4451776	5
...
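To make the Spark logic easier to follow, here is the same idea as a small in-memory Python sketch (no Spark), useful for checking the logic on a handful of items rather than the full dump. It groups items by (language or site, text), ignores texts of 2 characters or fewer, turns each group into item pairs, counts how many keys each pair shares, and keeps pairs with a weight of at least `min_weight` (2 by default, matching the `> 1` filter). `duplicate_candidates` is a hypothetical name and the sample items in the test are made up.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# (item id, {language: label}, {site: sitelink title}), like the Scala `Item`
Item = Tuple[str, Dict[str, str], Dict[str, str]]


def duplicate_candidates(items: List[Item],
                         min_weight: int = 2) -> Dict[Tuple[str, str], int]:
    """Return {(id_a, id_b): number of shared (language/site, text) keys}."""
    groups = defaultdict(set)
    for item_id, labels, sitelinks in items:
        for key, text in list(labels.items()) + list(sitelinks.items()):
            if len(text) > 2:  # same short-text filter as the Spark job
                groups[(key, text)].add(item_id)
    weights = defaultdict(int)
    for ids in groups.values():
        # sorted() gives each pair a stable order, like sortBy in the Scala code
        for pair in combinations(sorted(ids), 2):
            weights[pair] += 1
    return {pair: w for pair, w in weights.items() if w >= min_weight}
```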

Finally, I wrote a python script to read that output on a single machine and calculate the graph's connected components. I haven't tested it, but here it is:

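import sys

import networkx as nx

G = nx.Graph()

# Input: the Spark output merged into one local file (e.g. with
# `hdfs dfs -getmerge`); each line is "<item_id>\t<item_id>\t<weight>".
with open(sys.argv[1], 'r') as input_file:
    for line in input_file:
        v1, v2, w = line.rstrip('\n').split('\t')
        G.add_edge(v1, v2, weight=int(w))

# Each connected component is a group of likely duplicate items.
for component in nx.connected_components(G):
    print(component)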

This should return all groups of items that are likely to be duplicates (same-label/sitelink duplicates, that is).
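If installing networkx on the analysis host is inconvenient, the connected components can also be computed with a small union-find (disjoint-set) structure using only the standard library. This is a swapped-in technique, not what the script above uses; it assumes the same tab-separated `item<TAB>item<TAB>weight` lines as the Spark output, and ignores the weights, just as `connected_components` does.

```python
from typing import Dict, Iterable, List


def connected_components(lines: Iterable[str]) -> List[List[str]]:
    """Group item IDs from 'id_a<TAB>id_b<TAB>weight' lines into components."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for line in lines:
        line = line.strip()
        if not line:
            continue
        v1, v2, _weight = line.split("\t")
        union(v1, v2)

    components: Dict[str, List[str]] = {}
    for node in parent:
        components.setdefault(find(node), []).append(node)
    return [sorted(c) for c in components.values()]
```

An open file can be passed directly, since iterating over it yields lines.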

Issues

If you look at the duplicate_candidates files, you can quickly identify false positives. I found 2 types:


Disambiguation pages: Th

[Wikidata-bugs] [Maniphest] [Updated] T191022: Add Wikidata website extract oozie job

2018-05-07 Thread mforns
mforns set the point value for this task to "8".

TASK DETAIL
  https://phabricator.wikimedia.org/T191022

To: Jonas, mforns
Cc: Smalyshev, Nuria, gerritbot, JAllemandou, Jonas, Aklapper, Versusxo, 
Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, 
Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, lisong, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, Lewizho99, Maathavan, 
Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Triaged] T187296: Increase kafka event retention to 14 or 21 days

2018-04-16 Thread mforns
mforns triaged this task as "Low" priority.

TASK DETAIL
  https://phabricator.wikimedia.org/T187296

To: mforns
Cc: mforns, elukey, Ottomata, Aklapper, Nuria, Ladsgroup, Pchelolo, JAllemandou, 
Smalyshev, Lahi, Gq86, Vacio, Darkminds3113, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Avner, Gehel, Jonas, 
FloNight, Xmlizer, Nirmos, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331, Krenair, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T187296: Increase kafka event retention to 14 or 21 days

2018-04-16 Thread mforns
mforns added a comment.
We'll keep this on our radar until things are stable.

TASK DETAIL
  https://phabricator.wikimedia.org/T187296

To: mforns
Cc: mforns, elukey, Ottomata, Aklapper, Nuria, Ladsgroup, Pchelolo, JAllemandou, 
Smalyshev, Lahi, Gq86, Vacio, Darkminds3113, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Avner, Gehel, Jonas, 
FloNight, Xmlizer, Nirmos, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331, Krenair, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-21 Thread mforns
mforns added a comment.
@Nuria @Smalyshev

So probably if we round timestamp and remove sessionId your proposal for dataset #1 is safe to keep long term (cc @mforns for anything I might be missing)

I think it depends heavily on how drastically we sanitize the potentially identifying fields (user agent and client IP) and the fields that can indicate user activity/features (query, location).
Intuitively, it seems to me that we can keep this data in a private store indefinitely if it is sanitized. But having those 4 sensitive fields in the same data set will make it difficult to publish, even if sanitized. I don't know how frequent WDQS queries are, but I imagine they are several orders of magnitude fewer than pageviews. Thus the buckets of this data set are likely to be sparse and small, which increases the threat to user privacy.

If we wanted to make this public, I'd remove the geographic location field entirely, and probably use daily or monthly resolution instead of hourly (depending on bucket size).
Also, splitting the data set into several unlinkable thematic data sets could help: queries by country, queries by user agent, session queries, etc.
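A minimal Python sketch of two of the ideas above, coarser time resolution and watching bucket sizes: truncate timestamps to the day, count records per (day, field value) bucket, and drop buckets smaller than a threshold before anything is published. The threshold `k`, the field names, the ISO-8601 timestamp format and the function name `daily_counts` are assumptions for illustration; choosing a safe threshold would need the kind of privacy analysis discussed in this thread.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def daily_counts(records: Iterable[Dict[str, str]], field: str,
                 k: int = 10) -> Dict[Tuple[str, str], int]:
    """Count records per (day, value of `field`), dropping buckets below k.

    Each record is a dict with an ISO-8601 'timestamp' (e.g.
    '2017-12-21T14:03:00') and the field to break down by (e.g. a
    user-agent family). Rounding to the day and suppressing small buckets
    both make it harder to single out individual users.
    """
    counts = Counter()
    for r in records:
        day = datetime.fromisoformat(r["timestamp"]).date().isoformat()
        counts[(day, r[field])] += 1
    return {bucket: n for bucket, n in counts.items() if n >= k}
```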

Sorry if I'm being too pessimistic; I'm not familiar with the kind of information that WDQS queries can give away about users.

TASK DETAIL
  https://phabricator.wikimedia.org/T143819

To: mforns
Cc: mforns, PokestarFan, Nuria, Lydia_Pintscher, mkroetzsch, leila, debt, 
thiemowmde, Jonas, Smalyshev, AndrewSu, Aklapper, I9606, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Avner, Gehel, FloNight, Xmlizer, JAllemandou, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Changed Project Column] T134426: Review shared data namespace (tabular data) implementation

2016-05-09 Thread mforns
mforns moved this task from Incoming to Radar on the Analytics board.

TASK DETAIL
  https://phabricator.wikimedia.org/T134426

WORKBOARD
  https://phabricator.wikimedia.org/project/board/11/

To: Yurik, mforns
Cc: Danny_B, DannyH, StudiesWorld, Steinsplitter, Aklapper, Lydia_Pintscher, 
ekkis, Matanya, MarkTraceur, JEumerus, Thryduulf, Milimetric, MZMcBride, 
Bawolff, -jem-, gerritbot, Pokefan95, TerraCodes, intracer, ThurnerRupert, 
brion, Jdforrester-WMF, Eloy, TheDJ, Yurik, Zppix, Riley_Huntley, D3r1ck01, 
Izno, Luke081515, JAllemandou, Wikidata-bugs, aude, El_Grafo, Ricordisamoa, 
Shizhao, fbstj, Fabrice_Florin, Mbch331, Jay8g, Krenair, jeremyb


