[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-05-06 Thread JAllemandou
JAllemandou added a comment.


  No objection :) I'd have gone for option 1 as it seems the easiest to 
maintain, but I agree that it means installing some stuff on the blazegraph 
machines.

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse, JAllemandou
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-05-06 Thread JAllemandou
JAllemandou added a comment.


  I would suggest using the `hdfs-rsync` tool to do this - it requires some 
setup with puppet, but it is helpful because it copies only new files from 
folders (see 
https://github.com/wikimedia/operations-puppet/blob/1c4d67ff19372832484f7551dc49836be5806024/modules/hdfs_tools/manifests/hdfs_rsync_job.pp
and 
https://github.com/wikimedia/operations-puppet/blob/1c4d67ff19372832484f7551dc49836be5806024/modules/dumps/manifests/web/fetches/stats.pp).
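
  The "copy only what is new" behaviour can be illustrated with a minimal 
sketch. This is not `hdfs-rsync` itself and does not use its options: it works 
on the local filesystem with Python's standard library as a stand-in for HDFS, 
and the function name `sync_new_files` and the same-size check are assumptions 
made for illustration only.

```python
import shutil
from pathlib import Path


def sync_new_files(source: Path, target: Path) -> list[str]:
    """Copy files from source to target, skipping files already present.

    Illustration only: a file counts as already present when the target has
    a file at the same relative path with the same size.
    Returns the relative paths that were copied, in sorted order.
    """
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = target / rel
        if dst_file.exists() and dst_file.stat().st_size == src_file.stat().st_size:
            continue  # already synced: nothing to transfer
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(rel.as_posix())
    return copied
```

  Running the function twice in a row copies nothing the second time, which is 
the property that makes a periodic sync job cheap.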

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

To: dcausse, JAllemandou
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T336361: [Analytics] Identify access from mobile vs. desktop devices

2023-09-07 Thread JAllemandou
JAllemandou added a comment.


  > However, my assumption is that when only filtering for agent_type != 
'spider' the population will still include a lot of non-UI hits.
  
  The `agent_type` field can currently take 3 values: `spider`, `automated` and 
`user`. `spider` is used when user agents identify themselves as bots, 
`automated` is used when we heuristically classify the traffic as 
automatically generated (big volume), and the rest falls under `user`. 
So there is indeed still some non-user traffic flagged as `user`.
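
  The three-way split described above can be sketched as follows. This is a toy 
illustration, not the production classifier: the marker strings, the 
`requests_per_hour` input and the `volume_threshold` value are all 
hypothetical, since the comment only says that `automated` is assigned 
heuristically for big volumes.

```python
def classify_agent_type(user_agent: str, requests_per_hour: int,
                        volume_threshold: int = 1000) -> str:
    """Return 'spider', 'automated' or 'user' for one traffic source.

    Hypothetical rules for illustration:
    - 'spider' when the user agent identifies itself as a bot,
    - 'automated' when the request volume exceeds an assumed threshold,
    - 'user' for everything else (which can still hide non-human traffic).
    """
    self_declared_markers = ("bot", "spider", "crawler")  # assumed markers
    ua = user_agent.lower()
    if any(marker in ua for marker in self_declared_markers):
        return "spider"
    if requests_per_hour > volume_threshold:
        return "automated"
    return "user"
```

  Under this scheme, filtering on `agent_type != 'spider'` keeps both `user` and 
`automated` traffic, while filtering on `agent_type = 'user'` is stricter but 
still not a guarantee of human, UI-driven hits.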

TASK DETAIL
  https://phabricator.wikimedia.org/T336361

To: AndrewTavis_WMDE, JAllemandou
Cc: JAllemandou, AndrewTavis_WMDE, Michael, Manuel, Aklapper, 
Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-18 Thread JAllemandou
JAllemandou added a comment.


  In T342416#9101868 <https://phabricator.wikimedia.org/T342416#9101868>, 
@EBernhardson wrote:
  
  > These are both generated by spark.  The rdf is being imported by a scala 
application while the cirrus dump is imported by pyspark, but they should both 
be using the same underlying implementation. Both applications use 
`df.write.insertInto(table_name)` to instruct spark to do the actual output. 
I'm a bit surprised they end up generating different sets of permissions.
  >
  > I suppose it's not super important why the cirrus dump is world readable, 
it's fine to be readable, it just hints to me that there is something I don't 
understand about hdfs/spark/permissions happening here.
  
  Mwarf, wrong guess :) Interesting nonetheless - let me know if you'd like us 
to pair on this.

TASK DETAIL
  https://phabricator.wikimedia.org/T342416

To: EBernhardson, JAllemandou
Cc: dcausse, BTullis, AndrewTavis_WMDE, Aklapper, JAllemandou, 
Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, AWesterinen, lbowmaker, 
karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-18 Thread JAllemandou
JAllemandou added a comment.


  In T342416#9091146 <https://phabricator.wikimedia.org/T342416#9091146>, 
@EBernhardson wrote:
  
  > I looked into these, the attached patch should fix it but it leaves an open 
question (@JAllemandou):
  >
  > The `core-site.xml`, along with puppet which writes it out, has the default 
umask of 027 since at least 2021, which prevents world readability. So why do 
we have the following permissions for historical dumps:
  >
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230710
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230716
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230717
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230723
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230724
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230730
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230731
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230806
  
  The world-readable changes were made manually by me to unblock 
@AndrewTavis_WMDE - I logged my change in the analytics IRC channel but didn't 
ping on the search IRC channel - I should have, please excuse me for this :)
  
  > Similarly we have other jobs that still run today and emit world readable 
dumps without explicitly setting the umask, what is causing the difference?
  >
  >   drwxrwxr-x   
/wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230716
  >   drwxrwxr-x   
/wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230723
  >   drwxrwxr-x   
/wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230730
  >   drwxrwxr-x   
/wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230806
  
  My guess about those is that they are still generated by a Hive 
job. Hive and Spark behave differently with regard to permissions when generating 
files: Spark uses the configured umask, while Hive reproduces the parent-dir 
pattern. I'd be interested to know whether my guess is correct :)
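
  For reference, the umask arithmetic behind the listings above can be sketched 
in a few lines. This only shows how a umask turns a requested mode into an 
effective one for a process that honours it; it says nothing about how Spark 
or Hive actually pick permissions, and the helper name `dir_mode_with_umask` 
is hypothetical.

```python
import stat


def dir_mode_with_umask(umask: int, requested: int = 0o777) -> str:
    """Return the ls-style mode string of a directory created under `umask`.

    The effective permission bits are the requested bits with every bit
    set in the umask cleared.
    """
    effective = requested & ~umask
    return stat.filemode(stat.S_IFDIR | effective)


print(dir_mode_with_umask(0o027))  # drwxr-x---  (no access for "other")
print(dir_mode_with_umask(0o022))  # drwxr-xr-x  (world readable)
print(dir_mode_with_umask(0o002))  # drwxrwxr-x  (group writable, world readable)
```

  A umask of 027 therefore matches the `drwxr-x---` entries, while the 
`drwxr-xr-x` and `drwxrwxr-x` entries need a different origin (a manual change, 
or a writer that does not apply that umask).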

TASK DETAIL
  https://phabricator.wikimedia.org/T342416

To: EBernhardson, JAllemandou
Cc: dcausse, BTullis, AndrewTavis_WMDE, Aklapper, JAllemandou, 
Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, AWesterinen, lbowmaker, 
karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-07-03 Thread JAllemandou
JAllemandou added a comment.


  We met this morning with @AndrewTavis_WMDE and @Manuel - Thank you folks for 
the great meeting.
  The detailed Meeting notes are here: 
https://docs.google.com/document/d/1REsolXnZf2KqApL0p-DE8X4eWXI_zxHgrCe3k1hcZnw
  
  From the job list in the previous comment:
  
  - 4 don't run spark and are kept as-is: `WMDE_BannerImpressions`, 
`Wiktionary_CognateDashboard`, `2021_WMDE_Mitmachen_Bereich_2021_Campaign`, 
`WDCM_Sqoop_Clients`
  - 3 are stopped (crontab commented out): `Qurator_CuriousFacts`, 
`WDCM_EngineBiases`, `WD_PageviewsPerType`
  - 3 have been updated to run spark2 in fixed-resource mode, and so should 
not fail after the migration to the spark3-shuffler: `WD_UsageCoverage`, 
`WD_languagesLandscape`, `NewEditors_comprehensive_report`
  
  With those changes, this task no longer blocks the migration to the 
spark3-shuffler :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, karapayneWMDE, Invadibot, 
maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-06-30 Thread JAllemandou
JAllemandou added a comment.


  Hi @AndrewTavis_WMDE,
  I've done some investigation, and here is what I have: Goran has 11 CRON jobs 
running from various hosts on our system (1 on `stat1004`, 2 on `stat1007`, 7 on 
`stat1008`).
  
  - `WDCM_Sqoop_Clients` runs on `stat1004` weekly - It doesn't run spark (but 
Sqoop)
  - `2021_WMDE_Mitmachen_Bereich_2021_Campaign` runs on `stat1007` daily - It 
doesn't run spark (but Hive)
  - `WD_PageviewsPerType` runs on `stat1007` daily but has been failing since 
February 17th - It runs a spark job
  - `WD_UsageCoverage` runs on `stat1008` daily - It runs a spark job
  - `WD_languagesLandscape` runs on `stat1008` monthly (30th of the month) - It 
runs a spark job
  - `Wiktionary_CognateDashboard` runs on `stat1008` daily - It doesn't run 
spark
  - `WDCM_EngineBiases` runs on `stat1008` weekly - It runs a spark job
  - `Qurator_CuriousFacts` runs on `stat1008` monthly (10th of the month) - It 
runs a spark job
  - `WMDE_BannerImpressions` runs on `stat1008` hourly - It doesn't run spark 
(but Hive)
  - `NewEditors_comprehensive_report` runs on `stat1008` daily - It runs a 
spark job
  
  We need to meet and talk about your usage of the data generated by those 
scripts, and see which ones you would like us to try to keep working and which 
can be stopped.
  I'm booking some time on your calendar next Monday :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, EChetty, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-06-22 Thread JAllemandou
JAllemandou added a comment.


  In T334951#8952583 <https://phabricator.wikimedia.org/T334951#8952583>, 
@AndrewTavis_WMDE wrote:
  
  > - If the answer to the above question of permanently losing some data 
that's being produced by Concepts Monitor and other WMDE jobs is no, then we're 
ok with option one above of stopping the job.
  
  Unfortunately I am not at all knowledgeable about the data generated by the 
job, which prevents me from assessing whether it generates data that we would 
not be able to regenerate.
  Also, I have not been told about intermediate data stored on the cluster, 
which makes me think that all the data generated by the job is small enough to 
be saved for the reports only.
  But as stated before, those are uninformed ideas :(
  
  > - Aside from this we'd prefer option two of configuring it to use 
fixed-resource.
  
  We can test that :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, EChetty, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-06-19 Thread JAllemandou
JAllemandou added a comment.


  In T334951#8946790 <https://phabricator.wikimedia.org/T334951#8946790>, 
@AndrewTavis_WMDE wrote:
  
  > I'll async with him now and see if we can come to a decision sooner than 
that, but you all will have the answer by Wednesday at the latest 😊
  
  Awesome, thank you :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, EChetty, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-06-19 Thread JAllemandou
JAllemandou added a comment.


  Hi Folks - What is the status on this one?
  
  I'd like Data-Engineering to announce the deprecation of Spark2 for the end 
of this month, but not without knowing how we plan to tackle your job :)
  Here are the possible solutions I can think of:
  
  - Stop the job while it is revamped to spark3 (knowing that the dashboard 
is broken, is this a possible solution?)
  - Configure the job not to use DynamicAllocation but fixed resources instead, 
making the job work in spark2 despite spark2 being deprecated, but using more 
cluster resources than really needed
  - Postpone deprecating spark2 (though I'd be super happy if we could avoid 
that) :)
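
  For the fixed-resource option, a minimal sketch of the Spark settings 
involved is given below. The property names are standard Spark configuration 
keys; the helper names and the executor counts and sizes are hypothetical and 
would have to be tuned per job.

```python
def fixed_resource_conf(num_executors: int, executor_cores: int,
                        executor_memory: str) -> dict[str, str]:
    """Build Spark properties that turn dynamic allocation off.

    With dynamic allocation disabled, the job holds a fixed number of
    executors for its whole lifetime, which is why it can use more
    cluster resources than it really needs.
    """
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(num_executors),
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render the properties as `--conf key=value` command-line arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Example values only (assumed, not taken from the actual jobs).
print(" ".join(to_submit_args(fixed_resource_conf(8, 2, "4g"))))
```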
  
  Let me know your thoughts :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, EChetty, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-09-14 Thread JAllemandou
JAllemandou added a comment.


  In T303831#8237323 <https://phabricator.wikimedia.org/T303831#8237323>, 
@EBernhardson wrote:
  
  > data cleanup looks to now have run successfully
  
  Thanks a lot @EBernhardson for finalizing on this :)

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: EBernhardson, JAllemandou
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, Jersione, 
Hellket777, LisafBia6531, Astuthiodit_1, AWesterinen, 786, Biggs657, 
karapayneWMDE, Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, 
Beast1978, CBogen, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, 
joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, 
Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-08-26 Thread JAllemandou
JAllemandou added a comment.


  In T303831#8175252 <https://phabricator.wikimedia.org/T303831#8175252>, 
@EBernhardson wrote:
  
  > @JAllemandou  The one remaining piece of this ticket is cleaning up the 
historical data, per T303831#8081172 
<https://phabricator.wikimedia.org/T303831#8081172>.  Any suggestions on how we 
should manage droping old data from tables partitioned by a snapshot column?
  
  The way we currently do this is with this script: 
https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots
  It works differently from the generic `refinery-drop-older-than` script, in 
that it lists all the datasets to clean and then applies the deletion.
  It's possible to add the datasets you need to delete in there; it shouldn't 
be complicated.
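
  The "list first, then delete" approach can be sketched as below. This is not 
the code of `refinery-drop-mediawiki-snapshots`: the function names, the 
`keep_last` parameter and the `drop_partition` callback are hypothetical, and 
the sketch assumes snapshot values that sort chronologically as strings.

```python
def plan_snapshot_deletions(partitions_by_table, keep_last):
    """Phase 1: list every (table, snapshot) pair that should be dropped.

    `partitions_by_table` maps a table name to its snapshot values.
    The `keep_last` most recent snapshots of each table are kept.
    """
    plan = []
    for table in sorted(partitions_by_table):
        snapshots = sorted(partitions_by_table[table])
        to_drop = snapshots[:-keep_last] if keep_last > 0 else snapshots
        plan.extend((table, snapshot) for snapshot in to_drop)
    return plan


def apply_deletions(plan, drop_partition):
    """Phase 2: apply the plan using a caller-provided drop function."""
    for table, snapshot in plan:
        drop_partition(table, snapshot)
    return len(plan)
```

  Separating the two phases makes it easy to print and review the full plan 
before anything is deleted.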

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: EBernhardson, JAllemandou
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, 
Hellket777, LisafBia6531, Astuthiodit_1, AWesterinen, 786, Biggs657, 
karapayneWMDE, Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, 
Beast1978, CBogen, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, 
joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, 
Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-13 Thread JAllemandou
JAllemandou added a comment.


  Thanks a lot @EBernhardson for the help on finishing this!

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: AKhatun_WMF, JAllemandou
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, 
Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, 
Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, 
Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2022-04-08 Thread JAllemandou
JAllemandou closed subtask T299059: Write an Airflow job converting commons 
structured data dump to Hive as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

To: JAllemandou
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, 
Fernandobacasegua34, Astuthiodit_1, ntsako, 786, EChetty, Suran38, Biggs657, 
karapayneWMDE, toberto, ldelench_wmf, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, ItamarWMDE, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T299059: Write an Airflow job converting commons structured data dump to Hive

2022-04-08 Thread JAllemandou
JAllemandou closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T299059

To: Snwachukwu, JAllemandou
Cc: Cparle, nettrom_WMF, Miriam, Nuria, cchen, AKhatun_WMF, JAllemandou, 
Astuthiodit_1, ntsako, EChetty, karapayneWMDE, toberto, ldelench_wmf, 
Invadibot, MPhamWMF, maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2022-03-30 Thread JAllemandou
JAllemandou closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

To: JAllemandou
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, 
Fernandobacasegua34, Astuthiodit_1, ntsako, 786, EChetty, Suran38, Biggs657, 
karapayneWMDE, toberto, ldelench_wmf, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, ItamarWMDE, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T252443: Create dashboard to show growth of structured data on Commons over time

2022-03-30 Thread JAllemandou
JAllemandou closed subtask T258834: Create a Commons equivalent of the 
wikidata_entity table in the Data Lake as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T252443

To: cchen, JAllemandou
Cc: kzimmerman, nettrom_WMF, GFontenelle_WMF, Abit, Ramsey-WMF, CBogen, 
Astuthiodit_1, EChetty, karapayneWMDE, toberto, ldelench_wmf, Invadibot, 
maantietaja, Y.ssk, FRomeo_WMF, Muchiri124, ItamarWMDE, Nintendofan885, 
Akuckartz, Nandana, JKSTNK, Lahi, Gq86, E1presidente, Cparle, SandraF_WMF, 
GoranSMilovanovic, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, 
_jensen, rosalieper, 4nn1l2, Taiwania_Justo, Scott_WUaS, Susannaanas, 
Ixocactus, Wong128hk, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, 
El_Grafo, Dinoguy1000, Ricordisamoa, Wesalius, Lydia_Pintscher, Raymond, 
Steinsplitter, Mbch331


[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117

2022-03-07 Thread JAllemandou
JAllemandou added a comment.


  Thank you for letting us know :)

TASK DETAIL
  https://phabricator.wikimedia.org/T300240

To: JAllemandou
Cc: ArielGlenn, Aklapper, JAllemandou, AKhatun_WMF, dcausse, karapayneWMDE, 
Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331


[Wikidata-bugs] [Maniphest] T299059: Write an Airflow job converting commons structured data dump to Hive

2022-01-12 Thread JAllemandou
JAllemandou created this task.
JAllemandou added projects: Product-Analytics, Structured-Data-Backlog, 
Wikidata-Query-Service, Wikidata, Data-Engineering, Discovery-Search (Current 
work), Patch-For-Review, Data-Engineering-Kanban.
Restricted Application removed a project: Patch-For-Review.

TASK DESCRIPTION
  The airflow job should
  
  - be run weekly on Mondays.
  - wait for source data to be available:
    - source folder is of form 
`hdfs://analytics-hadoop/wmf/data/raw/commons/dumps/mediainfo-json/MMDD`
    - source folder contains a file named `_IMPORTED` when the source data has 
been successfully imported in the folder
  - run a spark job reading the source data and writing it to hive
    - the spark job is in the `refinery-job.jar` archive, we need to have it as 
a dependency for the job
    - the spark job class is 
`org.wikimedia.analytics.refinery.job.structureddata.jsonparse.JsonDumpConverter`
    - main parameters of the job are the input folder, the output hive table 
and the snapshot (time partition) being created. The output hive table will be 
`structured_data.commons_entity` and the `snapshot` will be in the form 
`-MM-DD`. See the class for the detailed list of parameters :)
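
  The requirements above can be gathered into a small job specification, 
sketched here in plain Python rather than as an actual Airflow DAG. The HDFS 
base path, the `_IMPORTED` flag, the class name and the table name come from 
the description; the dictionary keys, the function names and the midnight run 
time are hypothetical, and the date tokens are passed in by the caller because 
their exact format is not assumed here.

```python
SOURCE_BASE = "hdfs://analytics-hadoop/wmf/data/raw/commons/dumps/mediainfo-json"
SPARK_JOB_CLASS = (
    "org.wikimedia.analytics.refinery.job.structureddata.jsonparse."
    "JsonDumpConverter"
)
OUTPUT_TABLE = "structured_data.commons_entity"


def source_folder(date_token: str) -> str:
    """Folder holding one imported dump; the token format is up to the caller."""
    return f"{SOURCE_BASE}/{date_token}"


def import_flag(date_token: str) -> str:
    """Path of the marker file that signals a complete import."""
    return f"{source_folder(date_token)}/_IMPORTED"


def build_job_spec(date_token: str, snapshot: str) -> dict:
    """Collect what a scheduler needs: when to run, what to wait for, what to run.

    The keys below are illustrative names, not real job parameter names.
    """
    return {
        "schedule": "0 0 * * 1",  # cron syntax for Mondays; the hour is assumed
        "wait_for": import_flag(date_token),
        "spark_class": SPARK_JOB_CLASS,
        "params": {
            "input_folder": source_folder(date_token),
            "output_table": OUTPUT_TABLE,
            "snapshot": snapshot,
        },
    }
```

  In a real DAG, `wait_for` would typically back a sensor task and the 
remaining fields a Spark submit task; the class itself remains the reference 
for the actual parameter names.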

TASK DETAIL
  https://phabricator.wikimedia.org/T299059

To: JAllemandou
Cc: nettrom_WMF, Miriam, Nuria, cchen, AKhatun_WMF, JAllemandou, ntsako, 
EChetty, toberto, ldelench_wmf, Invadibot, MPhamWMF, maantietaja, CBogen, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, Base, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2021-12-16 Thread JAllemandou
JAllemandou added a comment.


  Code is ready:
  
  - Import `commons-mediainfo` json dumps to HDFS 
(https://gerrit.wikimedia.org/r/738874)
  - Update spark transformation job to work with both wikidata and commons 
dumps (https://gerrit.wikimedia.org/r/739129)
  - Update `wikidata_entity` table creation script and oozie job for the new 
fields added by the patch above 
(https://gerrit.wikimedia.org/r/c/analytics/refinery/+/740589)
  - Add `commons_entity` table creation script 
(https://gerrit.wikimedia.org/r/c/analytics/refinery/+/740590)
  - Update spark transformation job to write directly to a hive table instead 
of to files 
(https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/747508/)
  
  What we need after having merged/deployed the above is:
  
  - A new airflow job for the `commons_entity` data generation
  - A migration of the `wikidata_entity` oozie job to Airflow

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

To: JAllemandou
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, 786, EChetty, 
Suran38, Biggs657, toberto, ldelench_wmf, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz, 
4748kitoko, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, 
Akovalyov, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, 
Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2021-12-15 Thread JAllemandou
JAllemandou added a project: Data-Engineering-Kanban.

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

To: JAllemandou
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, 786, EChetty, 
Suran38, Biggs657, toberto, ldelench_wmf, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz, 
4748kitoko, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, 
Akovalyov, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, 
Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31

2021-09-16 Thread JAllemandou
JAllemandou updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T291205

To: JAllemandou
Cc: Aklapper, Jmixter87, JAllemandou, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31

2021-09-16 Thread JAllemandou
JAllemandou created this task.
JAllemandou added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  It is interesting to understand how properties are used by different content 
subgraphs (for instance humans, scholarly articles etc). It would allow us to 
better understand how properties used in a certain query context can be 
affected performance-wise by their usage in other contexts. For instance, the 
`main-topic` property when used for books could suffer from the property being 
widely used for scholarly-articles (a huge subgraph).
  This analysis would use the P31 <https://phabricator.wikimedia.org/P31> 
values of items to try to cluster items into groups (maybe we could do even 
better by using P279 <https://phabricator.wikimedia.org/P279>?), and we would 
then count property usage per group for further analysis.
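
  As a rough illustration of the counting step, here is a minimal Python 
sketch. The in-memory `(subject, property, value)` tuples and the function name 
`property_usage_by_p31` are assumptions made for this example; the actual 
analysis would presumably run over the RDF data on the cluster.

```python
from collections import Counter, defaultdict


def property_usage_by_p31(triples):
    """Count property usage per P31 (instance-of) group.

    `triples` is an iterable of (subject, property, value) tuples.
    An item with several P31 values is counted once in each of its groups.
    """
    triples = list(triples)

    # First pass: collect the P31 values of every item.
    groups = defaultdict(set)
    for subject, prop, value in triples:
        if prop == "P31":
            groups[subject].add(value)

    # Second pass: count each property once per group the item belongs to.
    usage = defaultdict(Counter)
    for subject, prop, _value in triples:
        for group in groups.get(subject, ()):
            usage[group][prop] += 1
    return usage


# Toy data: Q5 = human, Q13442814 = scholarly article, P921 = main subject.
sample = [
    ("Q1", "P31", "Q5"),
    ("Q1", "P569", "1900-01-01"),
    ("Q2", "P31", "Q13442814"),
    ("Q2", "P921", "Q42"),
    ("Q3", "P31", "Q13442814"),
    ("Q3", "P921", "Q64"),
]
usage = property_usage_by_p31(sample)
```

  Grouping by P279 instead would require walking the subclass hierarchy 
first, which this sketch does not attempt.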

TASK DETAIL
  https://phabricator.wikimedia.org/T291205

To: JAllemandou
Cc: Aklapper, Jmixter87, JAllemandou, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries

2021-07-19 Thread JAllemandou
JAllemandou added a comment.


  Why not add other prefixes, if it's as simple as adding the prefix to the 
AQS list? I think there'll be more gotchas, though.
  Let's try, @AKhatun_WMF  :)

TASK DETAIL
  https://phabricator.wikimedia.org/T285465

To: AKhatun_WMF, JAllemandou
Cc: Gehel, MPhamWMF, Lucas_Werkmeister_WMDE, Esc3300, dcausse, Aklapper, 
AKhatun_WMF, JAllemandou, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-07-19 Thread JAllemandou
JAllemandou closed this task as "Resolved".
JAllemandou added a comment.


  The analysis is documented here: 
https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Basic_Analysis.
  Thanks @AKhatun_WMF :)

TASK DETAIL
  https://phabricator.wikimedia.org/T282139

To: AKhatun_WMF, JAllemandou
Cc: Esc3300, GoranSMilovanovic, CBogen, AKhatun_WMF, Aklapper, JAllemandou, 
Invadibot, MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries

2021-07-19 Thread JAllemandou
JAllemandou added subscribers: MPhamWMF, Gehel.
JAllemandou added a comment.


  Thanks @AKhatun_WMF for the analysis.
  @dcausse, @Gehel and @MPhamWMF - Do you think it's worth trying to make our 
parser able to process queries with the 'mwapi' prefix (it represents 10% 
of all requests)? Otherwise this task can be closed.

TASK DETAIL
  https://phabricator.wikimedia.org/T285465

To: AKhatun_WMF, JAllemandou
Cc: Gehel, MPhamWMF, Lucas_Werkmeister_WMDE, Esc3300, dcausse, Aklapper, 
AKhatun_WMF, JAllemandou, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-06-24 Thread JAllemandou
JAllemandou added a subtask: T285465: Document and analyze the number of 
parsing errors for parsed WDQS queries.

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF, JAllemandou
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries

2021-06-24 Thread JAllemandou
JAllemandou added a parent task: T280640: Refine WDQS queries analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T285465

To: JAllemandou
Cc: Aklapper, AKhatun_WMF, JAllemandou, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries

2021-06-24 Thread JAllemandou
JAllemandou created this task.
JAllemandou added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  For the month of June 2021, we wish to:
  
  - Report the number of parsing errors when generating parsed-query 
information
  - Provide information about why parsing errors happen

TASK DETAIL
  https://phabricator.wikimedia.org/T285465

To: JAllemandou
Cc: Aklapper, AKhatun_WMF, JAllemandou, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T283256: Extract operator/nodes/triples/paths/exprs list from queries

2021-05-25 Thread JAllemandou
JAllemandou added a comment.


  The problem I see with using a generic class in the `QueryElem` object is the 
conversion to parquet. I don't think it'll work out of the box, which means we 
would have to devise our own conversion. Let's brainstorm ideas on this, 
possibly in a meeting to make it faster :)

TASK DETAIL
  https://phabricator.wikimedia.org/T283256

To: AKhatun_WMF, JAllemandou
Cc: Gehel, dcausse, CBogen, Aklapper, AKhatun_WMF, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T283258: Provide a job regularly deleting wdqs processed query after 90 days

2021-05-20 Thread JAllemandou
JAllemandou created this task.
JAllemandou added projects: Wikidata-Query-Service, Wikidata, Patch-For-Review, 
Discovery-Search (Current work).
Restricted Application removed a project: Patch-For-Review.

TASK DESCRIPTION
  This task is related to T273854 <https://phabricator.wikimedia.org/T273854>. 
When the job generating hourly query-info is launched, we should make sure we 
also delete the data after 90 days to be within our data-retention policy.
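
  A minimal Python sketch of the selection logic such a deletion job would 
need. The function name `expired_partitions` and the representation of hourly 
partitions as `datetime` values are assumptions for illustration; how the data 
is actually dropped (Hive partitions, HDFS folders) is not covered here.

```python
from datetime import datetime, timedelta

# Retention period taken from the task description.
RETENTION = timedelta(days=90)


def expired_partitions(partition_hours, now):
    """Return the hourly partitions strictly older than the retention period."""
    cutoff = now - RETENTION
    return [p for p in partition_hours if p < cutoff]


now = datetime(2021, 5, 20, 12)
partitions = [
    datetime(2021, 2, 1, 0),    # well past 90 days: to delete
    datetime(2021, 2, 19, 11),  # one hour past the cutoff: to delete
    datetime(2021, 2, 19, 12),  # exactly 90 days old: kept
    datetime(2021, 5, 20, 11),  # recent: kept
]
to_delete = expired_partitions(partitions, now)
```

  Whether a partition that is exactly 90 days old should be kept or dropped is 
a policy detail; the sketch keeps it.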

TASK DETAIL
  https://phabricator.wikimedia.org/T283258

To: AKhatun_WMF, JAllemandou
Cc: Gehel, dcausse, CBogen, Aklapper, AKhatun_WMF, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T273854: Automate regular WDQS query parsing and data-extraction

2021-05-20 Thread JAllemandou
JAllemandou updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T273854

To: JAllemandou
Cc: dcausse, Aklapper, JAllemandou, Invadibot, MPhamWMF, maantietaja, CBogen, 
Akuckartz, 4748kitoko, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T273854: Automate regular WDQS query parsing and data-extraction

2021-05-20 Thread JAllemandou
JAllemandou added a parent task: T280640: Refine WDQS queries analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T273854

To: JAllemandou
Cc: dcausse, Aklapper, JAllemandou, Invadibot, MPhamWMF, maantietaja, CBogen, 
Akuckartz, 4748kitoko, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T273854: Automate regular WDQS query parsing and data-extraction

2021-05-20 Thread JAllemandou
JAllemandou removed JAllemandou as the assignee of this task.
JAllemandou added a subscriber: dcausse.
JAllemandou updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T273854

To: JAllemandou
Cc: dcausse, Aklapper, JAllemandou, Invadibot, MPhamWMF, maantietaja, CBogen, 
Akuckartz, 4748kitoko, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-05-20 Thread JAllemandou
JAllemandou added a subtask: T273854: Automate regular WDQS query parsing and 
data-extraction.

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF, JAllemandou
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
Lalamarie69, MPhamWMF, maantietaja, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T283256: Extract operator/nodes/triples/paths/exprs list from queries

2021-05-20 Thread JAllemandou
JAllemandou created this task.
JAllemandou added projects: Wikidata-Query-Service, Wikidata, Patch-For-Review, 
Discovery-Search (Current work).
Restricted Application removed a project: Patch-For-Review.

TASK DESCRIPTION
  Augment query-analysis QueryInfo with a list of 
operators+nodes+paths(+exprs?) that will be populated in order of AST-visit 
(and saved in Parquet).
  One complexity of this task is to find a common representation suitable for 
parquet for the various different items.
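
  One possible shape for such a common representation, sketched in Python: 
every visited element becomes a flat record with the same three string/integer 
fields, which maps directly onto a list-of-struct column. The class name 
`QueryElement`, its field names and the sample values are hypothetical and are 
not taken from the existing code.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QueryElement:
    """One AST element, flattened so that every kind shares the same columns."""
    position: int  # index in AST-visit order
    kind: str      # "operator", "node", "path" or "expr"
    value: str     # detailed name, or printed subtree for paths/exprs


def flatten(visited):
    """Turn (kind, value) pairs, given in visit order, into uniform records."""
    return [QueryElement(i, kind, value) for i, (kind, value) in enumerate(visited)]


# Hypothetical visit of a small query.
visited = [
    ("operator", "project"),
    ("operator", "bgp"),
    ("node", "?item"),
    ("path", "wdt:P31/wdt:P279*"),
]
rows = [asdict(element) for element in flatten(visited)]
```

  The trade-off is that type-specific details are reduced to strings, so 
anything richer than a name or a printed subtree would need extra columns.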

TASK DETAIL
  https://phabricator.wikimedia.org/T283256

To: AKhatun_WMF, JAllemandou
Cc: Gehel, dcausse, CBogen, Aklapper, AKhatun_WMF, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T283255: Create CLI job extracting info from wdqs queries

2021-05-20 Thread JAllemandou
JAllemandou created this task.
JAllemandou added projects: Wikidata-Query-Service, Wikidata, Patch-For-Review, 
Discovery-Search (Current work).
Restricted Application removed a project: Patch-For-Review.

TASK DESCRIPTION
  The job should process data hourly.
  Expected parameters to be passed are `year`, `month`, `day`, `hour`, 
`input_table`, `output_table`, and an optional `num_partitions` that allows 
tuning the number of output files (defaults to 1).
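
  The parameter list could be sketched as a command-line interface like the 
one below. This is only an illustration in Python using `argparse`: the 
`--name` flag style, the integer types and the table names in the example call 
are assumptions, and the real job may be written in another language.

```python
import argparse


def build_parser():
    """Build a parser for the parameters listed in the task description."""
    parser = argparse.ArgumentParser(
        description="Extract info from one hour of WDQS queries"
    )
    # The hour to process is identified by four required integer parameters.
    for name in ("year", "month", "day", "hour"):
        parser.add_argument(f"--{name}", type=int, required=True)
    parser.add_argument("--input_table", required=True)
    parser.add_argument("--output_table", required=True)
    # Optional: number of output files, defaulting to 1.
    parser.add_argument("--num_partitions", type=int, default=1)
    return parser


# Example invocation with hypothetical table names.
args = build_parser().parse_args([
    "--year", "2021", "--month", "5", "--day", "20", "--hour", "13",
    "--input_table", "db.input_queries",
    "--output_table", "db.query_info",
])
```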

TASK DETAIL
  https://phabricator.wikimedia.org/T283255

To: AKhatun_WMF, JAllemandou
Cc: Gehel, dcausse, CBogen, Aklapper, AKhatun_WMF, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-05-20 Thread JAllemandou
JAllemandou closed subtask T282129: Test triple-analysis functions over a large 
dataset with Spark as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF, JAllemandou
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
Lalamarie69, MPhamWMF, maantietaja, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282129: Test triple-analysis functions over a large dataset with Spark

2021-05-20 Thread JAllemandou
JAllemandou closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T282129

To: AKhatun_WMF, JAllemandou
Cc: CBogen, AKhatun_WMF, Aklapper, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282129: Test triple-analysis functions over a large dataset with Spark

2021-05-20 Thread JAllemandou
JAllemandou added a comment.


  Closing this task :) Thanks for the great work @AKhatun_WMF

TASK DETAIL
  https://phabricator.wikimedia.org/T282129

To: AKhatun_WMF, JAllemandou
Cc: CBogen, AKhatun_WMF, Aklapper, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format

2021-05-20 Thread JAllemandou
JAllemandou closed this task as "Resolved".
JAllemandou added a comment.


  Great! Thanks for that :) Closing the ticket.

TASK DETAIL
  https://phabricator.wikimedia.org/T282130

To: AKhatun_WMF, JAllemandou
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-05-20 Thread JAllemandou
JAllemandou closed subtask T282130: Provide a way to save extracted 
query-information in parquet format as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF, JAllemandou
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
Lalamarie69, MPhamWMF, maantietaja, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format

2021-05-20 Thread JAllemandou
JAllemandou added a comment.


  @AKhatun_WMF That's great! Could you please provide some info on the expected 
data size in parquet (for daily data, for instance)? Many thanks.

TASK DETAIL
  https://phabricator.wikimedia.org/T282130

To: AKhatun_WMF, JAllemandou
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, Invadibot, MPhamWMF, 
maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-05-06 Thread JAllemandou
JAllemandou created this task.
JAllemandou added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a way to get familiar with the data, please provide quantitative 
information over the dataset using spark in a notebook (probably using python 
as it facilitates making charts).
  The data can be found in:
  

hdfs://analytics-hadoop/wmf/data/discovery/wikidata/rdf/date=20210419/wiki=wikidata
  
  There are multiple snapshot dates available, as well as multiple wikis 
(`wikidata` and `commons`). Just pick one date with `wikidata` data :)
  In hive or spark-sql:
  
use discovery;
show partitions wikibase_rdf;

TASK DETAIL
  https://phabricator.wikimedia.org/T282139

To: JAllemandou
Cc: CBogen, AKhatun_WMF, Aklapper, JAllemandou, MPhamWMF, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-05-06 Thread JAllemandou
JAllemandou added a subtask: T282130: Provide a way to save extracted 
query-information in parquet format.

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF, JAllemandou
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
Lalamarie69, MPhamWMF, maantietaja, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format

2021-05-06 Thread JAllemandou
JAllemandou added a parent task: T280640: Refine WDQS queries analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T282130

To: JAllemandou
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, MPhamWMF, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format

2021-05-06 Thread JAllemandou
JAllemandou created this task.
JAllemandou added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Being able to save the information in Parquet will be very useful, as it 
allows the queries to be processed automatically as they flow in (hourly or 
daily, for instance), facilitating regular analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T282130

To: JAllemandou
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, MPhamWMF, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-05-06 Thread JAllemandou
JAllemandou added a subtask: T282129: Test triple-analysis functions over a 
large dataset with Spark.

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF, JAllemandou
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
Lalamarie69, MPhamWMF, maantietaja, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282129: Test triple-analysis functions over a large dataset with Spark

2021-05-06 Thread JAllemandou
JAllemandou added a parent task: T280640: Refine WDQS queries analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T282129

To: JAllemandou
Cc: CBogen, AKhatun_WMF, Aklapper, JAllemandou, MPhamWMF, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T282129: Test triple-analysis functions over a large dataset with Spark

2021-05-06 Thread JAllemandou
JAllemandou created this task.
JAllemandou added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Once ready locally with unit-tests, apply the triple-analysis method to 
bigger data in spark (a day).

TASK DETAIL
  https://phabricator.wikimedia.org/T282129

To: JAllemandou
Cc: CBogen, AKhatun_WMF, Aklapper, JAllemandou, MPhamWMF, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-05-06 Thread JAllemandou
JAllemandou added a subtask: T282127: Add unit-tests to WDQS analysis toolkit.

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: AKhatun_WMF, JAllemandou
Cc: AKhatun_WMF, Aklapper, CBogen, dcausse, Gehel, JAllemandou, Invadibot, 
Lalamarie69, MPhamWMF, maantietaja, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T282127: Add unit-tests to WDQS analysis toolkit

2021-05-06 Thread JAllemandou
JAllemandou created this task.
JAllemandou added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Extract a set of queries to be used as unit-tests (10 queries) from the 
events.
  This should make it easier to check that the code does what we expect before 
running it on the cluster.

TASK DETAIL
  https://phabricator.wikimedia.org/T282127

To: JAllemandou
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, MPhamWMF, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T282127: Add unit-tests to WDQS analysis toolkit

2021-05-06 Thread JAllemandou
JAllemandou added a parent task: T280640: Refine WDQS queries analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T282127

To: JAllemandou
Cc: Aklapper, CBogen, AKhatun_WMF, JAllemandou, MPhamWMF, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T281808: Wikidata all-json dumps not available from 2021-04-26

2021-05-04 Thread JAllemandou
JAllemandou created this task.
JAllemandou added projects: Wikidata, Dumps-Generation, Analytics.
Restricted Application added a project: wdwb-tech.

TASK DESCRIPTION
  Analytics loads the Wikidata all-json dumps weekly onto the Hadoop cluster, 
and we have received an alert for dumps not being available from 2021-04-26 
onward.

TASK DETAIL
  https://phabricator.wikimedia.org/T281808

To: JAllemandou
Cc: JAllemandou, Invadibot, maantietaja, jannee_e, Akuckartz, 4748kitoko, 
Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, terrrydactyl, 
Wikidata-bugs, aude, Addshore, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis

2021-04-20 Thread JAllemandou
JAllemandou created this task.
JAllemandou added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  The current analysis parses queries and extracts:
  
  - Operators (list, and map with usage counts)
  - Nodes (variables, URIs, literals, blank nodes), as a map with usage counts
  - Prefixes (map with usage counts)
  - Services (map with usage counts)
  - Wikidata names (URIs whose main value matches the regex `"^[QP]\\d+$"`)
  - Expressions
  - Paths
  
  The values used to identify operators, expressions, paths or nodes are 
strings: either the detailed name (for operators or nodes, for instance) or the 
full print of the subtree portion (for paths or expressions, for instance).
  
  One thing we badly miss for our analysis is triple-pattern-matching 
information: when a triple-pattern is met, which form is it in ( , 
 for instance), and which defined values does it embed (URIs, literals, etc.)? 
With that information we should be able to be more precise in terms of 
triple-pattern usage in queries, possibly also getting a better feel for 
heavily used subgraphs.
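  
  As an illustration of the "Wikidata names" extraction listed above, here is 
a minimal Python sketch. It is not the analysis job's code: the helper name 
`wikidata_names`, the sample URIs, and the choice of treating the last path or 
fragment segment as the "main value" are all assumptions made for the example. 
Only the regex comes from the task description.

```python
import re
from collections import Counter

# Regex quoted in the task description: identifiers such as Q42 or P31.
WIKIDATA_NAME = re.compile(r"^[QP]\d+$")


def wikidata_names(uris):
    """Return a usage map (name -> count) of Wikidata names found in URIs."""
    counts = Counter()
    for uri in uris:
        # Assumption: the "main value" is whatever follows the last '/' or '#'.
        main_value = re.split(r"[/#]", uri)[-1]
        if WIKIDATA_NAME.match(main_value):
            counts[main_value] += 1
    return counts


# Hypothetical URIs, as they might appear in a parsed query.
sample_uris = [
    "http://www.wikidata.org/entity/Q42",
    "http://www.wikidata.org/prop/direct/P31",
    "http://www.wikidata.org/prop/direct/P31",
    "http://www.w3.org/2000/01/rdf-schema#label",
]
name_counts = wikidata_names(sample_uris)
print(name_counts)
```

  The same `Counter`-based map would work for prefixes or services; only the 
extraction step changes.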

TASK DETAIL
  https://phabricator.wikimedia.org/T280640

To: JAllemandou
Cc: Aklapper, CBogen, dcausse, Gehel, tanny411, JAllemandou, Invadibot, 
MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T94019: Generate RDF from JSON

2021-04-19 Thread JAllemandou
JAllemandou added a subscriber: dcausse.
JAllemandou added a comment.


  Info: There is already a job in the cluster doing `TTL -> RDF` conversion. 
The TTL dumps are imported weekly and converted to blazegraph RDF once 
available.
  The job is maintained by the Search Platform team (ping @dcausse :).

TASK DETAIL
  https://phabricator.wikimedia.org/T94019

To: JAllemandou
Cc: dcausse, Addshore, toan, Tonina_Zhelyazkova_WMDE, JAllemandou, Pintoch, 
Smalyshev, hoo, Liuxinyu970226, mkroetzsch, Aklapper, daniel, Invadibot, 
maantietaja, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, 
joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331


[Wikidata-bugs] [Maniphest] T273854: Automate regular WDQS query parsing and data-extraction

2021-02-04 Thread JAllemandou
JAllemandou claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T273854

To: JAllemandou
Cc: Aklapper, JAllemandou, MPhamWMF, CBogen, Akuckartz, 4748kitoko, Nandana, 
Namenlos314, Akovalyov, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T273854: Automate regular WDQS query parsing and data-extraction

2021-02-04 Thread JAllemandou
JAllemandou created this task.
JAllemandou added projects: Analytics, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  This task is about running regular query-parsing jobs for WDQS and storing 
the result in a dedicated table on HDFS.

TASK DETAIL
  https://phabricator.wikimedia.org/T273854

To: JAllemandou
Cc: Aklapper, JAllemandou, MPhamWMF, CBogen, Akuckartz, 4748kitoko, Nandana, 
Namenlos314, Akovalyov, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] T266022: Programmatically categorize WDQS queries by potential alternative solution

2021-02-03 Thread JAllemandou
JAllemandou added a comment.


  Ah! I realize I have not updated this task. The analysis can be found here: 
https://wikitech.wikimedia.org/wiki/User:Joal/WDQS_Queries_Analysis
  @CBogen: I'll let you handle the definition of done, and whether this task 
should be closed or not :)

TASK DETAIL
  https://phabricator.wikimedia.org/T266022

To: JAllemandou
Cc: GoranSMilovanovic, dcausse, JMinor, Aklapper, Gehel, Addshore, JAllemandou, 
Lydia_Pintscher, CBogen, MPhamWMF, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T266022: Programmatically categorize WDQS queries by potential alternative solution

2021-01-04 Thread JAllemandou
JAllemandou added a comment.


  The planned deadline was the end of last month. I ran into various issues 
that prevented me from meeting it. I have started the actual work today (I had 
given it thought but hadn't coded anything) and wish to present results before 
the end of the month.

TASK DETAIL
  https://phabricator.wikimedia.org/T266022

To: JAllemandou
Cc: dcausse, JMinor, Aklapper, Gehel, Addshore, JAllemandou, Lydia_Pintscher, 
CBogen, MPhamWMF, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T261841: Tag WDQS query log with the source of the query (UI vs direct access)

2020-10-16 Thread JAllemandou
JAllemandou added a comment.


  Some more info on this aspect: I did a quick analysis over September's 
queries today and found that my assumption that long queries were made by 
users from the UI is wrong.
  
  First, the total number of requests and the sum of query time, split by 
whether the query took more than 1s:
  
+-------+---------+-----------+
|more_1s|requests |query_time |
+-------+---------+-----------+
|false  |160185762|11285161245|
|true   |2757758  |22233005459|
+-------+---------+-----------+
  
  The proportions of number of queries per time classes are the same whether a 
referer is present (expected UI) or not (expected bot).
  
+-----------+----------------+---------+-----------+
|has_referer|query_time_class|count    |query_time |
+-----------+----------------+---------+-----------+
|false      |1_less_10ms     |8613461  |43244699   |
|false      |2_10ms_to_100ms |118036102|3382186064 |
|false      |3_100ms_to_1s   |28377288 |7058252741 |
|false      |4_1s_to_10s     |1815394  |6081683264 |
|false      |5_more_10s      |591957   |14313410554|
|true       |1_less_10ms     |24329    |133314     |
|true       |2_10ms_to_100ms |3123534  |140796917  |
|true       |3_100ms_to_1s   |2011048  |660547510  |
|true       |4_1s_to_10s     |310037   |800937814  |
|true       |5_more_10s      |40370    |1036973827 |
+-----------+----------------+---------+-----------+
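  
  To compare the two groups directly, the per-class shares can be recomputed 
from the request counts in the table above. This is a small Python sketch, not 
part of the original analysis: the counts are copied from the table, and the 
helper names `shares` and `share_over_1s` are made up for the example.

```python
# Request counts per (has_referer, query_time_class), copied from the table.
counts = {
    False: {
        "1_less_10ms": 8613461,
        "2_10ms_to_100ms": 118036102,
        "3_100ms_to_1s": 28377288,
        "4_1s_to_10s": 1815394,
        "5_more_10s": 591957,
    },
    True: {
        "1_less_10ms": 24329,
        "2_10ms_to_100ms": 3123534,
        "3_100ms_to_1s": 2011048,
        "4_1s_to_10s": 310037,
        "5_more_10s": 40370,
    },
}


def shares(by_class):
    """Fraction of requests falling in each time class."""
    total = sum(by_class.values())
    return {cls: n / total for cls, n in by_class.items()}


def share_over_1s(by_class):
    """Fraction of requests in the two classes above 1s."""
    s = shares(by_class)
    return s["4_1s_to_10s"] + s["5_more_10s"]


for has_referer, by_class in counts.items():
    for cls, share in shares(by_class).items():
        print(f"{has_referer!s:<5} {cls:<16} {share:7.2%}")

grand_total = sum(sum(by_class.values()) for by_class in counts.values())
```

  On these counts, queries over 1s make up about 1.5% of requests without a 
referer and about 6.4% of requests with one, and the grand total matches the 
sum of the `requests` column in the first table.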
  
  Below is some information on the top-100 user-agent/referer pairs making the 
most requests with a duration greater than 1s:
  

+--------------------------------------------------------------------------------------------------------------------------+-------+----------------+------------------+----------------+------------------+
|user_agent                                                                                                                |referer|requests_more_1s|query_time_more_1s|requests_less_1s|query_time_less_1s|
+--------------------------------------------------------------------------------------------------------------------------+-------+----------------+------------------+----------------+------------------+
|ChemAxon-Marvin/20.15.0                                                                                                   |null   |209930          |1816992218        |0               |0                 |
|SAP/1.0                                                                                                                   |null   |143198          |248447172         |5100562         |1834622022        |
|okhttp/4.0.0-alpha02                                                                                                      |null   |128509          |552415825         |2967            |679104            |
|Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML\, like Gecko) Chrome/50.0.2661.102 Safari/537.36|null   |114446          |467381034         |5899288         |441228641         |
|commonscat_copy_from_P373 Pywikibot/3.1.dev0 (g6) requests/2.22.0 Python/2.7.13.final.0                                   |null   |99342           |404639751         |5907            |4823011           |
|sparqlwrapper 1.8.2 (rdflib.github.io/sparqlwrapper)                                                                      |null   |86830           |2843969537        |289618          |45768268          |
|Apache-HttpClient/4.5.10 (Java/1.8.0_242)                                                                                 |null   |70949           |1331327131        |3127            |2242093           |
|bbw-bot                                                                                                                   |null   |68936           |292089223         |1957742         |234481876         |
|MyCoolTool/0.1 dlworb1...@yonsei.ac.kr                                                                                    |null   |52715           |149532780         |275170          |34374373          |
|python-requests/2.24.0                                                                                                    |null   |49917           |458021897         |364755          |33714974          |
|Drupal

[Wikidata-bugs] [Maniphest] T261841: Tag WDQS query log with the source of the query (UI vs direct access)

2020-10-06 Thread JAllemandou
JAllemandou added a comment.


  I continued my analysis today looking at top-100 parsed user-agents from both 
queries-with-referer subset, and queries-without-referer subset, over the month 
of September.
  See https://phabricator.wikimedia.org/P12933
  
  - The queries-with-referer have a defined user-agent, meaning that the 
user-agent-parser we use to extract structured information from the user-agent 
line provides values for a lot of its fields. By looking at the top-100 
user-agents we actually cover more than 90% of requests made with a referer.
  - The queries-without-referer have either an undefined or `Spider` 
user-agent, meaning that the user-agent line is either not parseable or is 
parsed as a bot. I manually inspected the user-agent lines and confirm that 
most of them look like bots (particularly the ones making the most requests). 
By looking at the top-100 user-agents we also cover more than 90% of requests 
made without a referer.
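  
  The "top-100 covers more than 90% of requests" check can be expressed as a 
simple coverage ratio. A minimal Python sketch follows; the function name 
`top_n_coverage` and the user-agent counts are hypothetical and only serve to 
show the computation, not the actual September data.

```python
from collections import Counter


def top_n_coverage(counts, n):
    """Share of all requests made by the n most frequent user-agents."""
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(n))
    return top / total


# Hypothetical request counts per user-agent (not real data).
ua_counts = Counter({"bot-a": 700, "bot-b": 250, "browser-c": 40, "script-d": 10})
coverage = top_n_coverage(ua_counts, 2)
print(f"top-2 coverage: {coverage:.0%}")
```

  With real data, `counts` would hold one entry per parsed user-agent and `n` 
would be 100.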
  
  This confirms that, despite being few, the requests providing a referer 
seem trustworthy. There is therefore nothing more to do for this task; the data 
is already available.

TASK DETAIL
  https://phabricator.wikimedia.org/T261841

To: Zbyszko, JAllemandou
Cc: CBogen, JAllemandou, Aklapper, Gehel, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, 
Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T261841: Tag WDQS query log with the source of the query (UI vs direct access)

2020-10-02 Thread JAllemandou
JAllemandou added a comment.


  Heya - I'm sorry I completely missed the ping :S
  Quick analysis:
  
spark.sql("""
  SELECT (http.request_headers['referer'] IS NOT NULL) AS defined_referer,
         count(1) AS c
  FROM event.wdqs_external_sparql_query
  WHERE year = 2020 AND month = 9
  GROUP BY (http.request_headers['referer'] IS NOT NULL)
  LIMIT 100
""").show(100, false)
+---------------+---------+
|defined_referer|c        |
+---------------+---------+
|false          |165201676|
|true           |5613278  |
+---------------+---------+
  
  --> 3.3% of requests have referer defined for September
  
  Among those 3.3%, here is the top 10:
  
spark.sql("""
  SELECT http.request_headers['referer'] AS referer,
         count(1) AS c
  FROM event.wdqs_external_sparql_query
  WHERE year = 2020 AND month = 9
    AND http.request_headers['referer'] IS NOT NULL
  GROUP BY http.request_headers['referer']
  ORDER BY c DESC
  LIMIT 10
""").show(10, false)
+-------------------------------------------------+-------+
|referer                                          |c      |
+-------------------------------------------------+-------+
|https://query.wikidata.org/                      |2730003|
|https://labs.minutelabs.io/Tree-of-Life-Explorer/|307426 |
|https://www.wikidata.org/                        |212431 |
|https://labs.minutelabs.io/                      |138757 |
|https://ru.wikipedia.org/                        |107558 |
|https://query.wikidata.org/embed.html            |102165 |
|https://wlmuk.toolforge.org/                     |96946  |
|https://maps.wikilovesmonuments.org/             |89894  |
|https://wikishootme.toolforge.org/               |87632  |
|https://en.wikipedia.org/                        |62147  |
+-------------------------------------------------+-------+
  
  --> Using headers over a month, https://query.wikidata.org/ queries represent 
1.6% of queries.
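  
  For reference, both percentages follow directly from the counts shown above. 
A short Python check, with variable names chosen for the example:

```python
# Counts copied from the two query results above (September 2020).
with_referer = 5613278
without_referer = 165201676
query_ui_requests = 2730003  # referer = https://query.wikidata.org/

total = with_referer + without_referer

referer_share = with_referer / total
ui_share = query_ui_requests / total

print(f"requests with a referer: {referer_share:.1%}")
print(f"requests from the query UI: {ui_share:.1%}")
```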
  
  Having 3.3% of requests with a referer seems small. If someone with a better 
gut feeling for that could chime in that'd be great; otherwise I'm gonna try to 
do a more advanced user-agent analysis on the data and try to judge whether it 
feels organic or not.

TASK DETAIL
  https://phabricator.wikimedia.org/T261841

To: Zbyszko, JAllemandou
Cc: CBogen, JAllemandou, Aklapper, Gehel, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, 
Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T258269: Add query result to the current WDQS event logging

2020-09-07 Thread JAllemandou
JAllemandou added a comment.


  In terms of logging size, it probably depends on the result type: in the case 
of descriptions or other text-heavy fields, this could get big if a high `LIMIT` 
(or no `LIMIT`) is set on the number of returned rows. We should set a limit :)

TASK DETAIL
  https://phabricator.wikimedia.org/T258269

To: JAllemandou
Cc: JAllemandou, GoranSMilovanovic, Gehel, Aklapper, CBogen, Akuckartz, 
darthmon_wmde, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T261937: Add CPU load and query concurrency as context to event logging from WDQS

2020-09-07 Thread JAllemandou
JAllemandou added a comment.


  This will make analysis a lot easier than having to build the 'in-flight' 
view of queries!

TASK DETAIL
  https://phabricator.wikimedia.org/T261937

To: JAllemandou
Cc: JAllemandou, Aklapper, Gehel, CBogen, Akuckartz, darthmon_wmde, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread JAllemandou
JAllemandou added a comment.


  @GoranSMilovanovic I have indeed done some analysis using the Apache Jena 
parser to extract the algebraic representation of queries. It is not yet at the 
level of completion I'd like, though. I'll be on holidays until August 15th, 
starting tonight - let's discuss when I come back?

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

To: GoranSMilovanovic, JAllemandou
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread JAllemandou
JAllemandou added a comment.


  @GoranSMilovanovic I finally published a wiki page with most of the results I 
found: https://wikitech.wikimedia.org/wiki/User:Joal/WDQS_Traffic_Analysis
  Sorry for the delay ...

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

To: GoranSMilovanovic, JAllemandou
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread JAllemandou
JAllemandou added a comment.


SELECT
    http.request_headers['user-agent'],
    user_agent_map,
    count(1) AS c
FROM event.wdqs_external_sparql_query
WHERE year = 2020 AND month = 5 AND day = 1
GROUP BY
    http.request_headers['user-agent'],
    user_agent_map
ORDER BY c DESC
LIMIT 100;

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

To: GoranSMilovanovic, JAllemandou
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-14 Thread JAllemandou
JAllemandou added a comment.


  > First step: analyze the frequency distribution of the user_agent field 
(string) from wmf.webrequest where queries are SPARQL.
  
  I suggest you use events instead of webrequest: 
`event.wdqs_internal_sparql_query` and `event.wdqs_external_sparql_query`.
  
  I have done some work encompassing user-agent frequency analysis and I'm in 
the process of writing up the findings by the end of this week.

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

To: GoranSMilovanovic, JAllemandou
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Closed] T249319: Remove wb_terms from sqoop

2020-06-02 Thread JAllemandou
JAllemandou closed this task as "Resolved".
JAllemandou updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T249319

To: JAllemandou
Cc: Milimetric, Aklapper, Addshore, 4748kitoko, Iflorez, darthmon_wmde, 
alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, JAllemandou, terrrydactyl, 
Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster

2020-05-27 Thread JAllemandou
JAllemandou added a comment.


  An idea: How about sending the update stream back to kafka and giving THAT 
one a higher retention?
  Moving retention to 30 days for revision-create would keep a lot of data 
that isn't necessary (about half of the data), while keeping only the updates 
should be enough.
  Just an idea :)

TASK DETAIL
  https://phabricator.wikimedia.org/T253753

To: JAllemandou
Cc: JAllemandou, Ottomata, dcausse, Aklapper, CBogen, 4748kitoko, 
darthmon_wmde, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T236895: ArticlePlaceholder dashboard stopped tracking page views

2020-03-13 Thread JAllemandou
JAllemandou added a comment.


  Patch needs to be deployed before the dashboard shows data.

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

To: Ladsgroup, JAllemandou
Cc: Milimetric, Ladsgroup, Nuria, JAllemandou, elukey, Addshore, Aklapper, 
Lydia_Pintscher, Alter-paule, Hazizibinmahdi, Beast1978, Un1tY, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, E.S.A-Sheild, Iflorez, darthmon_wmde, 
alaa_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, cmadeo, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T246237: Extract some statistics on the use of the isBlank() function in wdqs query logs

2020-02-27 Thread JAllemandou
JAllemandou added a comment.


  Events using `isBlank` since the beginning of the year are now stored here: 
`/user/joal/wdqs_queries/2020_use_isBlank/wdqs_use_is_blank_202002.json`.
  There are ~56k events stored in JSON format in a single file to facilitate 
analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T246237

To: JAllemandou
Cc: Lea_Lacroix_WMDE, JAllemandou, Aklapper, Lucas_Werkmeister_WMDE, dcausse, 
darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T246237: Extract some statistics on the use of the isBlank() function in wdqs query logs

2020-02-26 Thread JAllemandou
JAllemandou added a comment.


  As I was working on getting a better idea of the queries, I got some results 
relatively easily.
  Since the beginning of the year:
  
  - Internal cluster: No request using `isBlank()`, 481202298 requests total
  - External cluster: 54669 requests using `isBlank()`, 202695416 requests 
total (0.03%)
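  
  One way such a count could be obtained is a case-insensitive text match on 
the function name. The Python sketch below is an assumption, not the method 
actually used for the figures above: the pattern, the helper `uses_is_blank` 
and the sample queries are hypothetical, and a plain text match would also 
count occurrences inside string literals or comments. The final lines recompute 
the external-cluster percentage from the two counts given above.

```python
import re

# Assumption: match the function name followed by "(", ignoring case,
# since SPARQL keywords are matched case-insensitively.
IS_BLANK = re.compile(r"\bisBlank\s*\(", re.IGNORECASE)


def uses_is_blank(query):
    """True if the query text appears to call isBlank()."""
    return bool(IS_BLANK.search(query))


# Hypothetical queries for illustration.
queries = [
    "SELECT ?s WHERE { ?s ?p ?o FILTER(isBlank(?o)) }",
    "SELECT ?s WHERE { ?s ?p ?o FILTER(ISBLANK (?o)) }",
    "SELECT ?s WHERE { ?s ?p ?o }",
]
matching = [q for q in queries if uses_is_blank(q)]
print(len(matching), "of", len(queries), "sample queries use isBlank()")

# External cluster figures from the comment above.
external_share = 54669 / 202695416
print(f"{external_share:.2%}")
```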
  
  I can provide more details as needed :)

TASK DETAIL
  https://phabricator.wikimedia.org/T246237

To: JAllemandou
Cc: JAllemandou, Aklapper, Lucas_Werkmeister_WMDE, dcausse, darthmon_wmde, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Retitled] T209655: Copy Wikidata dumps to HDFS + parquet

2020-02-18 Thread JAllemandou
JAllemandou renamed this task from "Copy Wikidata dumps to HDFS" to "Copy 
Wikidata dumps to HDFS + parquet".

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: Isaac, Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottomata, Nuria, 
GoranSMilovanovic, Addshore, JAllemandou, bmansurov, Beast1978, Un1tY, 
4748kitoko, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, 
darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, 
Maathavan, _jensen, rosalieper, Scott_WUaS, terrrydactyl, Wikidata-bugs, aude, 
Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Updated] T209655: Copy Wikidata dumps to HDFS

2020-01-28 Thread JAllemandou
JAllemandou added a subtask: T243832: Fix hdfs-rsync `prune-empty-dirs` feature.

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: Isaac, Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottomata, Nuria, 
GoranSMilovanovic, Addshore, JAllemandou, bmansurov, Un1tY, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, 
AramBakir, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, 
Maathavan, _jensen, rosalieper, Scott_WUaS, terrrydactyl, Wikidata-bugs, aude, 
Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Claimed] T209655: Copy Wikidata dumps to HDFS

2020-01-28 Thread JAllemandou
JAllemandou claimed this task.
JAllemandou added a project: Analytics-Kanban.
JAllemandou set the point value for this task to "5".

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: Isaac, Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottomata, Nuria, 
GoranSMilovanovic, Addshore, JAllemandou, bmansurov, Un1tY, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, 
AramBakir, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, 
Maathavan, _jensen, rosalieper, Scott_WUaS, terrrydactyl, Wikidata-bugs, aude, 
Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Updated] T236895: ArticlePlaceholder dashboard stopped tracking page views

2020-01-08 Thread JAllemandou
JAllemandou added a project: Analytics-Kanban.

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

To: Ladsgroup, JAllemandou
Cc: Ladsgroup, Nuria, JAllemandou, elukey, Addshore, Aklapper, Lydia_Pintscher, 
4748kitoko, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, 
Iflorez, darthmon_wmde, alaa_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, cmadeo, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T236895: ArticlePlaceholder dashboard stopped tracking page views

2020-01-08 Thread JAllemandou
JAllemandou added a comment.


  The patch merged by @Nuria had a bug. I commented on the already-merged patch 
with a solution. For the moment the job is not started.

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

To: Ladsgroup, JAllemandou
Cc: Ladsgroup, Nuria, JAllemandou, elukey, Addshore, Aklapper, Lydia_Pintscher, 
4748kitoko, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, 
Iflorez, darthmon_wmde, alaa_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, cmadeo, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T239898: Investigate triple counts difference between dumps and what blazegraph reports

2019-12-09 Thread JAllemandou
JAllemandou added a comment.


  Chiming in: I suggest using Spark for investigations - Given the size of the 
dataset, parallel computation should help. This means another hop for the data: 
--> stat1004 --> HDFS. Please ping if you want/need help :)

TASK DETAIL
  https://phabricator.wikimedia.org/T239898

To: JAllemandou
Cc: JAllemandou, Gehel, elukey, dcausse, Aklapper, darthmon_wmde, DannyS712, 
Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T209655: Copy Wikidata dumps to HDFS

2019-12-04 Thread JAllemandou
JAllemandou added a subscriber: Groceryheist.
JAllemandou added a comment.


  New dataset available @GoranSMilovanovic. Pinging @Groceryheist as I also 
generated the items per page.
  
hdfs dfs -ls /user/joal/wmf/data/wmf/mediawiki/wikidata_parquet | tail -1
drwxr-xr-x   - analytics joal  0 2019-12-04 18:31 /user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20191202

hdfs dfs -ls /user/joal/wmf/data/wmf/wikidata/item_page_link/ | tail -1
drwxr-xr-x   - joal joal  0 2019-12-04 18:50 /user/joal/wmf/data/wmf/wikidata/item_page_link/20191202

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottomata, Nuria, 
GoranSMilovanovic, Addshore, JAllemandou, bmansurov, 4748kitoko, darthmon_wmde, 
DannyS712, Nandana, Akovalyov, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, terrrydactyl, Wikidata-bugs, aude, Capt_Swing, Mbch331, 
jeremyb


[Wikidata-bugs] [Maniphest] [Updated] T239471: Sqoop wikidata terms tables into hadoop

2019-11-29 Thread JAllemandou
JAllemandou added a project: Analytics-Kanban.

TASK DETAIL
  https://phabricator.wikimedia.org/T239471

To: Addshore, JAllemandou
Cc: JAllemandou, Addshore, Aklapper, 4748kitoko, Hook696, Daryl-TTMG, 
RomaAmorRoma, 0010318400, E.S.A-Sheild, Iflorez, darthmon_wmde, alaa_wmde, 
Meekrab2012, joker88john, DannyS712, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T101013: Log Wikidata Query Service queries to the event gate infrastructure

2019-11-27 Thread JAllemandou
JAllemandou added a comment.


  Does this being closed mean we can access data on kafka?

TASK DETAIL
  https://phabricator.wikimedia.org/T101013

To: dcausse, JAllemandou
Cc: Igorkim78, JAllemandou, Ottomata, Smalyshev, Deskana, Aklapper, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, 
holger.knust, Meekrab2012, joker88john, ET4Eva, DannyS712, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Avner, Lewizho99, 
Maathavan, Gehel, _jensen, rosalieper, Scott_WUaS, Jonas, FloNight, Xmlizer, 
mobrovac, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
GWicke, Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Updated] T236895: ArticlePlaceholder dashboard stopped tracking page views

2019-10-30 Thread JAllemandou
JAllemandou added a comment.


  I think this problem could be related to T226730 (preventing most 
`Special:XXX` pages from being flagged as pageviews).

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

To: JAllemandou
Cc: JAllemandou, elukey, Addshore, Aklapper, Lydia_Pintscher, 4748kitoko, 
darthmon_wmde, DannyS712, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, 
QZanden, cmadeo, LawExplorer, _jensen, rosalieper, Jonas, terrrydactyl, 
Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFs

2019-10-03 Thread JAllemandou
JAllemandou added a comment.


  This is done, @GoranSMilovanovic.
  Raw data is here 
`/user/joal/wmf/data/raw/mediawiki/wikidata/all_jsondumps/20190902` and parquet 
data is here `/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20190902`

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: WMDE-leszek, abian, leila, Ottomata, Nuria, GoranSMilovanovic, Addshore, 
JAllemandou, bmansurov, 4748kitoko, darthmon_wmde, DannyS712, Nandana, 
Akovalyov, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, terrrydactyl, 
Wikidata-bugs, aude, Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFs

2019-06-08 Thread JAllemandou
JAllemandou added a comment.


  @GoranSMilovanovic: You're welcome :) At some point I'll manage to have that 
productionized ;)

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: abian, leila, Ottomata, Nuria, GoranSMilovanovic, Addshore, JAllemandou, 
bmansurov, darthmon_wmde, Premeditated, Nandana, Akovalyov, Lahi, Gq86, 
QZanden, LawExplorer, Avner, _jensen, rosalieper, terrrydactyl, Wikidata-bugs, 
aude, Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Updated] T220977: Investigate surprising rise in mobile page views for wikidata

2019-05-16 Thread JAllemandou
JAllemandou added a comment.


  A lot trickier :)
  We have the `wmf_raw.mediawiki_private_cu_changes` table in hive, allowing us 
to compute geo-editors (editors-by-country, aggregated). This table only 
contains 3 months of data for PII removal reasons. It's probably not enough for 
what you're after, but I have nothing better (see 
https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mediawiki/geoeditors/monthly/insert_geoeditors_monthly_data.hql
 for an example).
  I've just created T223444 <https://phabricator.wikimedia.org/T223444> to 
submit the general idea of having geo-editors stats split by desktop/mobile.

TASK DETAIL
  https://phabricator.wikimedia.org/T220977

To: GoranSMilovanovic, JAllemandou
Cc: JAllemandou, Milimetric, RazShuty, Lea_WMDE, Aklapper, darthmon_wmde, 
alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T220977: Investigate surprising rise in mobile page views for wikidata

2019-05-14 Thread JAllemandou
JAllemandou added a comment.


  Hi @Lea_WMDE and @GoranSMilovanovic - I think your problem is solved in this 
month's snapshot with the `revision_tags` field of 
mediawiki_history:
  
spark.sql("""
SELECT
substr(event_timestamp, 0, 4) as year,
array_contains(revision_tags, 'mobile edit') as mobile,
array_contains(revision_tags, 'mobile app edit')  as mobile_app,
count(1) as c
FROM wmf.mediawiki_history
WHERE snapshot = '2019-04'
AND wiki_db = 'wikidatawiki'
AND event_entity = 'revision'
GROUP BY
substr(event_timestamp, 0, 4),
array_contains(revision_tags, 'mobile edit'),
array_contains(revision_tags, 'mobile app edit')
ORDER BY year, mobile, mobile_app desc
""").show(100, false)

+----+------+----------+---------+
|year|mobile|mobile_app|c        |
+----+------+----------+---------+
|2004|null  |null      |146      |
|2005|null  |null      |495      |
|2006|null  |null      |1838     |
|2007|null  |null      |2814     |
|2008|null  |null      |2384     |
|2009|null  |null      |2175     |
|2010|null  |null      |1650     |
|2011|null  |null      |1354     |
|2012|null  |null      |2912961  |
|2012|false |false     |4        |
|2013|null  |null      |94142292 |
|2013|false |false     |181133   |
|2014|null  |null      |69236941 |
|2014|false |true      |2        |
|2014|false |false     |18174243 |
|2014|true  |false     |51       |
|2015|null  |null      |76088107 |
|2015|false |true      |586      |
|2015|false |false     |26269493 |
|2015|true  |false     |4058     |
|2016|null  |null      |82178134 |
|2016|false |false     |53308675 |
|2016|true  |true      |618      |
|2016|true  |false     |24248    |
|2017|null  |null      |109041593|
|2017|false |false     |83147234 |
|2017|true  |true      |114906   |
|2017|true  |false     |49836    |
|2018|null  |null      |141536855|
|2018|false |false     |67149958 |
|2018|true  |true      |186065   |
|2018|true  |false     |71822    |
|2019|null  |null      |55814156 |
|2019|false |false     |49994060 |
|2019|true  |true      |85968    |
|2019|true  |false     |23867    |
+----+------+----------+---------+

TASK DETAIL
  https://phabricator.wikimedia.org/T220977

To: GoranSMilovanovic, JAllemandou
Cc: JAllemandou, Milimetric, RazShuty, Lea_WMDE, Aklapper, darthmon_wmde, 
alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T94019: Generate RDF from JSON

2019-04-23 Thread JAllemandou
JAllemandou added a comment.


  The analytics hadoop cluster could also be of use here: the task can easily 
take advantage of parallelization.

TASK DETAIL
  https://phabricator.wikimedia.org/T94019

To: JAllemandou
Cc: JAllemandou, Pintoch, Smalyshev, hoo, Liuxinyu970226, mkroetzsch, Aklapper, 
daniel, alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-04-23 Thread JAllemandou
JAllemandou added a comment.


  Community has spoken, we'll find workarounds - Thanks a lot @ArielGlenn for 
helping drive this :)

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

To: JAllemandou
Cc: Lydia_Pintscher, Pintoch, Rosiestep, Lea_Lacroix_WMDE, WMDE-leszek, Mvolz, 
notconfusing, Envlh, Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, 
ArielGlenn, JAllemandou, alaa_wmde, joker88john, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Zambujo, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, 
Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, 
QZanden, LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, 
gnosygnu, Wikidata-bugs, aude, Daniel_Mietchen, jayvdb, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T218901: Track number of Wikidata edits by namespace

2019-04-08 Thread JAllemandou
JAllemandou added a comment.


  Some queries are computed using hadoop for wikidata (see 
https://github.com/wikimedia/analytics-refinery/tree/master/oozie/wikidata). If 
SQL over recent-changes works for you, that's great :)

TASK DETAIL
  https://phabricator.wikimedia.org/T218901

To: Lucas_Werkmeister_WMDE, JAllemandou
Cc: JAllemandou, Addshore, Aklapper, Lucas_Werkmeister_WMDE, pdehaye, 
alaa_wmde, joker88john, Michael, CucyNoiD, Nandana, NebulousIris, Gaboe420, 
Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, 
Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, YULdigitalpreservation, LawExplorer, 
Salgo60, Lewizho99, Maathavan, _jensen, rosalieper, abian, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T218901: Track number of Wikidata edits by namespace

2019-04-04 Thread JAllemandou
JAllemandou added a comment.


  Reading about this - Would delayed data be interesting? This information is 
accessible in hadoop :)

TASK DETAIL
  https://phabricator.wikimedia.org/T218901

To: Lucas_Werkmeister_WMDE, JAllemandou
Cc: JAllemandou, Addshore, Aklapper, Lucas_Werkmeister_WMDE, pdehaye, 
alaa_wmde, Michael, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, 
Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, 
Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Ramalepe, Liugev6, QZanden, YULdigitalpreservation, LawExplorer, Salgo60, 
Lewizho99, Maathavan, _jensen, rosalieper, abian, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T209655: Copy Wikidata dumps to HDFs

2019-03-26 Thread JAllemandou
JAllemandou added a comment.


  Most of the complicated things already exist for this to work (equivalent of 
rsync for HDFS, spark job converting wikidata json dumps to parquet).
  I wanted T216160 <https://phabricator.wikimedia.org/T216160> to be 
settled before moving to productionization (having the same date for the 
various dumps we handle simplifies things quite a bit), and it takes time.

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: leila, Ottomata, Nuria, GoranSMilovanovic, Addshore, JAllemandou, 
bmansurov, alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, QZanden, LawExplorer, 
Avner, _jensen, rosalieper, Wikidata-bugs, aude, Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T214897: data for analyzing and visualizing the identifier landscape of Wikidata

2019-03-15 Thread JAllemandou
JAllemandou added a comment.


  Hey @GoranSMilovanovic - I don't have a good understanding of what you're 
after, but having read about pairs and the contingency table above, maybe this 
Spark function could be helpful: 
https://spark.apache.org/docs/2.3.0/api/java/index.html?org/apache/spark/sql/DataFrameStatFunctions.html

TASK DETAIL
  https://phabricator.wikimedia.org/T214897

To: GoranSMilovanovic, JAllemandou
Cc: RazShuty, Addshore, JAllemandou, Aklapper, GoranSMilovanovic, 
Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-03-14 Thread JAllemandou
JAllemandou added a comment.


  In T216160#5020236 <https://phabricator.wikimedia.org/T216160#5020236>, 
@ArielGlenn wrote:
  
  > By Friday I'll have done that; by next Wednesday let's make a decision, 
barring any huge obstacles.
  
  
  Awesome, thanks @ArielGlenn  :)

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

To: JAllemandou
Cc: notconfusing, Envlh, Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, 
ArielGlenn, JAllemandou, alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, gnosygnu, 
Wikidata-bugs, aude, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-03-11 Thread JAllemandou
JAllemandou added a comment.


  Following up on this: another viable solution to get monthly coherence 
between dumps is to force a dump on the 1st of the month ... I'm not sure the 
idea is better.
  @ArielGlenn - How should we proceed to move this forward (in either direction)?

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

To: JAllemandou
Cc: Envlh, Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, ArielGlenn, 
JAllemandou, alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, 
Lunewa, QZanden, LawExplorer, _jensen, rosalieper, gnosygnu, Wikidata-bugs, 
aude, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T217821: Investigate duplication of strings in wb_terms table for wikidatawiki

2019-03-07 Thread JAllemandou
JAllemandou added a comment.


  Exact analysis run on 2018-12-06:
  
val df = spark.read.parquet("/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20181001")
val base_rdd = df.select("labels", "descriptions", "aliases").rdd
val strings = base_rdd.flatMap(r => {
  r.getMap[String,String](0).values ++
  r.getMap[String,String](1).values ++
  r.getMap[String,Seq[String]](2).values.flatMap(l => l)
})

val grouped_strings = strings.map(s => (s, 1)).reduceByKey(_+_)


val total_bytes = grouped_strings.map(t => t._1.getBytes.length * t._2).sum()
val duplicate_bytes = grouped_strings.map(t => t._1.getBytes.length * (t._2 - 1)).sum()

println(f"Total bytes for strings: $total_bytes%15.0f")
println(f"Total duplicate bytes for strings: $duplicate_bytes%15.0f")
println(f"Useful bytes for strings: ${total_bytes - duplicate_bytes}%15.0f")

//Total bytes for strings: 45,724,033,674
//Total duplicate bytes for strings: 41,630,588,801
//Useful bytes for strings: 4,093,444,873
// Useful is 1 order of magnitude less than used

// Triple check useful bytes for strings:
grouped_strings.map(_._1.getBytes.length).sum() == (total_bytes - duplicate_bytes)
// true


// How many unique strings?
grouped_strings.count()
// 98,524,732

// How many string with 1 instance?
grouped_strings.filter(t => t._2 == 1).count()
// 72,584,179
// Leaving 25,940,553 unique strings having multiple instances

// --> If we go for table-indirection, we'll need ~100M ints (4 bytes)
// --> 400,000,000 bytes  - 1 order of magnitude less than unique string size
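
  For illustration, here is a minimal sketch of the same accounting on a toy 
list, in plain Python rather than Spark. The sample strings are made up, UTF-8 
is assumed for byte lengths, and the 4-byte id per unique string mirrors the 
table-indirection estimate above; this is not the job that produced the 
numbers.

```python
from collections import Counter

# Hypothetical sample of labels/descriptions/aliases (not real Wikidata data).
strings = ["Berlin", "Berlin", "Berlin", "city in Germany", "Paris", "Paris"]

# Same idea as map(s => (s, 1)).reduceByKey(_+_): count instances per string.
grouped = Counter(strings)


def nbytes(s):
    """Byte length of a string, assuming UTF-8 encoding."""
    return len(s.encode("utf-8"))


# Bytes used by every instance, and bytes used by the repeated instances only.
total_bytes = sum(nbytes(s) * n for s, n in grouped.items())
duplicate_bytes = sum(nbytes(s) * (n - 1) for s, n in grouped.items())
useful_bytes = total_bytes - duplicate_bytes

# Table indirection estimate: one 4-byte integer id per unique string.
id_bytes = 4 * len(grouped)

print(total_bytes, duplicate_bytes, useful_bytes, id_bytes)
```

  The useful bytes equal the summed length of the unique strings, which is the 
same identity the triple check above relies on.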

TASK DETAIL
  https://phabricator.wikimedia.org/T217821

To: JAllemandou
Cc: JAllemandou, Aklapper, Addshore, alaa_wmde, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs, 
aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-26 Thread JAllemandou
JAllemandou added a comment.


  Hi @Isaac 
  Sorry for the issue. I corrected the query above (last query, join criteria: 
`AND ws.sitelink.title = title_namespace_localized` --> `AND 
REPLACE(ws.sitelink.title, ' ', '_') = title_namespace_localized`).
  We were not joining correctly on title, as mediawiki-history encodes titles 
with underscores while the wikidata dump uses spaces.
  Problem solved: the data has been regenerated at the same place as before, 
and a double check on enwiki numbers looks good: 5.96M pages have an item in 
namespace 0 (7.95M for all namespaces).
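
  A minimal sketch of why the join missed rows, in plain Python rather than the 
actual query. The page ids and the helper name `join_on_title` are 
hypothetical; only the space-to-underscore normalization comes from the fix 
described above.

```python
# Titles as mediawiki-history stores them (underscores); page ids are made up.
pages = {"Douglas_Adams": 1001, "Albert_Einstein": 1002}

# Sitelinks as the wikidata dump stores them (spaces).
sitelinks = [("Q42", "Douglas Adams"), ("Q937", "Albert Einstein")]


def join_on_title(sitelinks, pages, normalize):
    """Return (item_id, page_id) pairs whose normalized sitelink title is a page title."""
    return [
        (item_id, pages[normalize(title)])
        for item_id, title in sitelinks
        if normalize(title) in pages
    ]


# Joining on the raw title finds nothing for multi-word titles.
naive = join_on_title(sitelinks, pages, lambda t: t)

# Replacing spaces with underscores, like REPLACE(title, ' ', '_'), restores the matches.
fixed = join_on_title(sitelinks, pages, lambda t: t.replace(" ", "_"))

print(len(naive), len(fixed))
```

  Single-word titles match either way, which may be why the mismatch only 
showed up as missing rows rather than an empty result.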

TASK DETAIL
  https://phabricator.wikimedia.org/T215616

To: JAllemandou
Cc: Marostegui, Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, 
JAllemandou, diego, Nandana, Akovalyov, Banyek, Rayssa-, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, Avner, _jensen, Wikidata-bugs, aude, 
Capt_Swing, Dinoguy1000, Mbch331, Jay8g, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-21 Thread JAllemandou
JAllemandou added a comment.


  We're on the same page @diego :)
  I can precompute the table described in ii) if needed, and will surely do it 
once we have the wikidata dump productionized - Let me know if you need it 
before then.

TASK DETAIL
  https://phabricator.wikimedia.org/T215616

To: JAllemandou
Cc: Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, JAllemandou, diego, 
Nandana, Akovalyov, Banyek, AndyTan, Rayssa-, Lahi, Gq86, GoranSMilovanovic, 
QZanden, Marostegui, LawExplorer, Avner, Minhnv-2809, _jensen, Luke081515, 
Wikidata-bugs, aude, Capt_Swing, Dinoguy1000, Mbch331, Jay8g, Krenair, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-02-20 Thread JAllemandou
JAllemandou added a comment.


  I can't speak about failures and restarts as I don't know much about the 
dumps-generation process. @ArielGlenn would be the person to know best.
  As for the dates, the main reason we ask for the change is date consistency 
by month, mimicking what exists for xml dumps.

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

To: JAllemandou
Cc: Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, ArielGlenn, 
JAllemandou, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, Lunewa, 
QZanden, LawExplorer, _jensen, gnosygnu, Wikidata-bugs, aude, Mbch331, jeremyb

