[Wikidata-bugs] [Maniphest] [Updated] T209655: Copy Wikidata dumps to HDFS

2019-03-26 Thread JAllemandou
JAllemandou added a comment.


  Most of the complicated pieces needed for this already exist (an equivalent 
of rsync for HDFS, and a Spark job converting Wikidata JSON dumps to Parquet).
  I wanted T216160 <https://phabricator.wikimedia.org/T216160> to be 
settled before moving to productionization (having the same date for the 
various dumps we handle simplifies things quite a bit), and that takes time.

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: leila, Ottomata, Nuria, GoranSMilovanovic, Addshore, JAllemandou, 
bmansurov, alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, QZanden, LawExplorer, 
Avner, _jensen, rosalieper, Wikidata-bugs, aude, Capt_Swing, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T218901: Track number of Wikidata edits by namespace

2019-04-04 Thread JAllemandou
JAllemandou added a comment.


  Reading about this - Would delayed data be interesting? This information is 
accessible in hadoop :)

TASK DETAIL
  https://phabricator.wikimedia.org/T218901

To: Lucas_Werkmeister_WMDE, JAllemandou
Cc: JAllemandou, Addshore, Aklapper, Lucas_Werkmeister_WMDE, pdehaye, 
alaa_wmde, Michael, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, 
Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, 
Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Ramalepe, Liugev6, QZanden, YULdigitalpreservation, LawExplorer, Salgo60, 
Lewizho99, Maathavan, _jensen, rosalieper, abian, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T218901: Track number of Wikidata edits by namespace

2019-04-08 Thread JAllemandou
JAllemandou added a comment.


  Some queries are computed using Hadoop for Wikidata (see 
https://github.com/wikimedia/analytics-refinery/tree/master/oozie/wikidata). If 
SQL over recent-changes works for you, that's great :)

TASK DETAIL
  https://phabricator.wikimedia.org/T218901

To: Lucas_Werkmeister_WMDE, JAllemandou
Cc: JAllemandou, Addshore, Aklapper, Lucas_Werkmeister_WMDE, pdehaye, 
alaa_wmde, joker88john, Michael, CucyNoiD, Nandana, NebulousIris, Gaboe420, 
Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, 
Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, YULdigitalpreservation, LawExplorer, 
Salgo60, Lewizho99, Maathavan, _jensen, rosalieper, abian, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-04-23 Thread JAllemandou
JAllemandou added a comment.


  The community has spoken, we'll find workarounds - Thanks a lot @ArielGlenn 
for helping drive this :)

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

To: JAllemandou
Cc: Lydia_Pintscher, Pintoch, Rosiestep, Lea_Lacroix_WMDE, WMDE-leszek, Mvolz, 
notconfusing, Envlh, Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, 
ArielGlenn, JAllemandou, alaa_wmde, joker88john, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Zambujo, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, 
Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, 
QZanden, LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, 
gnosygnu, Wikidata-bugs, aude, Daniel_Mietchen, jayvdb, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T94019: Generate RDF from JSON

2019-04-23 Thread JAllemandou
JAllemandou added a comment.


  The analytics hadoop cluster could also be of use here: the task can easily 
take advantage of parallelization.

TASK DETAIL
  https://phabricator.wikimedia.org/T94019

To: JAllemandou
Cc: JAllemandou, Pintoch, Smalyshev, hoo, Liuxinyu970226, mkroetzsch, Aklapper, 
daniel, alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T220977: Investigate surprising rise in mobile page views for wikidata

2019-05-14 Thread JAllemandou
JAllemandou added a comment.


  Hi @Lea_WMDE and @GoranSMilovanovic - I think your problem is solved in 
this month's snapshot by the `revision_tags` field of mediawiki_history:
  
// Count wikidatawiki revisions per year, split by the two mobile-related revision tags.
spark.sql("""
  SELECT
    substr(event_timestamp, 0, 4) AS year,
    array_contains(revision_tags, 'mobile edit') AS mobile,
    array_contains(revision_tags, 'mobile app edit') AS mobile_app,
    count(1) AS c
  FROM wmf.mediawiki_history
  WHERE snapshot = '2019-04'
    AND wiki_db = 'wikidatawiki'
    AND event_entity = 'revision'
  GROUP BY
    substr(event_timestamp, 0, 4),
    array_contains(revision_tags, 'mobile edit'),
    array_contains(revision_tags, 'mobile app edit')
  ORDER BY year, mobile, mobile_app DESC
""").show(100, false)

+----+------+----------+---------+
|year|mobile|mobile_app|c        |
+----+------+----------+---------+
|2004|null  |null      |146      |
|2005|null  |null      |495      |
|2006|null  |null      |1838     |
|2007|null  |null      |2814     |
|2008|null  |null      |2384     |
|2009|null  |null      |2175     |
|2010|null  |null      |1650     |
|2011|null  |null      |1354     |
|2012|null  |null      |2912961  |
|2012|false |false     |4        |
|2013|null  |null      |94142292 |
|2013|false |false     |181133   |
|2014|null  |null      |69236941 |
|2014|false |true      |2        |
|2014|false |false     |18174243 |
|2014|true  |false     |51       |
|2015|null  |null      |76088107 |
|2015|false |true      |586      |
|2015|false |false     |26269493 |
|2015|true  |false     |4058     |
|2016|null  |null      |82178134 |
|2016|false |false     |53308675 |
|2016|true  |true      |618      |
|2016|true  |false     |24248    |
|2017|null  |null      |109041593|
|2017|false |false     |83147234 |
|2017|true  |true      |114906   |
|2017|true  |false     |49836    |
|2018|null  |null      |141536855|
|2018|false |false     |67149958 |
|2018|true  |true      |186065   |
|2018|true  |false     |71822    |
|2019|null  |null      |55814156 |
|2019|false |false     |49994060 |
|2019|true  |true      |85968    |
|2019|true  |false     |23867    |
+----+------+----------+---------+

TASK DETAIL
  https://phabricator.wikimedia.org/T220977

To: GoranSMilovanovic, JAllemandou
Cc: JAllemandou, Milimetric, RazShuty, Lea_WMDE, Aklapper, darthmon_wmde, 
alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T220977: Investigate surprising rise in mobile page views for wikidata

2019-05-16 Thread JAllemandou
JAllemandou added a comment.


  A lot trickier :)
  We have the `wmf_raw.mediawiki_private_cu_changes` table in Hive, allowing us 
to compute geo-editors (editors by country, aggregated). This table only 
contains 3 months of data, for PII-removal reasons. It's probably not enough for 
what you're after, but I have nothing better (see 
https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mediawiki/geoeditors/monthly/insert_geoeditors_monthly_data.hql 
for an example).
  I've just created T223444 <https://phabricator.wikimedia.org/T223444> to 
submit the general idea of having geo-editors stats split by desktop/mobile.

TASK DETAIL
  https://phabricator.wikimedia.org/T220977

To: GoranSMilovanovic, JAllemandou
Cc: JAllemandou, Milimetric, RazShuty, Lea_WMDE, Aklapper, darthmon_wmde, 
alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFS

2019-06-08 Thread JAllemandou
JAllemandou added a comment.


  @GoranSMilovanovic : You're welcome :) At some point I'll manage to have that 
productionized ;)

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: abian, leila, Ottomata, Nuria, GoranSMilovanovic, Addshore, JAllemandou, 
bmansurov, darthmon_wmde, Premeditated, Nandana, Akovalyov, Lahi, Gq86, 
QZanden, LawExplorer, Avner, _jensen, rosalieper, terrrydactyl, Wikidata-bugs, 
aude, Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T239898: Investigate triple counts difference between dumps and what blazegraph reports

2019-12-09 Thread JAllemandou
JAllemandou added a comment.


  Chiming in: I suggest using Spark for investigations - Given the size of the 
dataset, parallel computation should help. This means another hop for the data: 
--> stat1004 --> HDFS. Please ping if you want/need help :)

TASK DETAIL
  https://phabricator.wikimedia.org/T239898

To: JAllemandou
Cc: JAllemandou, Gehel, elukey, dcausse, Aklapper, darthmon_wmde, DannyS712, 
Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T236895: ArticlePlaceholder dashboard stopped tracking page views

2020-01-08 Thread JAllemandou
JAllemandou added a comment.


  The patch merged by @Nuria had a bug. I commented on the already-merged patch 
with a solution. For the moment, the job has not been started.

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

To: Ladsgroup, JAllemandou
Cc: Ladsgroup, Nuria, JAllemandou, elukey, Addshore, Aklapper, Lydia_Pintscher, 
4748kitoko, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, 
Iflorez, darthmon_wmde, alaa_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, cmadeo, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Updated] T236895: ArticlePlaceholder dashboard stopped tracking page views

2020-01-08 Thread JAllemandou
JAllemandou added a project: Analytics-Kanban.

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

To: Ladsgroup, JAllemandou
Cc: Ladsgroup, Nuria, JAllemandou, elukey, Addshore, Aklapper, Lydia_Pintscher, 
4748kitoko, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, 
Iflorez, darthmon_wmde, alaa_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, cmadeo, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Claimed] T209655: Copy Wikidata dumps to HDFS

2020-01-28 Thread JAllemandou
JAllemandou claimed this task.
JAllemandou added a project: Analytics-Kanban.
JAllemandou set the point value for this task to "5".

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: Isaac, Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottomata, Nuria, 
GoranSMilovanovic, Addshore, JAllemandou, bmansurov, Un1tY, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, 
AramBakir, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, 
Maathavan, _jensen, rosalieper, Scott_WUaS, terrrydactyl, Wikidata-bugs, aude, 
Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Updated] T209655: Copy Wikidata dumps to HDFS

2020-01-28 Thread JAllemandou
JAllemandou added a subtask: T243832: Fix hdfs-rsync `prune-empty-dirs` feature.

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: Isaac, Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottomata, Nuria, 
GoranSMilovanovic, Addshore, JAllemandou, bmansurov, Un1tY, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, 
AramBakir, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, 
Maathavan, _jensen, rosalieper, Scott_WUaS, terrrydactyl, Wikidata-bugs, aude, 
Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Retitled] T209655: Copy Wikidata dumps to HDFS + parquet

2020-02-18 Thread JAllemandou
JAllemandou renamed this task from "Copy Wikidata dumps to HDFS" to "Copy 
Wikidata dumps to HDFS + parquet".

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

To: JAllemandou
Cc: Isaac, Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottomata, Nuria, 
GoranSMilovanovic, Addshore, JAllemandou, bmansurov, Beast1978, Un1tY, 
4748kitoko, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, 
darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, Adik2382, 
Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, 
Maathavan, _jensen, rosalieper, Scott_WUaS, terrrydactyl, Wikidata-bugs, aude, 
Capt_Swing, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T246237: Extract some statistics on the use of the isBlank() function in wdqs query logs

2020-02-26 Thread JAllemandou
JAllemandou added a comment.


  As I was working on getting a better idea of the queries, I got some results 
relatively easily:
  Since the beginning of the year:
  
  - Internal cluster: No request using `isBlank()`, 481202298 requests total
  - External cluster: 54669 requests using `isBlank()`, 202695416 requests 
total (0.03%)
  
  I can provide more details as needed :)
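
  For reference, the 0.03% figure can be re-derived from the two counts quoted 
above. This is only a minimal sketch in Python, not the code used to produce 
the numbers; the constant and function names are made up for illustration.

```python
# Request counts quoted in the comment above (since the beginning of 2020).
EXTERNAL_ISBLANK = 54_669
EXTERNAL_TOTAL = 202_695_416
INTERNAL_ISBLANK = 0
INTERNAL_TOTAL = 481_202_298


def share_percent(matching, total):
    """Percentage of `total` requests that are `matching`."""
    return 100.0 * matching / total


external_share = share_percent(EXTERNAL_ISBLANK, EXTERNAL_TOTAL)
print(f"external: {external_share:.2f}%")  # external: 0.03%
print(f"internal: {share_percent(INTERNAL_ISBLANK, INTERNAL_TOTAL):.2f}%")  # internal: 0.00%
```

  The unrounded external share is a little under 0.027%, which rounds to the 
0.03% reported.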

TASK DETAIL
  https://phabricator.wikimedia.org/T246237

To: JAllemandou
Cc: JAllemandou, Aklapper, Lucas_Werkmeister_WMDE, dcausse, darthmon_wmde, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T246237: Extract some statistics on the use of the isBlank() function in wdqs query logs

2020-02-27 Thread JAllemandou
JAllemandou added a comment.


  Events using `isBlank` since the beginning of the year are now stored here: 
`/user/joal/wdqs_queries/2020_use_isBlank/wdqs_use_is_blank_202002.json`.
  There are ~56k events, stored in JSON format in a single file to facilitate 
analysis.

TASK DETAIL
  https://phabricator.wikimedia.org/T246237

To: JAllemandou
Cc: Lea_Lacroix_WMDE, JAllemandou, Aklapper, Lucas_Werkmeister_WMDE, dcausse, 
darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T236895: ArticlePlaceholder dashboard stopped tracking page views

2020-03-13 Thread JAllemandou
JAllemandou added a comment.


  Patch needs to be deployed before the dashboard shows data.

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

To: Ladsgroup, JAllemandou
Cc: Milimetric, Ladsgroup, Nuria, JAllemandou, elukey, Addshore, Aklapper, 
Lydia_Pintscher, Alter-paule, Hazizibinmahdi, Beast1978, Un1tY, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, E.S.A-Sheild, Iflorez, darthmon_wmde, 
alaa_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, cmadeo, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T208569: Get Wikidata clickstream

2018-11-23 Thread JAllemandou
JAllemandou added a comment.
Hi @GoranSMilovanovic, the code we use to generate monthly data is here: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/ClickstreamBuilder.scala
As for the clickstream database in Hive, it's not used anymore; it's a leftover from Ellery's time.

TASK DETAIL
  https://phabricator.wikimedia.org/T208569

To: GoranSMilovanovic, JAllemandou
Cc: JAllemandou, Aklapper, Lea_WMDE, Nandana, Lahi, Gq86, GoranSMilovanovic,
QZanden, LawExplorer, _jensen, D3r1ck01, Wikidata-bugs, aude, Lydia_Pintscher,
Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T193641: track number of editors from other Wikimedia projects who also edit on Wikidata over time

2018-11-30 Thread JAllemandou
JAllemandou added a comment.
Thanks for raising the issue. This is very bizarre.
The job for October showed as successful on our side. I reran it, and data showed up :(
I have the feeling this is not the first time this has happened; something must be wrong somewhere.
I'm also going to backfill the data.

TASK DETAIL
  https://phabricator.wikimedia.org/T193641

To: Jonas, JAllemandou
Cc: WMDE-leszek, Tbayer, Aklapper, GerritBot, JAllemandou, Jonas, RazShuty,
Ladsgroup, Addshore, Lydia_Pintscher, CucyNoiD, Nandana, NebulousIris,
Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30,
Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic,
Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, Lewizho99,
Maathavan, _jensen, D3r1ck01, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T193641: track number of editors from other Wikimedia projects who also edit on Wikidata over time

2018-11-30 Thread JAllemandou
JAllemandou added a project: Analytics.

TASK DETAIL
  https://phabricator.wikimedia.org/T193641

To: Jonas, JAllemandou
Cc: WMDE-leszek, Tbayer, Aklapper, GerritBot, JAllemandou, Jonas, RazShuty,
Ladsgroup, Addshore, Lydia_Pintscher, CucyNoiD, Nandana, NebulousIris,
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985,
Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis,
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden,
LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Wikidata-bugs, aude,
Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T193641: track number of editors from other Wikimedia projects who also edit on Wikidata over time

2018-11-30 Thread JAllemandou
JAllemandou added a comment.
Info backfilled since the beginning of time: https://grafana.wikimedia.org/dashboard/db/wikidata-co-editors?orgId=1&from=now-8y&to=now
Will keep an eye on next month's run.

TASK DETAIL
  https://phabricator.wikimedia.org/T193641

To: Jonas, JAllemandou
Cc: WMDE-leszek, Tbayer, Aklapper, GerritBot, JAllemandou, Jonas, RazShuty,
Ladsgroup, Addshore, Lydia_Pintscher, CucyNoiD, Nandana, NebulousIris,
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985,
Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis,
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden,
LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Wikidata-bugs, aude,
Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T193641: track number of editors from other Wikimedia projects who also edit on Wikidata over time

2018-12-20 Thread JAllemandou
JAllemandou added a comment.
Same exact problem as last month: job has run, but no data is present :(
More investigation needed, probably early next year.

TASK DETAIL
  https://phabricator.wikimedia.org/T193641

To: JAllemandou
Cc: WMDE-leszek, Tbayer, Aklapper, GerritBot, JAllemandou, Jonas, RazShuty,
Ladsgroup, Addshore, Lydia_Pintscher, CucyNoiD, Nandana, NebulousIris,
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985,
Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis,
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden,
LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Wikidata-bugs, aude,
Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T193641: track number of editors from other Wikimedia projects who also edit on Wikidata over time

2019-01-07 Thread JAllemandou
JAllemandou added a comment.
Hi @WMDE-leszek - core data has not been computed yet (this is usually done around the 9th of the following month).
I'll be sure to keep an eye on data showing up for month 12 and rerun the job if needed.

TASK DETAIL
  https://phabricator.wikimedia.org/T193641

To: JAllemandou
Cc: WMDE-leszek, Tbayer, Aklapper, GerritBot, JAllemandou, Jonas, RazShuty,
Ladsgroup, Addshore, Lydia_Pintscher, CucyNoiD, Nandana, NebulousIris,
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985,
Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis,
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden,
LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Wikidata-bugs, aude,
Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T193641: track number of editors from other Wikimedia projects who also edit on Wikidata over time

2019-01-07 Thread JAllemandou
JAllemandou added a comment.
Bug found and corrected (patches above).
Data is available now and the rerun problem should be solved.

TASK DETAIL
  https://phabricator.wikimedia.org/T193641

To: JAllemandou
Cc: WMDE-leszek, Tbayer, Aklapper, GerritBot, JAllemandou, Jonas, RazShuty,
Ladsgroup, Addshore, Lydia_Pintscher, CucyNoiD, Nandana, NebulousIris,
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985,
Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis,
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden,
LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Wikidata-bugs, aude,
Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T214897: data for analyzing and visualizing the identifier landscape of Wikidata

2019-02-04 Thread JAllemandou
JAllemandou added a comment.
Hi folks - Sorry for the late answer, I was at the WMF all-hands last week and did not check tasks.
I have started work on having the wikidata-json dumps imported on the cluster, and while some data is available for ad-hoc analysis (see hdfs:///user/joal/wmf/data/wmf/mediawiki/wikidata_parquet), this dataset is not updated on a regular basis (not production-ready).
However, I think a manual update every 3 months would be easy.
@GoranSMilovanovic - What do you think?

TASK DETAIL
  https://phabricator.wikimedia.org/T214897

To: GoranSMilovanovic, JAllemandou
Cc: Addshore, JAllemandou, Aklapper, GoranSMilovanovic, Lydia_Pintscher,
Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, Jonas, Wikidata-bugs, aude,
Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T193641: track number of editors from other Wikimedia projects who also edit on Wikidata over time

2019-02-07 Thread JAllemandou
JAllemandou added a comment.
I confirm the fix :) Closing this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T193641

To: JAllemandou
Cc: WMDE-leszek, Tbayer, Aklapper, GerritBot, JAllemandou, Jonas, RazShuty,
Ladsgroup, Addshore, Lydia_Pintscher, CucyNoiD, Nandana, NebulousIris,
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985,
Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis,
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden,
LawExplorer, Lewizho99, Maathavan, _jensen, Wikidata-bugs, aude, Mbch331,
jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-11 Thread JAllemandou
JAllemandou added a comment.
@diego :
This has worked for me (takes some time to compute and needs a bunch of resources). I hope it's close enough to what you want :) :

// Use 512 shuffle partitions for the joins below.
spark.sql("SET spark.sql.shuffle.partitions=512")

// Expose the wikidata parquet data as a temporary view named "wikidata".
val wikidataParquetPath = "/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20181001"
spark.read.parquet(wikidataParquetPath).createOrReplaceTempView("wikidata")

// namespaced_revisions: non-deleted revisions, with the page title prefixed
//   by the localized namespace name when there is one.
// wikidata_sitelinks: one row per (item, sitelink) pair.
// The final SELECT joins the two on wiki and namespaced title.
val df = spark.sql("""

WITH namespaced_revisions AS (
  SELECT
    wiki_db,
    revision_id,
    event_timestamp,
    page_title,
    page_namespace,
    CASE WHEN (LENGTH(namespace_localized_name) > 0)
      THEN CONCAT(namespace_localized_name, ':', page_title)
      ELSE page_title
    END AS title_namespace_localized
  FROM wmf.mediawiki_history mwh
    INNER JOIN wmf_raw.mediawiki_project_namespace_map nsm
      ON (
        mwh.wiki_db = nsm.dbname
        AND mwh.page_namespace = nsm.namespace
        AND mwh.snapshot = nsm.snapshot
      )
  WHERE mwh.snapshot = '2019-01'
    AND nsm.snapshot = '2019-01'
    AND event_entity = 'revision'
    AND NOT revision_is_deleted
),

wikidata_sitelinks AS (
  SELECT
    id AS item_id,
    EXPLODE(siteLinks) AS sitelink
  FROM wikidata
  WHERE size(siteLinks) > 0
)

SELECT
  item_id,
  wiki_db,
  revision_id,
  event_timestamp,
  page_title,
  page_namespace
FROM wikidata_sitelinks ws
  INNER JOIN namespaced_revisions nsr
    ON (
      ws.sitelink.site = nsr.wiki_db
      AND ws.sitelink.title = title_namespace_localized
    )
""")

TASK DETAIL
  https://phabricator.wikimedia.org/T215616

To: JAllemandou
Cc: Nuria, JAllemandou, diego, Nandana, Akovalyov, Banyek, AndyTan, Rayssa-,
Lahi, Gq86, GoranSMilovanovic, QZanden, Marostegui, LawExplorer, Avner,
Minhnv-2809, _jensen, Luke081515, Wikidata-bugs, aude, Capt_Swing, Dinoguy1000,
Mbch331, Jay8g, Krenair, jeremyb


[Wikidata-bugs] [Maniphest] [Created] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-02-14 Thread JAllemandou
JAllemandou created this task.
JAllemandou added projects: Wikidata, Dumps-Generation.

TASK DESCRIPTION
  Currently, wikidata-entities dumps are generated on a fixed-weekday basis (Monday every two weeks, for instance). It would be easier for some use cases to have a fixed day-of-month basis (1st and 15th day of the month).

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

To: JAllemandou
Cc: Addshore, ArielGlenn, JAllemandou, Nandana, Lahi, Gq86, GoranSMilovanovic,
Lunewa, QZanden, LawExplorer, _jensen, gnosygnu, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-02-14 Thread JAllemandou
JAllemandou added a project: Analytics.

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: hoo, Smalyshev, Addshore, ArielGlenn, JAllemandou, Nandana, Akovalyov,
Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, gnosygnu,
Wikidata-bugs, aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-02-15 Thread JAllemandou
JAllemandou added a comment.
@ArielGlenn : Could we decide on regular day-in-month patterns for the various entity-dumps that need to be generated?
Here is my suggestion:

| Entities | Formats         | Current frequency | New suggested frequency              |
| all      | json / nt / ttl | Every Monday      | 1st, 8th, 15th, 22nd of every month  |
| truthy   | nt              | Every Wednesday   | 3rd, 10th, 17th, 24th of every month |
| lexemes  | nt / ttl        | Every Friday      | 5th, 12th, 19th, 26th of every month |



We'd be a bit late at end of month (worst case +3 days of waiting in comparison to regular releases). Is that an issue?
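
The "+3 days" worst case can be checked with a small sketch. It is plain Python, not part of the dumps tooling; the function names and the choice of 2019 as the sample year are assumptions made for illustration. It lists the proposed run dates for the "all" dump and measures the gap between consecutive runs.

```python
from datetime import date

# Proposed days-of-month for the "all" dump, taken from the suggestion above.
ALL_DUMP_DAYS = (1, 8, 15, 22)


def run_dates(year, days=ALL_DUMP_DAYS):
    """Every scheduled run date in `year`, plus the first run of the next year."""
    dates = [date(year, month, day) for month in range(1, 13) for day in days]
    dates.append(date(year + 1, 1, days[0]))
    return dates


def gaps_in_days(dates):
    """Number of days between each pair of consecutive runs."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


gaps = gaps_in_days(run_dates(2019))
# The weekly schedule always waits 7 days; the extra wait is the longest gap minus 7.
print("shortest gap:", min(gaps), "longest gap:", max(gaps), "extra wait:", max(gaps) - 7)
```

The longest gap happens after the 22nd of a 31-day month, where the next run is on the 1st of the following month.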
Thanks :)

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: hoo, Smalyshev, Addshore, ArielGlenn, JAllemandou, Nandana, Akovalyov,
Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, gnosygnu,
Wikidata-bugs, aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-02-15 Thread JAllemandou
JAllemandou added a comment.
Works for me :) I assume the system would work similarly to the existing XML dumps, meaning that dumps would be generated in the same date folder (1st, 8th, 15th, 22nd of every month for instance), one after the other, providing information on availability in a json file?

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: hoo, Smalyshev, Addshore, ArielGlenn, JAllemandou, Nandana, Akovalyov,
Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, gnosygnu,
Wikidata-bugs, aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-02-16 Thread JAllemandou
JAllemandou added a comment.
Many thanks @ArielGlenn :)

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: hoo, Smalyshev, Addshore, ArielGlenn, JAllemandou, Nandana, Akovalyov,
Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, gnosygnu,
Wikidata-bugs, aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-18 Thread JAllemandou
JAllemandou added a comment.
Hi @Isaac, I have generated some parquet data here /user/joal/wmf/data/wmf/wikidata/item_page_link/20190204  with the following query:

spark.sql("SET spark.sql.shuffle.partitions=128")
val wikidataParquetPath = "/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20190204"
spark.read.parquet(wikidataParquetPath).createOrReplaceTempView("wikidata")

spark.sql("""

WITH namespaced_revisions AS (
  SELECT
wiki_db,
page_id,
page_title,
page_namespace,
CASE WHEN (LENGTH(namespace_localized_name) > 0)
  THEN CONCAT(namespace_localized_name, ':', page_title)
  ELSE page_title
END AS title_namespace_localized
  FROM (
SELECT
  wiki_db,
  page_id,
  page_title,
  page_namespace,
  row_number() OVER (PARTITION BY wiki_db, page_id ORDER BY start_timestamp DESC) as row_num
FROM wmf.mediawiki_page_history
WHERE snapshot = '2019-01'
  AND page_id IS NOT NULL AND page_id > 0
  AND page_title IS NOT NULL and LENGTH(page_title) > 0
  ) ph
INNER JOIN wmf_raw.mediawiki_project_namespace_map nsm
  ON (
ph.wiki_db = nsm.dbname
AND ph.page_namespace = nsm.namespace
AND nsm.snapshot = '2019-01'
  )
  WHERE row_num = 1
),

wikidata_sitelinks AS (
  SELECT
id as item_id,
EXPLODE(siteLinks) AS sitelink
  FROM wikidata
  WHERE size(siteLinks) > 0
)

SELECT
  item_id,
  wiki_db,
  page_id,
  page_title,
  page_namespace,
  title_namespace_localized
FROM wikidata_sitelinks ws
  INNER JOIN namespaced_revisions nsr
ON (
  ws.sitelink.site = nsr.wiki_db
  AND ws.sitelink.title = title_namespace_localized
)
""").repartition(16).write.parquet("/user/joal/wmf/data/wmf/wikidata/item_page_link/20190204")

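The `namespaced_revisions` part of the query above does two things: it keeps the most recent state of each page (`row_num = 1`) and it prefixes the title with the localized namespace name when there is one. A minimal plain-Python sketch of that logic follows. It is not the Spark job itself; the sample rows, timestamps and the `NAMESPACE_NAMES` mapping are made up for illustration.

```python
def latest_page_states(history_rows):
    """Keep, per (wiki_db, page_id), the row with the greatest start_timestamp.

    This mirrors row_number() OVER (PARTITION BY wiki_db, page_id
    ORDER BY start_timestamp DESC) followed by WHERE row_num = 1.
    """
    latest = {}
    for row in history_rows:
        key = (row["wiki_db"], row["page_id"])
        if key not in latest or row["start_timestamp"] > latest[key]["start_timestamp"]:
            latest[key] = row
    return list(latest.values())


def localized_title(namespace_localized_name, page_title):
    """Mirror of the CASE WHEN: prefix the title only when a namespace name exists."""
    if namespace_localized_name:
        return f"{namespace_localized_name}:{page_title}"
    return page_title


def namespaced_pages(history_rows, namespace_names):
    """Join latest page states to namespace names (inner join on wiki and namespace)."""
    pages = []
    for row in latest_page_states(history_rows):
        key = (row["wiki_db"], row["page_namespace"])
        if key not in namespace_names:  # an inner join drops unmatched rows
            continue
        title = localized_title(namespace_names[key], row["page_title"])
        pages.append({**row, "title_namespace_localized": title})
    return pages


# Hypothetical sample data, for illustration only.
HISTORY = [
    {"wiki_db": "enwiki", "page_id": 1, "page_title": "Old_title",
     "page_namespace": 0, "start_timestamp": "2018-01-01"},
    {"wiki_db": "enwiki", "page_id": 1, "page_title": "New_title",
     "page_namespace": 0, "start_timestamp": "2018-06-01"},
    {"wiki_db": "enwiki", "page_id": 2, "page_title": "Sandbox",
     "page_namespace": 4, "start_timestamp": "2017-03-01"},
]
NAMESPACE_NAMES = {("enwiki", 0): "", ("enwiki", 4): "Wikipedia"}

for page in namespaced_pages(HISTORY, NAMESPACE_NAMES):
    print(page["page_id"], page["title_namespace_localized"])
```
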
@diego: I can generate similar by-revision data, but before doing so I wanted to be sure we agree that it will associate every historical revision of a page with the item currently linked to that page. Is that what you're after? Or are you more interested in the history of the linkage between page and item?
Thanks!

TASK DETAIL
  https://phabricator.wikimedia.org/T215616

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, JAllemandou, diego,
Nandana, Akovalyov, Banyek, AndyTan, Rayssa-, Lahi, Gq86, GoranSMilovanovic,
QZanden, Marostegui, LawExplorer, Avner, Minhnv-2809, _jensen, Luke081515,
Wikidata-bugs, aude, Capt_Swing, Dinoguy1000, Mbch331, Jay8g, Krenair, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-19 Thread JAllemandou
JAllemandou added a comment.
Thanks @Isaac for reformulating the question I tried to explain above :)
@diego: Can you confirm there is value for you in having revisions tied to wikidata-items regardless of when the link happened?

TASK DETAIL
  https://phabricator.wikimedia.org/T215616

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, JAllemandou, diego,
Nandana, Akovalyov, Banyek, AndyTan, Rayssa-, Lahi, Gq86, GoranSMilovanovic,
QZanden, Marostegui, LawExplorer, Avner, Minhnv-2809, _jensen, Luke081515,
Wikidata-bugs, aude, Capt_Swing, Dinoguy1000, Mbch331, Jay8g, Krenair, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-02-20 Thread JAllemandou
JAllemandou added a comment.
I can't speak about failures and restarts as I don't know much about the dumps-generation process. @ArielGlenn would be the person who knows best.
As for the dates, the main reason we ask for the change is date consistency by month, mimicking what exists for XML dumps.

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, ArielGlenn, JAllemandou,
Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer,
_jensen, gnosygnu, Wikidata-bugs, aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-21 Thread JAllemandou
JAllemandou added a comment.


  We're on the same page @diego :)
  I can precompute the table described in ii) if needed, and will surely do it 
once the wikidata dump is productionized. Let me know if you need it before 
then.

TASK DETAIL
  https://phabricator.wikimedia.org/T215616

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, JAllemandou, diego, 
Nandana, Akovalyov, Banyek, AndyTan, Rayssa-, Lahi, Gq86, GoranSMilovanovic, 
QZanden, Marostegui, LawExplorer, Avner, Minhnv-2809, _jensen, Luke081515, 
Wikidata-bugs, aude, Capt_Swing, Dinoguy1000, Mbch331, Jay8g, Krenair, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-26 Thread JAllemandou
JAllemandou added a comment.


  Hi @Isaac 
  Sorry for the issue. I corrected the query above (last query, join criteria: 
`AND ws.sitelink.title = title_namespace_localized` --> `AND 
REPLACE(ws.sitelink.title, ' ', '_') = title_namespace_localized`).
  We were not joining correctly on title, as mediawiki-history encodes titles 
with underscores while the wikidata dump uses spaces.
  Problem solved; data regenerated at the same place as before. A double check 
on enwiki numbers looks good: 5.96M pages have an item in namespace 0 (7.95M for 
all namespaces).
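
A minimal plain-Python sketch of why the join missed rows and what the `REPLACE` fix changes. The function names, the sample sitelink and the `page_id` value are assumptions made for illustration, not data from the actual tables.

```python
def normalize_sitelink_title(title):
    """Equivalent of REPLACE(ws.sitelink.title, ' ', '_')."""
    return title.replace(" ", "_")


def join_items_to_pages(sitelinks, pages, normalize=True):
    """Inner join sitelinks to pages on (site, title).

    With normalize=False this reproduces the original join, which compares a
    space-separated title to an underscore-separated one.
    """
    pages_by_key = {(p["wiki_db"], p["title_namespace_localized"]): p for p in pages}
    linked = []
    for link in sitelinks:
        title = normalize_sitelink_title(link["title"]) if normalize else link["title"]
        page = pages_by_key.get((link["site"], title))
        if page is not None:
            linked.append((link["item_id"], page["wiki_db"], page["page_id"]))
    return linked


# Hypothetical sample rows; the page_id is invented.
SITELINKS = [{"item_id": "Q42", "site": "enwiki", "title": "Douglas Adams"}]
PAGES = [{"wiki_db": "enwiki", "page_id": 1, "title_namespace_localized": "Douglas_Adams"}]

print("before fix:", join_items_to_pages(SITELINKS, PAGES, normalize=False))
print("after fix: ", join_items_to_pages(SITELINKS, PAGES))
```

Single-word titles match either way, which is probably why the problem only showed up as missing rows rather than as an empty result.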

TASK DETAIL
  https://phabricator.wikimedia.org/T215616

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Marostegui, Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, 
JAllemandou, diego, Nandana, Akovalyov, Banyek, Rayssa-, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, Avner, _jensen, Wikidata-bugs, aude, 
Capt_Swing, Dinoguy1000, Mbch331, Jay8g, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T217821: Investigate duplication of strings in wb_terms table for wikidatawiki

2019-03-07 Thread JAllemandou
JAllemandou added a comment.


  Exact analysis run on 2018-12-06:
  
val df = 
spark.read.parquet("/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20181001")
val base_rdd = df.select("labels", "descriptions", "aliases").rdd
val strings = base_rdd.flatMap(r => {
  r.getMap[String,String](0).values ++
  r.getMap[String,String](1).values ++
  r.getMap[String,Seq[String]](2).values.flatMap(l => l)
})

val grouped_strings = strings.map(s => (s, 1)).reduceByKey(_+_)


val total_bytes = grouped_strings.map(t => t._1.getBytes.length * t._2).sum()
val duplicate_bytes = grouped_strings.map(t => t._1.getBytes.length * (t._2 - 1)).sum()

println(f"Total bytes for strings: $total_bytes%15.0f")
println(f"Total duplicate bytes for strings: $duplicate_bytes%15.0f")
println(f"Useful bytes for strings: ${total_bytes - duplicate_bytes}%15.0f")

// Total bytes for strings: 45,724,033,674
// Total duplicate bytes for strings: 41,630,588,801
// Useful bytes for strings: 4,093,444,873
// Useful is 1 order of magnitude less than used

// Triple check useful bytes for strings:
grouped_strings.map(_._1.getBytes.length).sum() == (total_bytes - duplicate_bytes)
// true


// How many unique strings?
grouped_strings.count()
// 98,524,732

// How many string with 1 instance?
grouped_strings.filter(t => t._2 == 1).count()
// 72,584,179
// Leaving 25,940,553 unique strings having multiple instances

// --> If we go for table-indirection, we'll need ~100M integer ids (4 bytes each)
// --> ~400,000,000 bytes - 1 order of magnitude less than unique string size
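
The same accounting can be sketched on a tiny in-memory sample. This is plain Python, not the Spark job; the sample strings are made up, and UTF-8 is an assumed encoding (the Scala code uses `getBytes` with the platform default charset).

```python
from collections import Counter


def byte_stats(strings):
    """Total, duplicate and useful byte counts for a collection of strings."""
    counts = Counter(strings)  # same role as reduceByKey(_+_)
    total = sum(len(s.encode("utf-8")) * n for s, n in counts.items())
    duplicate = sum(len(s.encode("utf-8")) * (n - 1) for s, n in counts.items())
    return {
        "total": total,
        "duplicate": duplicate,
        "useful": total - duplicate,
        "unique": len(counts),
        "singletons": sum(1 for n in counts.values() if n == 1),
    }


# Hypothetical labels; "Berlin" appears three times.
SAMPLE = ["Berlin", "Berlin", "Berlin", "Paris", "café"]
stats = byte_stats(SAMPLE)
print(stats)

# Indirection estimate from the numbers above: one 4-byte id per unique string.
UNIQUE_STRINGS = 98_524_732
id_bytes = UNIQUE_STRINGS * 4
print("bytes needed for ids:", id_bytes)
```
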

TASK DETAIL
  https://phabricator.wikimedia.org/T217821

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, Aklapper, Addshore, alaa_wmde, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs, 
aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-03-11 Thread JAllemandou
JAllemandou added a comment.


  Following up on this: another viable solution to get monthly-coherence 
between dumps is to force a dump on the 1st of the month ... I'm not sure the 
idea is better.
  @ArielGlenn  - How do we proceed to try moving forward (in either direction) ?

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Envlh, Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, ArielGlenn, 
JAllemandou, alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, 
Lunewa, QZanden, LawExplorer, _jensen, rosalieper, gnosygnu, Wikidata-bugs, 
aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday

2019-03-14 Thread JAllemandou
JAllemandou added a comment.


  In T216160#5020236 <https://phabricator.wikimedia.org/T216160#5020236>, 
@ArielGlenn wrote:
  
  > By Friday I'll have done that; by next Wednesday let's make a decision, 
barring any huge obstacles.
  
  
  Awesome, thanks @ArielGlenn  :)

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: notconfusing, Envlh, Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, 
ArielGlenn, JAllemandou, alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, gnosygnu, 
Wikidata-bugs, aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T214897: data for analyzing and visualizing the identifier landscape of Wikidata

2019-03-15 Thread JAllemandou
JAllemandou added a comment.


  Hey @GoranSMilovanovic  - I don't have a good understanding of what you're 
after, but having read pairs and contingency table above, maybe this Spark 
function could be helpful: 
https://spark.apache.org/docs/2.3.0/api/java/index.html?org/apache/spark/sql/DataFrameStatFunctions.html
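
For reference, that class includes `crosstab(col1, col2)`, which computes a pair-wise frequency (contingency) table of two columns. Whether that is the function meant here is an assumption. A plain-Python sketch of what such a table contains, with made-up property pairs:

```python
from collections import Counter


def contingency_table(pairs):
    """Count how often each (row_value, col_value) pair occurs.

    Returns sorted row labels, sorted column labels and the matrix of counts.
    """
    counts = Counter(pairs)
    rows = sorted({a for a, _ in counts})
    cols = sorted({b for _, b in counts})
    table = [[counts[(a, b)] for b in cols] for a in rows]
    return rows, cols, table


# Hypothetical co-occurrences of identifier properties on items.
PAIRS = [("P214", "P213"), ("P214", "P213"), ("P214", "P227"), ("P496", "P213")]
rows, cols, table = contingency_table(PAIRS)
print(cols)
for label, counts_row in zip(rows, table):
    print(label, counts_row)
```
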

TASK DETAIL
  https://phabricator.wikimedia.org/T214897

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic, JAllemandou
Cc: RazShuty, Addshore, JAllemandou, Aklapper, GoranSMilovanovic, 
Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFs

2019-10-03 Thread JAllemandou
JAllemandou added a comment.


  this is done @GoranSMilovanovic.
  Raw data is here 
`/user/joal/wmf/data/raw/mediawiki/wikidata/all_jsondumps/20190902` and parquet 
data is here `/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20190902`

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: WMDE-leszek, abian, leila, Ottomata, Nuria, GoranSMilovanovic, Addshore, 
JAllemandou, bmansurov, 4748kitoko, darthmon_wmde, DannyS712, Nandana, 
Akovalyov, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, terrrydactyl, 
Wikidata-bugs, aude, Capt_Swing, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T236895: ArticlePlaceholder dashboard stopped tracking page views

2019-10-30 Thread JAllemandou
JAllemandou added a comment.


  I think this problem could be related to T226730 (preventing most 
`Special:XXX` pages from being flagged as pageviews).

TASK DETAIL
  https://phabricator.wikimedia.org/T236895

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, elukey, Addshore, Aklapper, Lydia_Pintscher, 4748kitoko, 
darthmon_wmde, DannyS712, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, 
QZanden, cmadeo, LawExplorer, _jensen, rosalieper, Jonas, terrrydactyl, 
Wikidata-bugs, aude, jayvdb, Ricordisamoa, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T101013: Log Wikidata Query Service queries to the event gate infrastructure

2019-11-27 Thread JAllemandou
JAllemandou added a comment.


  Does this being closed mean we can access data on kafka?

TASK DETAIL
  https://phabricator.wikimedia.org/T101013

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse, JAllemandou
Cc: Igorkim78, JAllemandou, Ottomata, Smalyshev, Deskana, Aklapper, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, 
holger.knust, Meekrab2012, joker88john, ET4Eva, DannyS712, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Avner, Lewizho99, 
Maathavan, Gehel, _jensen, rosalieper, Scott_WUaS, Jonas, FloNight, Xmlizer, 
mobrovac, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
GWicke, Manybubbles, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T239471: Sqoop wikidata terms tables into hadoop

2019-11-29 Thread JAllemandou
JAllemandou added a project: Analytics-Kanban.

TASK DETAIL
  https://phabricator.wikimedia.org/T239471

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore, JAllemandou
Cc: JAllemandou, Addshore, Aklapper, 4748kitoko, Hook696, Daryl-TTMG, 
RomaAmorRoma, 0010318400, E.S.A-Sheild, Iflorez, darthmon_wmde, alaa_wmde, 
Meekrab2012, joker88john, DannyS712, CucyNoiD, Nandana, NebulousIris, 
Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, 
Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, 
LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, 
Jonas, terrrydactyl, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T209655: Copy Wikidata dumps to HDFS

2019-12-04 Thread JAllemandou
JAllemandou added a subscriber: Groceryheist.
JAllemandou added a comment.


  New dataset available @GoranSMilovanovic. Pinging @Groceryheist  as I also 
generated the items per page.
  
hdfs dfs -ls /user/joal/wmf/data/wmf/mediawiki/wikidata_parquet | tail -1
drwxr-xr-x   - analytics joal  0 2019-12-04 18:31 
/user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20191202

hdfs dfs -ls /user/joal/wmf/data/wmf/wikidata/item_page_link/ | tail -1
drwxr-xr-x   - joal joal  0 2019-12-04 18:50 
/user/joal/wmf/data/wmf/wikidata/item_page_link/20191202

TASK DETAIL
  https://phabricator.wikimedia.org/T209655

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottomata, Nuria, 
GoranSMilovanovic, Addshore, JAllemandou, bmansurov, 4748kitoko, darthmon_wmde, 
DannyS712, Nandana, Akovalyov, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, terrrydactyl, Wikidata-bugs, aude, Capt_Swing, Mbch331, 
jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T64874: [Story] Statistics for Wikidata exports

2015-11-17 Thread JAllemandou
JAllemandou added a subscriber: JAllemandou.
JAllemandou added a comment.

Hi, quick questions on that:
Is this a recurring need, or would a one-off run be enough?
Also, what level of aggregation do you need? Would daily be good?
Below is a Hive query that aggregates daily over dimensions I thought would be 
interesting.
DISCLAIMER: This query needs to scan a BIG volume of data (500Gb per day), so 
let's discuss how to handle this if you need regular updates.

  SELECT
    CONCAT(LPAD(year, 4, 0), '-', LPAD(month, 2, 0), '-', LPAD(day, 2, 0)) AS day,
    regexp_extract(uri_path, '^/entity/.+(\\..+)$', 1) AS entity_format,
    regexp_extract(uri_path, '^/wiki/Special:EntityData/.+(\\..+)$', 1) AS special_entity_format,
    access_method,
    agent_type,
    http_status,
    COUNT(1) AS count
  FROM wmf.webrequest
  WHERE webrequest_source = 'text'
    AND year = 2015
    AND month = 11
    AND day = 16
    AND normalized_host.project_class = 'wikidata'
    AND uri_path rlike '^(/entity/|/wiki/Special:EntityData/).*$'
  GROUP BY
    year, month, day,
    access_method,
    agent_type,
    http_status,
    regexp_extract(uri_path, '^/entity/.+(\\..+)$', 1),
    regexp_extract(uri_path, '^/wiki/Special:EntityData/.+(\\..+)$', 1)
  ORDER BY
    day, entity_format, special_entity_format, access_method, agent_type, http_status
  LIMIT 10;



| day        | entity_format | special_entity_format | access_method | agent_type | http_status | count  |
| 2015-11-16 |               |                       | desktop       | spider     | 200         | 2      |
| 2015-11-16 |               |                       | desktop       | spider     | 301         | 345473 |
| 2015-11-16 |               |                       | desktop       | spider     | 302         | 75     |
| 2015-11-16 |               |                       | desktop       | spider     | 303         | 312186 |
| 2015-11-16 |               |                       | desktop       | spider     | 400         | 21     |
| 2015-11-16 |               |                       | desktop       | spider     | 503         | 2      |
| 2015-11-16 |               |                       | desktop       | user       | 200         | 18     |
| 2015-11-16 |               |                       | desktop       | user       | 301         | 1398   |
| 2015-11-16 |               |                       | desktop       | user       | 302         | 38     |
| 2015-11-16 |               |                       | desktop       | user       | 303         | 2714   |
| 2015-11-16 |               |                       | desktop       | user       | 400         | 25     |
| 2015-11-16 |               |                       | desktop       | user       | 429         | 2      |
| 2015-11-16 |               | .json                 | desktop       | spider     | 200         | 719297 |
| 2015-11-16 |               | .json                 | desktop       | spider     | 301         | 501004 |
| 2015-11-16 |               | .json                 | desktop       | spider     | 304         | 10315  |
| 2015-11-16 |               | .json                 | desktop       | spider     | 400         | 7      |
| 2015-11-16 |               | .json                 | desktop       | spider     | 404         | 1777   |
| 2015-11-16 |               | .json                 | desktop       | spider     | 503         | 4      |
| 2015-11-16 |               | .json                 | desktop       | user       | 200         | 7675   |
| 2015-11-16 |               | .json                 | desktop       | user       | 301         | 97     |
| 2015-11-16 |               | .json                 | desktop       | user       | 302         | 10     |
| 2015-11-16 |               | .json                 | desktop       | user       | 304         | 1017   |
| 2015-11-16 |               | .json                 | desktop       | user       | 400         | 1      |
| 2015-11-16 |               | .json                 | desktop       | user       | 404         | 2      |
| 2015-11-16 |               | .json                 | desktop       | user       | 429         | 42     |
| 2015-11-16 |               | .n3                   | desktop       | spider     | 200         | 65982  |
| 2015-11-16 |               | .n3                   | desktop       | spider     | 301         | 1952   |
| 2015-11-16 |               | .n3                   | desktop       | spider     | 304         | 17417  |
| 2015-11-16 |               | .n3                   | desktop       | spider     | 404         | 13     |
| 2015-11-16 |               | .n3                   | desktop       | spider     | 503         | 1      |
| 2015-11-16 |               | .n3                   | desktop       | user       | 200         | 169    |
| 2015-11-16 |
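
The two `regexp_extract` calls and the `rlike` filter from the query can be tried out locally with a small plain-Python sketch. The patterns are copied from the query (the doubled backslash in the Hive string literal becomes a single one in the regex); the function names and sample paths are assumptions for illustration. Returning an empty string on no match is chosen to mirror the empty cells in the result table.

```python
import re

# Same patterns as in the Hive query.
ENTITY_RE = re.compile(r"^/entity/.+(\..+)$")
SPECIAL_RE = re.compile(r"^/wiki/Special:EntityData/.+(\..+)$")
FILTER_RE = re.compile(r"^(/entity/|/wiki/Special:EntityData/).*$")


def extract(pattern, uri_path):
    """Return capture group 1, or an empty string when the pattern does not match."""
    match = pattern.match(uri_path)
    return match.group(1) if match else ""


def classify(uri_path):
    """Return (entity_format, special_entity_format), or None if the path is filtered out."""
    if not FILTER_RE.match(uri_path):
        return None
    return extract(ENTITY_RE, uri_path), extract(SPECIAL_RE, uri_path)


# Hypothetical request paths.
for path in ["/wiki/Special:EntityData/Q42.json", "/entity/Q42.n3", "/entity/Q42", "/wiki/Q42"]:
    print(path, "->", classify(path))
```

Paths without an extension, such as `/entity/Q42`, pass the filter but yield two empty format columns, which matches the first block of rows in the table.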

[Wikidata-bugs] [Maniphest] [Updated] T64874: [Story] Statistics for Special:EntityData usage

2015-11-19 Thread JAllemandou
JAllemandou added a project: Analytics-Backlog.

TASK DETAIL
  https://phabricator.wikimedia.org/T64874

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, Halfak, hoo, Addshore, Ricordisamoa, Aklapper, drdee, Tnegrin, 
QChris, ezachte, Lydia_Pintscher, daniel, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Retitled] T119054: Investigate wikidata pageview sipke on 2015-11-14

2015-11-19 Thread JAllemandou
JAllemandou changed the title from "Remove query.wikidata.org from pageview 
definition (for wikidata)" to "Investigate wikidata pageview sipke on 
2015-11-14".
JAllemandou set Security to None.

TASK DETAIL
  https://phabricator.wikimedia.org/T119054

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, 
Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Retitled] T119054: Fix '.*http.*' not being tagged as spiders in webrequest

2015-11-19 Thread JAllemandou
JAllemandou changed the title from "Investigate wikidata pageview sipke on 
2015-11-14" to "Fix '.*http.*' not being tagged as spiders in webrequest".
JAllemandou triaged this task as "Unbreak Now!" priority.
JAllemandou claimed this task.
JAllemandou edited projects, added Analytics-Kanban; removed Analytics-Backlog.

TASK DETAIL
  https://phabricator.wikimedia.org/T119054

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, 
Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Project Column] T119054: Fix '.*http.*' not being tagged as spiders in webrequest

2015-11-19 Thread JAllemandou
JAllemandou moved this task to In Progress on the Analytics-Kanban workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T119054

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1030/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, 
Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T119054: Fix '.*http.*' not being tagged as spiders in webrequest

2015-11-19 Thread JAllemandou
JAllemandou added a comment.

I messed up a deploy about a month ago, which prevented the change merged here: 
https://gerrit.wikimedia.org/r/#/c/244465/ from actually being applied.
I will:

- bump refinery-core and refinery-hive (> 0.0.19) and update the refine oozie job
- deploy refinery with these new jars and the new refine job
- restart the refine process
- document (wikitech webrequest, research pageview)


TASK DETAIL
  https://phabricator.wikimedia.org/T119054

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, 
Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Retitled] T119054: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}

2015-11-19 Thread JAllemandou
JAllemandou changed the title from "Fix '.*http.*' not being tagged as spiders 
in webrequest" to "Fix '.*http.*' not being tagged as spiders in webrequest [5 
pts] {hawk}".

TASK DETAIL
  https://phabricator.wikimedia.org/T119054

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: gerritbot, Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Project Column] T119054: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}

2015-11-19 Thread JAllemandou
JAllemandou moved this task to Ready to Deploy on the Analytics-Kanban 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T119054

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1030/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: gerritbot, Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Project Column] T119054: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}

2015-11-19 Thread JAllemandou
JAllemandou moved this task to Done on the Analytics-Kanban workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T119054

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1030/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: gerritbot, Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T119054: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}

2015-11-23 Thread JAllemandou
JAllemandou added a comment.

@Addshore: Not feasible since original user_agent is not present in 
pageview_hourly.


TASK DETAIL
  https://phabricator.wikimedia.org/T119054

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Tbayer, gerritbot, Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T119054: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}

2015-11-23 Thread JAllemandou
JAllemandou added a comment.

@Addshore: The A markers are notes (a card appears when you hover over one), 
and there is already a note at the deploy, when the drop occurs.
Is it necessary to add another? If you think so, notes are created on this wiki 
page: https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations.


TASK DETAIL
  https://phabricator.wikimedia.org/T119054

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Tbayer, gerritbot, Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T119054: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}

2015-11-23 Thread JAllemandou
JAllemandou added a comment.

Notes are to the dashiki page, but I think you can modify the existing ones if 
you wish :)


TASK DETAIL
  https://phabricator.wikimedia.org/T119054

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Tbayer, gerritbot, Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T119054: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}

2015-11-23 Thread JAllemandou
JAllemandou added a comment.

Thanks :)


TASK DETAIL
  https://phabricator.wikimedia.org/T119054

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Tbayer, gerritbot, Lydia_Pintscher, Aklapper, Addshore, StudiesWorld, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster

2020-05-27 Thread JAllemandou
JAllemandou added a comment.


  An idea: how about sending the update stream back to Kafka and giving THAT 
one a higher retention?
  Moving retention to 30 days for revision-create would keep a lot of data 
that isn't necessary (about half of the data), while keeping only the 
updates should be enough.
  Just an idea :)

TASK DETAIL
  https://phabricator.wikimedia.org/T253753

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, Ottomata, dcausse, Aklapper, CBogen, 4748kitoko, 
darthmon_wmde, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Closed] T249319: Remove wb_terms from sqoop

2020-06-02 Thread JAllemandou
JAllemandou closed this task as "Resolved".
JAllemandou updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T249319

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Milimetric, Aklapper, Addshore, 4748kitoko, Iflorez, darthmon_wmde, 
alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, JAllemandou, terrrydactyl, 
Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-14 Thread JAllemandou
JAllemandou added a comment.


  > First step: analyze the frequency distribution of the user_agent field 
(string) from wmf.webrequest where queries are SPARQL.
  
  I suggest you use events instead of webrequest:  
`event.wdqs_internal_sparql_query` and `event.wdqs_external_sparql_query`.
  
  I have done some work encompassing user-agent frequency analysis and I'm in 
the process of writing up the findings for the end of this week.

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic, JAllemandou
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread JAllemandou
JAllemandou added a comment.


SELECT
    http.request_headers['user-agent'],
    user_agent_map,
    count(1) AS c
FROM event.wdqs_external_sparql_query
WHERE year = 2020 AND month = 5 AND day = 1
GROUP BY
    http.request_headers['user-agent'],
    user_agent_map
ORDER BY c DESC
LIMIT 100;
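
  For a quick sanity check outside Hive, the same group-by/count/top-N logic 
can be sketched in plain Python. This is only an illustration: 
`top_user_agents` and the sample events are hypothetical, and the event shape 
is assumed to mirror the `http.request_headers` and `user_agent_map` fields 
used in the query above.

```python
from collections import Counter


def top_user_agents(events, limit=100):
    """Count events per (raw user-agent, parsed user-agent map) pair, most frequent first."""
    counts = Counter()
    for event in events:
        raw = event["http"]["request_headers"].get("user-agent")
        # Dicts are not hashable, so freeze the parsed map into a sorted tuple.
        parsed = tuple(sorted(event.get("user_agent_map", {}).items()))
        counts[(raw, parsed)] += 1
    return counts.most_common(limit)


# Hypothetical events (not real data), shaped like rows of the queried table.
sample_events = [
    {"http": {"request_headers": {"user-agent": "example-bot/1.0"}},
     "user_agent_map": {"browser_family": "Other"}},
    {"http": {"request_headers": {"user-agent": "example-bot/1.0"}},
     "user_agent_map": {"browser_family": "Other"}},
    {"http": {"request_headers": {"user-agent": "Mozilla/5.0"}},
     "user_agent_map": {"browser_family": "Firefox"}},
]

for (raw, parsed), c in top_user_agents(sample_events, limit=2):
    print(raw, dict(parsed), c)
```

  Freezing the parsed map into a tuple plays the role of grouping by both 
columns at once, as the `GROUP BY` does.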

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic, JAllemandou
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread JAllemandou
JAllemandou added a comment.


  @GoranSMilovanovic I finally published a wiki page with most of the results I 
found: https://wikitech.wikimedia.org/wiki/User:Joal/WDQS_Traffic_Analysis
  Sorry for the delay ...

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic, JAllemandou
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread JAllemandou
JAllemandou added a comment.


  @GoranSMilovanovic I have indeed done some analysis using the Apache Jena 
parser to extract an algebraic representation of queries. It is not yet at the 
level of completion I would like, though. I'll be on holidays until August 15th, 
starting tonight - let's discuss when I come back?

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic, JAllemandou
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-01-05 Thread JAllemandou
JAllemandou added a comment.
@Nuria, @Smalyshev: Given that all wikidata-query tagged rows belong in misc, 
which is super small, I have no objection to running jobs either hourly or daily.

TASK DETAIL
  https://phabricator.wikimedia.org/T143819

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, mpopov, mforns, PokestarFan, Nuria, Lydia_Pintscher, 
mkroetzsch, leila, debt, thiemowmde, Jonas, Smalyshev, AndrewSu, Aklapper, 
I9606, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, Avner, Gehel, FloNight, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T201168: [Trailblaze] Use apache mahout item recommender for property suggestions

2018-08-24 Thread JAllemandou
JAllemandou added a comment.
Hi @Jonas - A quick comment following a quick chat with @Addshore on IRC. If 
you want to implement recommendations based on collaborative filtering, for 
instance, I suggest you go for Spark MLlib (Spark's Machine Learning LIBrary). 
It has all the classical ML algorithms, including collaborative filtering 
(https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html), and 
currently has a lot more traction than Mahout (the 'old' way). Let's talk when 
you want :)

TASK DETAIL
  https://phabricator.wikimedia.org/T201168

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, Addshore, Jonas, Aklapper, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T193641: track number of editors from other Wikimedia projects who also edit on Wikidata over time

2018-09-25 Thread JAllemandou
JAllemandou added a comment.
Jobs have been successful for the past months. However rerunning the jobs 
manually made the data-points appear. This is very bizarre.
Let's keep this open and monitor next month.

TASK DETAIL
  https://phabricator.wikimedia.org/T193641

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Jonas, JAllemandou
Cc: Liuxinyu970226, Tbayer, Aklapper, GerritBot, JAllemandou, Jonas, RazShuty, 
Ladsgroup, Addshore, Lydia_Pintscher, NebulousIris, Gaboe420, Versusxo, 
Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, 
Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Ramalepe, Liugev6, QZanden, LawExplorer, Lewizho99, Maathavan, Wikidata-bugs, 
aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T261937: Add CPU load and query concurrency as context to event logging from WDQS

2020-09-07 Thread JAllemandou
JAllemandou added a comment.


  Will make it a lot easier to analyze than to have to build the 'in-flight' 
view of queries!

TASK DETAIL
  https://phabricator.wikimedia.org/T261937

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, Aklapper, Gehel, CBogen, Akuckartz, darthmon_wmde, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T258269: Add query result to the current WDQS event logging

2020-09-07 Thread JAllemandou
JAllemandou added a comment.


  In terms of logging size, it probably depends on the result type: with 
descriptions or other text-heavy fields, this could get big if a high `LIMIT` 
or no `LIMIT` is set on the number of returned rows. We should set a limit :)

TASK DETAIL
  https://phabricator.wikimedia.org/T258269

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, GoranSMilovanovic, Gehel, Aklapper, CBogen, Akuckartz, 
darthmon_wmde, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T261841: Tag WDQS query log with the source of the query (UI vs direct access)

2020-10-02 Thread JAllemandou
JAllemandou added a comment.


  Heya - I'm sorry I completely missed the ping :S
  Quick analysis:
  
spark.sql("SELECT (http.request_headers['referer'] IS NOT NULL) as 
defined_referer, count(1) as c from event.wdqs_external_sparql_query where year 
= 2020 and month = 9 group by (http.request_headers['referer'] IS NOT NULL) 
limit 100").show(100, false)
+---------------+---------+
|defined_referer|c        |
+---------------+---------+
|false          |165201676|
|true           |5613278  |
+---------------+---------+
  
  --> 3.3% of requests have referer defined for September
  
  Among those 3.3%, here is the top 10:
  
spark.sql("SELECT http.request_headers['referer'] as referer, count(1) as c 
from event.wdqs_external_sparql_query where year = 2020 and month = 9 and 
http.request_headers['referer'] IS NOT NULL group by 
http.request_headers['referer'] order by c desc limit 10").show(10, false)
+-------------------------------------------------+-------+
|referer                                          |c      |
+-------------------------------------------------+-------+
|https://query.wikidata.org/                      |2730003|
|https://labs.minutelabs.io/Tree-of-Life-Explorer/|307426 |
|https://www.wikidata.org/                        |212431 |
|https://labs.minutelabs.io/                      |138757 |
|https://ru.wikipedia.org/                        |107558 |
|https://query.wikidata.org/embed.html            |102165 |
|https://wlmuk.toolforge.org/                     |96946  |
|https://maps.wikilovesmonuments.org/             |89894  |
|https://wikishootme.toolforge.org/               |87632  |
|https://en.wikipedia.org/                        |62147  |
+-------------------------------------------------+-------+
  
  --> Using headers over a month, https://query.wikidata.org/ queries represent 
1.6% of all queries.
  
  Having only 3.3% of requests with a referer seems small. If someone with a 
better gut feeling for this could chime in, that'd be great; otherwise I'm going 
to try a more advanced user-agent analysis on the data and judge whether it 
looks organic or not.
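
  For reference, the percentages quoted above can be re-derived from the 
counts in the two tables. The snippet below is just that arithmetic; the 
variable names are mine, and the share of https://query.wikidata.org/ among 
requests with a referer is an extra figure computed from the same counts.

```python
# Counts copied from the two tables above (September 2020).
with_referer = 5_613_278
without_referer = 165_201_676
from_query_ui = 2_730_003  # referer == https://query.wikidata.org/

total = with_referer + without_referer


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


referer_share = pct(with_referer, total)                # 3.3
ui_share_of_all = pct(from_query_ui, total)             # 1.6
ui_share_of_referer = pct(from_query_ui, with_referer)  # 48.6

print(referer_share, ui_share_of_all, ui_share_of_referer)
```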

TASK DETAIL
  https://phabricator.wikimedia.org/T261841

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Zbyszko, JAllemandou
Cc: CBogen, JAllemandou, Aklapper, Gehel, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, 
Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T261841: Tag WDQS query log with the source of the query (UI vs direct access)

2020-10-06 Thread JAllemandou
JAllemandou added a comment.


  I continued my analysis today looking at top-100 parsed user-agents from both 
queries-with-referer subset, and queries-without-referer subset, over the month 
of September.
  See https://phabricator.wikimedia.org/P12933
  
  - The queries-with-referer have a defined user-agent, meaning that the 
user-agent parser we use to extract structured information from the user-agent 
line provides values for a lot of its fields. By looking at the top-100 
user-agents we actually cover more than 90% of requests made with a referer.
  - The queries-without-referer have either an undefined or `Spider` 
user-agent, meaning that the user-agent line is either not parseable or is 
parsed as a bot. I manually inspected the user-agent lines and confirm that 
most of them look like bots (particularly the ones making the most 
requests). By looking at the top-100 user-agents we also cover more than 90% 
of requests made without a referer.
  
  This confirms that, despite being few, the requests providing a referer 
seem trustworthy. There is therefore nothing more to do for this task: the data 
is already available.
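
  The "top-100 covers more than 90%" check is a simple ratio. A minimal sketch 
of it follows, assuming per-user-agent request counts are already available as 
a dict; `top_n_coverage` and the sample counts are hypothetical, not the 
actual analysis code.

```python
def top_n_coverage(counts, n=100):
    """Fraction of all requests made by the n most active user-agents."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sorted(counts.values(), reverse=True)[:n]
    return sum(top) / total


# Hypothetical per-user-agent request counts (not real data).
sample_counts = {"agent-a": 900, "agent-b": 80, "agent-c": 20}

print(top_n_coverage(sample_counts, n=1))
print(top_n_coverage(sample_counts, n=2))
```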

TASK DETAIL
  https://phabricator.wikimedia.org/T261841

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Zbyszko, JAllemandou
Cc: CBogen, JAllemandou, Aklapper, Gehel, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, 
Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T261841: Tag WDQS query log with the source of the query (UI vs direct access)

2020-10-16 Thread JAllemandou
JAllemandou added a comment.


  Some more info on this aspect: I have done a quick analysis over September 
queries today and found that my assumption that long queries were made by users 
from UI is wrong.
  
  First, the total number of requests and the sum of query time, split by 
whether queries take more than 1s or not:
  
+-------+---------+-----------+
|more_1s|requests |query_time |
+-------+---------+-----------+
|false  |160185762|11285161245|
|true   |2757758  |22233005459|
+-------+---------+-----------+
  
  The proportion of queries per time class is the same whether a 
referer is present (expected UI) or not (expected bot).
  
+-----------+----------------+---------+-----------+
|has_referer|query_time_class|count    |query_time |
+-----------+----------------+---------+-----------+
|false      |1_less_10ms     |8613461  |43244699   |
|false      |2_10ms_to_100ms |118036102|3382186064 |
|false      |3_100ms_to_1s   |28377288 |7058252741 |
|false      |4_1s_to_10s     |1815394  |6081683264 |
|false      |5_more_10s      |591957   |14313410554|
|true       |1_less_10ms     |24329    |133314     |
|true       |2_10ms_to_100ms |3123534  |140796917  |
|true       |3_100ms_to_1s   |2011048  |660547510  |
|true       |4_1s_to_10s     |310037   |800937814  |
|true       |5_more_10s      |40370    |1036973827 |
+-----------+----------------+---------+-----------+
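
  The bucketing behind `query_time_class` can be sketched as below. This is a 
guess at the logic, not the query actually run: it assumes query time is 
measured in milliseconds and that each class includes its lower bound and 
excludes its upper bound. Both function names are hypothetical.

```python
from collections import Counter


def query_time_class(query_time_ms):
    """Map a query duration (assumed to be in milliseconds) to a class label."""
    if query_time_ms < 10:
        return "1_less_10ms"
    if query_time_ms < 100:
        return "2_10ms_to_100ms"
    if query_time_ms < 1000:
        return "3_100ms_to_1s"
    if query_time_ms < 10000:
        return "4_1s_to_10s"
    return "5_more_10s"


def class_shares(query_times_ms):
    """Share of queries falling in each class, keyed by class label."""
    counts = Counter(query_time_class(t) for t in query_times_ms)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


# Hypothetical durations (not real data).
print(class_shares([5, 50, 50, 20000]))
```

  The numeric prefixes in the labels keep the classes sorted by duration when 
ordering alphabetically.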
  
  Below is some information on the top-100 user-agent/referer pairs making the 
most requests with a duration greater than 1s:
  

+--------------------------------------------------------------------------------------------------------------------------+-------+----------------+------------------+----------------+------------------+
|user_agent                                                                                                                |referer|requests_more_1s|query_time_more_1s|requests_less_1s|query_time_less_1s|
+--------------------------------------------------------------------------------------------------------------------------+-------+----------------+------------------+----------------+------------------+
|ChemAxon-Marvin/20.15.0                                                                                                   |null   |209930          |1816992218        |0               |0                 |
|SAP/1.0                                                                                                                   |null   |143198          |248447172         |5100562         |1834622022        |
|okhttp/4.0.0-alpha02                                                                                                      |null   |128509          |552415825         |2967            |679104            |
|Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML\, like Gecko) Chrome/50.0.2661.102 Safari/537.36|null   |114446          |467381034         |5899288         |441228641         |
|commonscat_copy_from_P373 Pywikibot/3.1.dev0 (g6) requests/2.22.0 Python/2.7.13.final.0                                   |null   |99342           |404639751         |5907            |4823011           |
|sparqlwrapper 1.8.2 (rdflib.github.io/sparqlwrapper)                                                                      |null   |86830           |2843969537        |289618          |45768268          |
|Apache-HttpClient/4.5.10 (Java/1.8.0_242)                                                                                 |null   |70949           |1331327131        |3127            |2242093           |
|bbw-bot                                                                                                                   |null   |68936           |292089223         |1957742         |234481876         |
|MyCoolTool/0.1 dlworb1...@yonsei.ac.kr                                                                                    |null   |52715           |149532780         |275170          |34374373          |
|python-requests/2.24.0                                                                                                    |null   |49917           |458021897         |364755          |33714974          |
|Drupal

[Wikidata-bugs] [Maniphest] T266022: Programmatically categorize WDQS queries by potential alternative solution

2021-01-04 Thread JAllemandou
JAllemandou added a comment.


  The planned deadline was the end of last month. I ran into various issues 
that prevented me from meeting it. I have started the actual work today (I had 
given it thought but hadn't coded) and wish to present results before the end 
of the month.

TASK DETAIL
  https://phabricator.wikimedia.org/T266022

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: dcausse, JMinor, Aklapper, Gehel, Addshore, JAllemandou, Lydia_Pintscher, 
CBogen, MPhamWMF, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Project Column] T130102: [Task] dashboard showing browser usage distribution for Wikidata

2016-04-11 Thread JAllemandou
JAllemandou moved this task to Radar on the Analytics workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T130102

WORKBOARD
  https://phabricator.wikimedia.org/project/board/11/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Nuria, Addshore, Aklapper, Lydia_Pintscher, D3r1ck01, Izno, JAllemandou, 
Wikidata-bugs, aude, Mbch331, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T135164: Pageview API not reporting spiders correctly

2016-05-12 Thread JAllemandou
JAllemandou added a comment.


  Results look correct to me with that query:
  
SELECT
  agent_type,
  count(1) as count
FROM
  webrequest
WHERE
  year = 2016
  AND month = 5
  AND day = 10
  AND uri_host LIKE "%wikidata.org"
  AND is_pageview
  AND pageview_info['page_title'] = "Special:RecentChangesLinked"
GROUP BY agent_type
ORDER BY count
LIMIT 99;
--
agent_type  count
spider      72
user        91328
  
  What has changed:
  
  - include wikidata.org as well as www.wikidata.org
  - filter for pageview only (instead of all requests)
  - filter pageview title instead of uri (there are multiple ways to query the 
same page with mediawiki).
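
  To make the three changes concrete, here is the same filter written as a 
plain Python predicate over dict-shaped rows. It is only a sketch: the 
`matches` and `count_by_agent_type` helpers and the sample rows are 
hypothetical, and only the field names come from the query above.

```python
from collections import Counter


def matches(row):
    """Apply the three filters from the query to one webrequest-like row."""
    return (
        # LIKE "%wikidata.org" keeps both wikidata.org and www.wikidata.org
        row["uri_host"].endswith("wikidata.org")
        # pageviews only, instead of all requests
        and row["is_pageview"]
        # filter on the page title rather than on the URI
        and row["pageview_info"].get("page_title") == "Special:RecentChangesLinked"
    )


def count_by_agent_type(rows):
    """Count matching rows per agent_type."""
    return Counter(row["agent_type"] for row in rows if matches(row))


# Hypothetical rows (not real data).
sample_rows = [
    {"uri_host": "www.wikidata.org", "is_pageview": True, "agent_type": "user",
     "pageview_info": {"page_title": "Special:RecentChangesLinked"}},
    {"uri_host": "wikidata.org", "is_pageview": True, "agent_type": "spider",
     "pageview_info": {"page_title": "Special:RecentChangesLinked"}},
    {"uri_host": "www.wikidata.org", "is_pageview": False, "agent_type": "user",
     "pageview_info": {}},
]

print(count_by_agent_type(sample_rows))
```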

TASK DETAIL
  https://phabricator.wikimedia.org/T135164

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Lydia_Pintscher, JAllemandou, Aklapper, madhuvishy, Addshore, Zppix, 
D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T135164: "egranary digital library system" UA should be listed as a spider

2016-05-13 Thread JAllemandou
JAllemandou added a comment.


  @Tbayer: I suggested that @Addshore query webrequest on a specific hour for 
detailed user_agent analysis.
  For this check, @Addshore, I would really have gone for ONE HOUR of data, 
which makes the volume of data to work with much smaller (data is partitioned 
down to the hour).
  As for using pageview_hourly to double-check the numbers, this is the 
first thing I did, and I finally double-checked that everything was OK on 
webrequest.

TASK DETAIL
  https://phabricator.wikimedia.org/T135164

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: Tbayer, Lydia_Pintscher, JAllemandou, Aklapper, madhuvishy, Addshore, 
Zppix, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T177257: ArticlePlaceholder hit counts from bnwiki seem bogus

2017-10-30 Thread JAllemandou
JAllemandou added a comment.
Hi folks,
Not a bug for me:

SELECT access_method, count(1) from wmf.webrequest WHERE is_pageview AND pageview_info['project'] = 'bn.wikipedia' AND year = 2017 AND month = 9 AND day = 30 AND webrequest_source = 'text' AND x_analytics_map['ns'] = '-1' AND x_analytics_map['special'] = 'AboutTopic' group by access_method;

gives result:

access_method	_c1
desktop	13
mobile web	448

which is consistent with the request @Addshore ran, which doesn't include 
bn.m.wikipedia.org.
I'll let you either close or ping us back :)

TASK DETAIL
  https://phabricator.wikimedia.org/T177257

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore, JAllemandou
Cc: JAllemandou, Aklapper, Lucie, Lydia_Pintscher, Nuria, Addshore, hoo, Lahi, 
GoranSMilovanovic, QZanden, cmadeo, Wikidata-bugs, aude, jayvdb, Ricordisamoa, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T160825: Grafana: "wikidata-api" doesn't update anymore

2017-03-20 Thread JAllemandou
JAllemandou added a comment.
Just had a quick look at oozie jobs, and they seem successful.
Let's troubleshoot that with @Addshore.

TASK DETAIL
  https://phabricator.wikimedia.org/T160825

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: JAllemandou, Nuria, Lydia_Pintscher, Addshore, matej_suchanek, Aklapper, 
QZanden, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2021-12-15 Thread JAllemandou
JAllemandou added a project: Data-Engineering-Kanban.

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, 786, EChetty, 
Suran38, Biggs657, toberto, ldelench_wmf, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz, 
4748kitoko, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, 
Akovalyov, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, 
Manybubbles, Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2021-12-16 Thread JAllemandou
JAllemandou added a comment.


  Code is ready:
  
  - Import `commons-mediainfo` json dumps to HDFS 
(https://gerrit.wikimedia.org/r/738874)
  - Update spark transformation job to work with both wikidata and commons 
dumps (https://gerrit.wikimedia.org/r/739129)
  - Update `wikidata_entity` table creation script and oozie job for the new 
fields added by the patch above 
(https://gerrit.wikimedia.org/r/c/analytics/refinery/+/740589)
  - Add `commons_entity` table creation script 
(https://gerrit.wikimedia.org/r/c/analytics/refinery/+/740590)
  - Update spark transformation job to write directly to a hive table instead 
of to files 
(https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/747508/)
  
  What we need after having merged/deployed the above is:
  
  - A new airflow job for the `commons_entity` data generation
  - A migration of the `wikidata_entity` oozie job to Airflow

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, 786, EChetty, 
Suran38, Biggs657, toberto, ldelench_wmf, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz, 
4748kitoko, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, 
Akovalyov, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, 
Manybubbles, Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T299059: Write an Airflow job converting commons structured data dump to Hive

2022-01-12 Thread JAllemandou
JAllemandou created this task.
JAllemandou added projects: Product-Analytics, Structured-Data-Backlog, 
Wikidata-Query-Service, Wikidata, Data-Engineering, Discovery-Search (Current 
work), Patch-For-Review, Data-Engineering-Kanban.
Restricted Application removed a project: Patch-For-Review.

TASK DESCRIPTION
  The airflow job should
  
  - be run weekly on Mondays.
  - wait for source data to be available:
    - source folder is of form 
`hdfs://analytics-hadoop/wmf/data/raw/commons/dumps/mediainfo-json/MMDD`
    - source folder contains a file named `_IMPORTED` when the source data has 
been successfully imported in the folder
  - run a spark job reading the source data and writing it to hive
    - the spark job is in the `refinery-job.jar` archive, we need to have it as 
a dependency for the job
    - the spark job class is 
`org.wikimedia.analytics.refinery.job.structureddata.jsonparse.JsonDumpConverter`
    - main parameters of the job are the input folder, the output hive table 
and the snapshot (time partition) being created. The output hive table will be 
`structured_data.commons_entity` and the `snapshot` will be in the form 
`-MM-DD`. See the class for the detailed list of parameters :)
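
  The two gating conditions (run on Mondays, wait for the `_IMPORTED` flag) 
can be sketched in plain Python as below. This is not Airflow code and not the 
eventual job: `is_monday` and `source_is_ready` are hypothetical helpers, and 
a local directory stands in for the HDFS source folder.

```python
from datetime import date
from pathlib import Path

# Flag file named in the task description.
IMPORTED_FLAG = "_IMPORTED"


def is_monday(day: date) -> bool:
    """True when `day` falls on a Monday (weekday() counts Monday as 0)."""
    return day.weekday() == 0


def source_is_ready(folder: Path) -> bool:
    """True once the import has dropped its flag file into the source folder."""
    return (folder / IMPORTED_FLAG).is_file()


def should_run(day: date, folder: Path) -> bool:
    """Run the conversion only on Mondays, and only once the source data is complete."""
    return is_monday(day) and source_is_ready(folder)
```

  In a real DAG the schedule and the wait would be handled by the scheduler 
and a sensor rather than by helper functions like these.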

TASK DETAIL
  https://phabricator.wikimedia.org/T299059

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: nettrom_WMF, Miriam, Nuria, cchen, AKhatun_WMF, JAllemandou, ntsako, 
EChetty, toberto, ldelench_wmf, Invadibot, MPhamWMF, maantietaja, CBogen, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, Base, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117

2022-03-07 Thread JAllemandou
JAllemandou added a comment.


  Thank you for letting us know :)

TASK DETAIL
  https://phabricator.wikimedia.org/T300240

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: ArielGlenn, Aklapper, JAllemandou, AKhatun_WMF, dcausse, karapayneWMDE, 
Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T252443: Create dashboard to show growth of structured data on Commons over time

2022-03-30 Thread JAllemandou
JAllemandou closed subtask T258834: Create a Commons equivalent of the 
wikidata_entity table in the Data Lake as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T252443

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: cchen, JAllemandou
Cc: kzimmerman, nettrom_WMF, GFontenelle_WMF, Abit, Ramsey-WMF, CBogen, 
Astuthiodit_1, EChetty, karapayneWMDE, toberto, ldelench_wmf, Invadibot, 
maantietaja, Y.ssk, FRomeo_WMF, Muchiri124, ItamarWMDE, Nintendofan885, 
Akuckartz, Nandana, JKSTNK, Lahi, Gq86, E1presidente, Cparle, SandraF_WMF, 
GoranSMilovanovic, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, 
_jensen, rosalieper, 4nn1l2, Taiwania_Justo, Scott_WUaS, Susannaanas, 
Ixocactus, Wong128hk, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, 
El_Grafo, Dinoguy1000, Ricordisamoa, Wesalius, Lydia_Pintscher, Raymond, 
Steinsplitter, Mbch331
___


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2022-03-30 Thread JAllemandou
JAllemandou closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, 
Fernandobacasegua34, Astuthiodit_1, ntsako, 786, EChetty, Suran38, Biggs657, 
karapayneWMDE, toberto, ldelench_wmf, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, ItamarWMDE, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, 
Manybubbles, Mbch331
___


[Wikidata-bugs] [Maniphest] T299059: Write an Airflow job converting commons structured data dump to Hive

2022-04-08 Thread JAllemandou
JAllemandou closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T299059

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Snwachukwu, JAllemandou
Cc: Cparle, nettrom_WMF, Miriam, Nuria, cchen, AKhatun_WMF, JAllemandou, 
Astuthiodit_1, ntsako, EChetty, karapayneWMDE, toberto, ldelench_wmf, 
Invadibot, MPhamWMF, maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, Manybubbles, Mbch331
___


[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake

2022-04-08 Thread JAllemandou
JAllemandou closed subtask T299059: Write an Airflow job converting commons 
structured data dump to Hive as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T258834

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JAllemandou
Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, 
Fernandobacasegua34, Astuthiodit_1, ntsako, 786, EChetty, Suran38, Biggs657, 
karapayneWMDE, toberto, ldelench_wmf, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, ItamarWMDE, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, 
Manybubbles, Mbch331
___


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-06-19 Thread JAllemandou
JAllemandou added a comment.


  Hi Folks - What is the status on this one?
  
  I'd like Data-Engineering to announce the deprecation of Spark2 for this end 
of month, but not without knowing how we plan on tackling your job :)
  Here are the 3 possible solutions I can think of:
  
  - Stopping the job while it is revamped to spark3 (Knowing that the dashboard 
is broken, is it a possible solution?)
  - Configure the job not to use DynamicAllocation but to use fixed-resource, 
making the job work in spark2 despite spark2 being deprecated, but using more 
cluster resources than really needed
  - Postpone deprecating spark2 (if we could not do that, I'd be super happy :)
  
  Let me know your thoughts :)
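
The second option above (fixed resources instead of DynamicAllocation) could be sketched as below. This is only an illustration: the property names are standard Spark settings, but the helper functions and the executor count, cores and memory values are placeholder assumptions, not the job's real configuration.

```python
# Sketch only: executor count, cores and memory are placeholder assumptions.
def fixed_resource_conf(executors: int, cores: int, memory: str) -> dict[str, str]:
    """Return Spark properties that disable dynamic allocation and pin resources."""
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(executors),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": memory,
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render the properties as `--conf key=value` arguments for spark-submit."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prints the arguments one would append to an existing spark-submit call.
    print(" ".join(to_submit_args(fixed_resource_conf(executors=8, cores=2, memory="4g"))))
```

The trade-off is the one stated in the option itself: the job holds its executors for the whole run, so it likely uses more cluster resources than really needed.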

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, EChetty, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-06-19 Thread JAllemandou
JAllemandou added a comment.


  In T334951#8946790 <https://phabricator.wikimedia.org/T334951#8946790>, 
@AndrewTavis_WMDE wrote:
  
  > I'll async with him now and see if we can come to a decision sooner than 
that, but you all will have the answer by Wednesday at the latest 😊
  
  Awesome, thank you :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, EChetty, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-06-22 Thread JAllemandou
JAllemandou added a comment.


  In T334951#8952583 <https://phabricator.wikimedia.org/T334951#8952583>, 
@AndrewTavis_WMDE wrote:
  
  > - If the answer to the above question of permanently losing some data 
that's being produced by Concepts Monitor and other WMDE jobs is no, then we're 
ok with option one above of stopping the job.
  
  I am unfortunately not knowledgeable at all about the data generated by the 
job, which prevents me from assessing whether there is data generated by the 
job that we would not be able to regenerate.
  Also, I have not been told about intermediary data stored on the cluster, 
making me think that all the data generated by the job is small enough to be 
saved for the reports only.
  But as stated before, those are uninformed ideas :(
  
  > - Aside from this we'd prefer option two of configuring it to use 
fixed-resource.
  
  We can test that :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, EChetty, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-06-30 Thread JAllemandou
JAllemandou added a comment.


  Hi @AndrewTavis_WMDE,
  I've done some investigation, and here is what I have: Goran has 10 CRON jobs 
running from various hosts on our system (1 on `stat1004`, 2 on `stat1007`, 7 on 
`stat1008`).
  
  - `WDCM_Sqoop_Clients` runs on `stat1004` weekly - It doesn't run spark (but 
Sqoop)
  - `2021_WMDE_Mitmachen_Bereich_2021_Campaign` runs on `stat1007` daily - It 
doesn't run spark (but Hive)
  - `WD_PageviewsPerType` runs on `stat1007` daily but has been failing since 
February 17th - It runs a spark job
  - `WD_UsageCoverage` runs on `stat1008` daily - It runs a spark job
  - `WD_languagesLandscape` runs on `stat1008` monthly (30th of the month) - It 
runs a spark job
  - `Wiktionary_CognateDashboard` runs on `stat1008` daily - It doesn't run 
spark
  - `WDCM_EngineBiases` runs on `stat1008` weekly - It runs a spark job
  - `Qurator_CuriousFacts` runs on `stat1008` monthly (10th of the month) - It 
runs a spark job
  - `WMDE_BannerImpressions` runs on `stat1008` hourly - It doesn't run spark 
(but Hive)
  - `NewEditors_comprehensive_report` runs on `stat1008` daily - It runs a 
spark job
  
  We need to meet and talk about your usage of the data generated by those 
scripts, and see what you wish us to try to make work versus stop.
  I'm booking some time on your calendar next Monday :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, EChetty, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___


[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-07-03 Thread JAllemandou
JAllemandou added a comment.


  We met this morning with @AndrewTavis_WMDE and @Manuel - Thank you folks for 
the great meeting.
  The detailed Meeting notes are here: 
https://docs.google.com/document/d/1REsolXnZf2KqApL0p-DE8X4eWXI_zxHgrCe3k1hcZnw
  
  From the job list in previous comment:
  
  - 4 don't run spark and are kept as-is: `WMDE_BannerImpressions`, 
`Wiktionary_CognateDashboard`, `2021_WMDE_Mitmachen_Bereich_2021_Campaign`, 
`WDCM_Sqoop_Clients`
  - 3 are stopped (crontab commented): `Qurator_CuriousFacts`, 
`WDCM_EngineBiases`, `WD_PageviewsPerType`
  - 3 have been updated to run spark2 in fixed-resource mode, thus normally not 
failing after the migration to the spark3-shuffler: `WD_UsageCoverage`, 
`WD_languagesLandscape`, `NewEditors_comprehensive_report`
  
  With those changes there is no more blocker in migrating to the 
spark3-shuffler from this task :)

TASK DETAIL
  https://phabricator.wikimedia.org/T334951

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE, JAllemandou
Cc: ItamarWMDE, BTullis, GoranSMilovanovic, AndrewTavis_WMDE, Aklapper, Manuel, 
JAllemandou, lbowmaker, xcollazo, Astuthiodit_1, karapayneWMDE, Invadibot, 
maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___


[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-18 Thread JAllemandou
JAllemandou added a comment.


  In T342416#9091146 <https://phabricator.wikimedia.org/T342416#9091146>, 
@EBernhardson wrote:
  
  > I looked into these, the attached patch should fix it but it leaves an open 
question (@JAllemandou):
  >
  > The `core-site.xml`, along with puppet which writes it out, has the default 
umask of 027 since at least 2021, which prevents world readability. So why do 
we have the following permissions for historical dumps:
  >
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230710
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230716
  >   drwxr-xr-x   /wmf/data/discovery/wikidata/rdf/date=20230717
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230723
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230724
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230730
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230731
  >   drwxr-x---   /wmf/data/discovery/wikidata/rdf/date=20230806
  
  The world-readable changes were made manually by me to unblock 
@AndrewTavis_WMDE - I logged my change in the analytics IRC chan but didn't 
ping on the search IRC chan - I should have, please excuse me for this :)
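
The umask arithmetic behind the quoted listing can be checked with a few lines of Python. This is a sketch assuming the usual rule that a new directory gets mode `777` with the umask bits removed; `dir_mode_string` is a hypothetical helper, not part of any Hadoop tooling.

```python
import stat


def dir_mode_string(umask: int, requested: int = 0o777) -> str:
    """Return the `ls`-style mode of a directory created under the given umask."""
    effective = requested & ~umask
    return stat.filemode(stat.S_IFDIR | effective)


if __name__ == "__main__":
    for umask in (0o027, 0o022, 0o002):
        print(f"umask {umask:03o} -> {dir_mode_string(umask)}")
```

Under that assumption, umask 027 gives `drwxr-x---` (the newer snapshots), 022 gives `drwxr-xr-x` (the mode of the older, manually changed ones) and 002 gives `drwxrwxr-x` (the mode shown for the cirrus dumps).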
  
  > Similarly we have other jobs that still run today and emit world readable 
dumps without explicitly setting the umask, what is causing the difference?
  >
  >   drwxrwxr-x   
/wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230716
  >   drwxrwxr-x   
/wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230723
  >   drwxrwxr-x   
/wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230730
  >   drwxrwxr-x   
/wmf/data/discovery/cirrus/index/cirrus_replica=codfw/cirrus_group=chi/wiki=enwiki/snapshot=20230806
  
  The guess I have about those would be that they are still generated by a Hive 
job. Hive and spark behave differently in regard to permissions when generating 
files. Spark uses the configured umask, while hive reproduces the parent-dir 
pattern. I'd be interested to know whether my guess is correct :)

TASK DETAIL
  https://phabricator.wikimedia.org/T342416

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: EBernhardson, JAllemandou
Cc: dcausse, BTullis, AndrewTavis_WMDE, Aklapper, JAllemandou, 
Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, AWesterinen, lbowmaker, 
karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___


[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-18 Thread JAllemandou
JAllemandou added a comment.


  In T342416#9101868 <https://phabricator.wikimedia.org/T342416#9101868>, 
@EBernhardson wrote:
  
  > These are both generated by spark.  The rdf is being imported by a scala 
application while the cirrus dump is imported by pyspark, but they should both 
be using the same underlying implementation. Both applications use 
`df.write.insertInto(table_name)` to instruct spark to do the actual output. 
I'm a bit surprised they end up generating different sets of permissions.
  >
  > I suppose it's not super important why the cirrus dump is world readable, 
it's fine to be readable, it just hints to me that there is something I don't 
understand about hdfs/spark/permissions happening here.
  
  Mwarf, wrong guess :) Interesting nonetheless - Let me know if you wish we 
pair on this.

TASK DETAIL
  https://phabricator.wikimedia.org/T342416

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: EBernhardson, JAllemandou
Cc: dcausse, BTullis, AndrewTavis_WMDE, Aklapper, JAllemandou, 
Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, AWesterinen, lbowmaker, 
karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___


[Wikidata-bugs] [Maniphest] T336361: [Analytics] Identify access from mobile vs. desktop devices

2023-09-07 Thread JAllemandou
JAllemandou added a comment.


  > However, my assumption is that when only filtering for agent_type != 
'spider' the population will still include a lot of non-UI hits.
  
  The `agent_type` field currently can take 3 values: `spider`, `automated` and 
`user`. The `spider` one is used when user-agents self define themselves as 
bots, the `automated` one is used when we heuristically define the traffic as 
being automatically generated (big volume), and the rest falls under the `user` 
value. There indeed still is some non-user traffic being flagged as `user`.
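
To make the difference between the two filters concrete, here is a minimal sketch in plain Python. The `agent_type` field and its three values come from the comment above; the `hits` field, the sample rows and the `count_hits` helper are invented for illustration and do not reflect real traffic.

```python
from collections import Counter

# The three values named in the comment above.
AGENT_TYPES = ("spider", "automated", "user")


def count_hits(rows, keep):
    """Sum hit counts per agent_type, keeping only the agent types in `keep`."""
    counts = Counter()
    for row in rows:
        if row["agent_type"] in keep:
            counts[row["agent_type"]] += row["hits"]
    return counts


# Invented sample rows, for illustration only (not real traffic numbers).
sample = [
    {"agent_type": "spider", "hits": 5},
    {"agent_type": "automated", "hits": 3},
    {"agent_type": "user", "hits": 2},
]

# Equivalent of `agent_type != 'spider'`: automated traffic is still included.
not_spider = count_hits(sample, keep=set(AGENT_TYPES) - {"spider"})
# Stricter filter: only traffic labelled `user`.
user_only = count_hits(sample, keep={"user"})

print(dict(not_spider), dict(user_only))
```

Even the stricter filter is only an approximation, since some non-user traffic is still flagged as `user`.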

TASK DETAIL
  https://phabricator.wikimedia.org/T336361

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE, JAllemandou
Cc: JAllemandou, AndrewTavis_WMDE, Michael, Manuel, Aklapper, 
Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-13 Thread JAllemandou
JAllemandou added a comment.


  Thanks a lot @EBernhardson for the help on finishing this!

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF, JAllemandou
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, 
Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, 
Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, 
Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331
___


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-08-26 Thread JAllemandou
JAllemandou added a comment.


  In T303831#8175252 <https://phabricator.wikimedia.org/T303831#8175252>, 
@EBernhardson wrote:
  
  > @JAllemandou  The one remaining piece of this ticket is cleaning up the 
historical data, per T303831#8081172 
<https://phabricator.wikimedia.org/T303831#8081172>.  Any suggestions on how we 
should manage dropping old data from tables partitioned by a snapshot column?
  
  The way we currently do this is with this script: 
https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots
  It works differently from the generic `refinery-drop-older-than` script, in 
that it lists all the datasets to clean and then applies the deletion.
  It's possible to add the datasets you need to delete in there, it shouldn't 
be complicated.
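
The "list everything first, then delete" approach can be sketched as follows. This is not the code of `refinery-drop-mediawiki-snapshots`: the function names, the table names and the number of snapshots to keep are assumptions made for illustration, and the sketch only computes a plan without touching any data.

```python
def snapshots_to_drop(snapshots: list[str], keep: int) -> list[str]:
    """Return the snapshots to delete, keeping the `keep` most recent ones.

    Assumes zero-padded snapshot names (e.g. YYYYMMDD) so that string order
    matches chronological order.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(set(snapshots))
    return ordered[: max(len(ordered) - keep, 0)]


def build_drop_plan(datasets: dict[str, list[str]], keep: int) -> dict[str, list[str]]:
    """First pass: list what would be dropped for every dataset, deleting nothing."""
    return {name: snapshots_to_drop(snaps, keep) for name, snaps in datasets.items()}


if __name__ == "__main__":
    # Hypothetical table names and snapshot values.
    datasets = {
        "example_db.table_a": ["20220801", "20220808", "20220815"],
        "example_db.table_b": ["20220815"],
    }
    for name, to_drop in build_drop_plan(datasets, keep=2).items():
        print(name, to_drop)
```

Building the full plan before applying anything makes it possible to review the list of partitions, or abort, before any deletion happens.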

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: EBernhardson, JAllemandou
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, 
Hellket777, LisafBia6531, Astuthiodit_1, AWesterinen, 786, Biggs657, 
karapayneWMDE, Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, 
Beast1978, CBogen, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, 
joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, 
Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___


[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-09-14 Thread JAllemandou
JAllemandou added a comment.


  In T303831#8237323 <https://phabricator.wikimedia.org/T303831#8237323>, 
@EBernhardson wrote:
  
  > data cleanup looks to now have run successfully
  
  Thanks a lot @EBernhardson for finalizing on this :)

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: EBernhardson, JAllemandou
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, Jersione, 
Hellket777, LisafBia6531, Astuthiodit_1, AWesterinen, 786, Biggs657, 
karapayneWMDE, Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, 
Beast1978, CBogen, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, 
joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, 
Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-05-06 Thread JAllemandou
JAllemandou added a comment.


  I would suggest using the `hdfs-rsync` tool to do this - it requires some 
setting up with puppet, but it is helpful because it copies only new files from 
folders (see 
https://github.com/wikimedia/operations-puppet/blob/1c4d67ff19372832484f7551dc49836be5806024/modules/hdfs_tools/manifests/hdfs_rsync_job.pp
 and 
https://github.com/wikimedia/operations-puppet/blob/1c4d67ff19372832484f7551dc49836be5806024/modules/dumps/manifests/web/fetches/stats.pp)
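
The "copy only new stuff" idea can be illustrated with a small sketch. It is not how `hdfs-rsync` is implemented: it assumes a simplified rule where a file is copied when it is missing from the destination or its size differs, and the listings are plain dictionaries of hypothetical paths to sizes.

```python
def files_to_copy(source: dict[str, int], dest: dict[str, int]) -> list[str]:
    """Return the source paths that are missing or have a different size in dest."""
    return sorted(path for path, size in source.items() if dest.get(path) != size)


if __name__ == "__main__":
    # Hypothetical listings: relative path -> size in bytes.
    source = {"date=20240429/part-0": 100, "date=20240506/part-0": 120}
    dest = {"date=20240429/part-0": 100}
    print(files_to_copy(source, dest))
```

With listings like these, only the files of the newest folder would be transferred, which is what makes repeated runs cheap.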

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse, JAllemandou
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-05-06 Thread JAllemandou
JAllemandou added a comment.


  No objection :) I'd have gone for option 1 as it seems the easiest to 
maintain, but I agree, it means installing some stuff on the blazegraph 
machines.

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse, JAllemandou
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___

