GoranSMilovanovic added a comment.

  @Jan_Dittrich @Lydia_Pintscher
  
  - the computation of the Hoover inequality index will run every time a new 
snapshot of wmf.mediawiki_history 
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history>
 is detected;
  - the ETL is done in PySpark 
<https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics/blob/master/_engines/WD_Inequality_ETL.py>,
 orchestrated by a Python script 
<https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics/blob/master/_engines/WD_Inequality_Update.py>
 that checks for a new snapshot, runs the ETL, and computes the index;
  - the data will be served as a `.csv` file from the public directory 
<https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/WD_Inequality/>;
  - and the future dashboard will run client-side, using the public 
dataset to visualize the results.
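  For anyone unfamiliar with the metric: the Hoover index, in its common 
form where every unit carries an equal share 1/N, is half the sum of the absolute 
differences between each unit's share of the total and 1/N. A value of 0 means 
perfect equality. Values approach 1 as a single unit accounts for everything. 
The snippet below is only a minimal plain-Python sketch. It assumes the index is 
taken over per-editor edit counts from a single snapshot. The function name 
`hoover_index` and the sample numbers are hypothetical, and 
WD_Inequality_Update.py may define or compute the index differently (e.g. 
directly in PySpark).

```python
from collections.abc import Sequence


def hoover_index(counts: Sequence[float]) -> float:
    """Hoover (Robin Hood) index of non-negative counts (hypothetical helper).

    Assumes each unit (e.g. an editor) carries an equal population share 1/N,
    so H = 0.5 * sum(|x_i / X - 1/N|), where X is the total count.
    Returns 0.0 for perfect equality and (N - 1) / N when one unit holds all.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("counts must be non-empty")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not all be zero")
    # Half the total absolute deviation between observed and equal shares.
    return 0.5 * sum(abs(c / total - 1 / n) for c in counts)


# Illustrative, made-up edit counts per editor for one snapshot.
edits_per_editor = [120, 30, 5, 5, 40]
print(round(hoover_index(edits_per_editor), 4))  # → 0.4
```

  Here one of five editors makes 60% of the 200 edits. That gives a Hoover 
index of 0.4, meaning 40% of all edits would have to be redistributed to reach 
an equal share per editor.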
  
  I do not think it makes sense to start working on the super-simple 
dashboard now, since we only have results for the `2021-01` 
snapshot of the `wmf.mediawiki_history` table (which means that we have exactly 
three numbers to visualize). My suggestion is to wait for the next update 
(expected at some point in March 2021) and then serve the results on 
a dashboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T270109

To: GoranSMilovanovic
Cc: Lydia_Pintscher, WMDE-leszek, Jan_Dittrich, GoranSMilovanovic, Aklapper, 
guergana.tzatchkova, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, 
Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, 
Lahi, Gq86, Af420, Bsandipan, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331