ArielGlenn added a comment.

  Summarizing an IRC conversation with @JAllemandou :
  
  - Analytics ingests full revision history xml dumps from the run on the 1st 
of the month; these correspond to the xml revision metadata ('stub') files, 
which are generated on the 1st or 2nd of the month depending on how long they 
take to complete. While some revisions may have been hidden or deleted, 
generally the revision history files represent the content state on the 1st or 
2nd.
  - Analytics also gathers other data around the 1st of the month that is 
correlated with the revision content.
  - Analytics ingests wikidata entity dumps in json format, and would like that 
data to be as close to the 1st of the month as possible. This means the 
wikidata-YYYYMMDD-all.json.bz2 (or gz) files specifically.
  
  In his own words, "I want to productionize joining wikidata json dumps to 
other datasources, and matching dates at the month level is really what I'm 
after."
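
  As a rough illustration of the month-level matching described above, the 
sketch below picks, from a list of dump filenames, the 
wikidata-YYYYMMDD-all.json.bz2 (or gz) file dated closest to the 1st of a given 
month. The function names and the sample filenames are hypothetical and only 
for illustration; this is not how Analytics actually selects dumps.

```python
import re
from datetime import date

# Matches the filename pattern named in the comment:
# wikidata-YYYYMMDD-all.json.bz2 (or .gz)
DUMP_RE = re.compile(r"^wikidata-(\d{4})(\d{2})(\d{2})-all\.json\.(?:bz2|gz)$")


def dump_date(filename):
    """Return the date embedded in a dump filename, or None if it does not match."""
    m = DUMP_RE.match(filename)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    return date(year, month, day)


def closest_to_first(filenames, year, month):
    """Hypothetical helper: pick the dump dated nearest to the 1st of the month.

    Ties are broken in favour of the earlier dump. Returns None when no
    filename matches the expected pattern.
    """
    target = date(year, month, 1)
    dated = [(dump_date(f), f) for f in filenames]
    dated = [(d, f) for d, f in dated if d is not None]
    if not dated:
        return None
    return min(dated, key=lambda p: (abs((p[0] - target).days), p[0]))[1]


# Illustrative (made-up) listing of available dump files.
files = [
    "wikidata-20190225-all.json.bz2",
    "wikidata-20190304-all.json.bz2",
    "wikidata-20190311-all.json.gz",
    "notes.txt",
]
best = closest_to_first(files, 2019, 3)
```

  With the made-up listing above, the dump from 4 March is three days from the 
1st and the one from 25 February is four days away, so the former is chosen.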

TASK DETAIL
  https://phabricator.wikimedia.org/T216160

To: ArielGlenn
Cc: notconfusing, Envlh, Melderick, Nicolastorzec, hoo, Smalyshev, Addshore, 
ArielGlenn, JAllemandou, alaa_wmde, Nandana, Akovalyov, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, gnosygnu, 
Wikidata-bugs, aude, Mbch331, jeremyb