ArielGlenn added a comment.
Summarizing an IRC conversation with @JAllemandou:
- Analytics ingests full revision history XML dumps from the run on the 1st
of the month; these correspond to the XML revision metadata ('stub') files,
which are generated on the 1st or 2nd of the month depending on how long they
take to complete. While some revisions may have been hidden or deleted,
the revision history files generally represent the content state on the 1st or
2nd.
- Analytics also gathers other data around the 1st of the month that is
correlated with the revision content.
- Analytics ingests Wikidata entity dumps in JSON format, and would like that
data to be as close to the 1st of the month as possible. This means the
wikidata-YYYYMMDD-all.json.bz2 (or gz) files specifically.
In his own words, "I want to productionize joining wikidata json dumps to
other datasources, and matching dates at the month level is really what I'm
after."
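
To make the "as close to the 1st of the month as possible" requirement concrete, here is a minimal sketch of how a consumer might pick the matching entity dump for a given month. It assumes only the wikidata-YYYYMMDD-all.json.bz2 (or gz) naming mentioned above; the function names and the tie-breaking rule (prefer the earlier dump) are hypothetical and are not taken from any Analytics tooling.

```python
import re
from datetime import date

# Matches the file naming described above: wikidata-YYYYMMDD-all.json.bz2 (or .gz).
DUMP_NAME = re.compile(r"^wikidata-(\d{4})(\d{2})(\d{2})-all\.json\.(?:bz2|gz)$")


def dump_date(filename):
    """Return the date embedded in a dump file name, or None if it does not match."""
    match = DUMP_NAME.match(filename)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits were present but do not form a real calendar date.
        return None


def closest_to_month_start(filenames, year, month):
    """Pick the dump whose date is nearest to the 1st of the given month.

    Hypothetical helper: ties are broken in favour of the earlier dump.
    Returns None when no file name matches the expected pattern.
    """
    target = date(year, month, 1)
    dated = [(dump_date(name), name) for name in filenames]
    dated = [(d, name) for d, name in dated if d is not None]
    if not dated:
        return None
    best = min(dated, key=lambda pair: (abs((pair[0] - target).days), pair[0]))
    return best[1]
```

For example, given dumps dated 20190128 and 20190204, a request for February 2019 would select the 20190204 file, since it is three days from the 1st rather than four.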
TASK DETAIL
https://phabricator.wikimedia.org/T216160