NealMcB added a comment.

I agree with @Halfak that many of the big XML dumps are very difficult to use. A single-line JSON format (one object per line) would be easy for users to process in parallel and much more convenient to parse in modern languages. The dumps should also be compressed with bz2, not gz, as noted in https://phabricator.wikimedia.org/T115222.
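To illustrate why one-object-per-line JSON is easy to parallelize, here is a minimal sketch. It assumes a dump in that layout compressed with bz2. The file name `sample-dump.json.bz2`, the `id` field, and the helper names `iter_batches`, `process_batch` and `count_records` are hypothetical, chosen only for illustration. Because every line is a complete record, the file can be cut into batches of lines and each batch parsed independently, without any XML-style parser state carried across records. The sketch decompresses sequentially and only parallelizes the parsing. It uses a thread pool to keep it simple; for CPU-heavy parsing a `ProcessPoolExecutor` could be swapped in.

```python
import bz2
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_batches(path, batch_size=1000):
    """Yield lists of raw lines from a bz2-compressed, one-JSON-object-per-line file."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        while True:
            batch = list(islice(fh, batch_size))
            if not batch:
                return
            yield batch


def process_batch(lines):
    """Parse each line on its own; no line depends on any other line."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Hypothetical field: "id" stands in for whatever key a real dump uses.
        if "id" in record:
            count += 1
    return count


def count_records(path, workers=4, batch_size=1000):
    """Hand independent batches to a pool of workers and sum their results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_batch, iter_batches(path, batch_size)))


if __name__ == "__main__":
    # Build a tiny hypothetical sample file so the sketch runs end to end.
    sample = [{"id": f"Q{i}", "labels": {}} for i in range(1, 2501)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample-dump.json.bz2")
        with bz2.open(path, "wt", encoding="utf-8") as fh:
            for item in sample:
                fh.write(json.dumps(item) + "\n")
        print(count_records(path, batch_size=1000))  # 2500
```

The same batching would also work with standard line-oriented tools, since no record spans more than one line.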
I would second the JSON dump work as a good high-priority starting point, as suggested by @RobLa-WMF. What was the outcome of the Unconference meeting?

TASK DETAIL
https://phabricator.wikimedia.org/T114019

To: ArielGlenn, NealMcB
Cc: NealMcB, jcrespo, Bianjiang, madhuvishy, Milimetric, RobLa-WMF, GWicke, TTO, zhuyifei1999, StudiesWorld, gnosygnu, LA2, Ladsgroup, intracer, Lokal_Profil, Halfak, Legoktm, Qgil, JanZerebecki, brion, daniel, Hydriz, MZMcBride, hoo, ezachte, wpmirrordev, Nemo_bis, Aklapper, ArielGlenn, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair

_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
