hoo created this task.
hoo added projects: Datasets-General-or-Unknown, Wikidata.

TASK DESCRIPTION

Given that Wikidata currently grows at 3-10% a week, we need to make sure the Wikidata entity dumpers keep up with that.

The changes in batch size (4eedfb48e9fdc93eea13d9fd3bd341e66c1abfbc) and https://github.com/wmde/WikibaseDataModel/pull/762 will already ease some of the pain, but given the immense growth, they will probably barely offset four weeks of Wikidata growth.
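
To put the growth figures in perspective, here is a rough sketch of the compounding. It assumes the 3-10% weekly rate stays constant over the period, which is a simplification and not a measurement. The function name `compounded_growth` is made up for illustration.

```python
def compounded_growth(weekly_rate: float, weeks: int) -> float:
    """Return the total size multiplier after `weeks` of steady weekly growth.

    Assumption: the growth rate is constant from week to week.
    """
    return (1.0 + weekly_rate) ** weeks


if __name__ == "__main__":
    # Lower and upper bound of the growth rate quoted in the task.
    for rate in (0.03, 0.10):
        factor = compounded_growth(rate, 4)
        print(f"{rate:.0%} per week over 4 weeks: x{factor:.2f}")
```

If dump time scales roughly linearly with the amount of data (again an assumption), an optimization would have to speed the dumpers up by at least this factor just to cancel out four weeks of growth.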

Possible things to do:

  • Create a "master dump" (or some such) from which all other dumps can be derived (this will ease the load on the DBs, but will hardly help with CPU time)
  • Increase the number of runners further (from 5 currently)
  • Try to derive old dumps from new ones (not easy to do, and it is unclear how much there is to gain here)
  • Do more profiling and try to find more low-hanging fruit (like the examples above, or T157013)
  • Switch away from PHP5 to PHP7 or HHVM

TASK DETAIL
https://phabricator.wikimedia.org/T177486

To: hoo
Cc: Aklapper, ezachte, daniel, Lydia_Pintscher, mark, ArielGlenn, bd808, Liuxinyu970226, aude, JanZerebecki, Jimkont, Denis.bykov, Ricordisamoa, PokestarFan, hoo, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, Svick, Mbch331, jeremyb