hoo created this task.
hoo added projects: Datasets-General-or-Unknown, Wikidata.
Herald added a subscriber: Aklapper.

TASK DESCRIPTION

We should dump only up to N entities per maintenance script run, and then start a new dumper instance at that offset.

This has several benefits:

  1. If a script fails, we only need to redo the last batch of N entities, not the whole dump.
  2. We can react gracefully (given some grace time) when a DB or similar backend goes down or changes; even without grace time, point 1 limits the damage.
  3. All shards will finish at roughly the same speed, because each run re-selects DB replicas/external storage replicas, so picking a slower one at some point has less overall effect.
  4. Memory leaks and other problems of long-running PHP/MediaWiki processes don't hurt us as much.

I suggest picking N such that a dumper runs for about 15-30 minutes before exiting and handing over to the next runner; a sketch of such a wrapper loop follows below.
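A minimal sketch of what such a batched wrapper could look like (in Python; the maintenance script path and its --offset/--limit options are assumptions for illustration, not an actual dumpJson.php interface):

  #!/usr/bin/env python3
  """Sketch of a batched dump wrapper. Script path and options are assumed."""
  import subprocess
  import sys

  BATCH_SIZE = 200_000         # N entities per run; tune so one run takes ~15-30 minutes
  TOTAL_ENTITIES = 40_000_000  # assumed upper bound on entity IDs to cover
  MAX_RETRIES = 3              # on failure, redo only this batch, not the whole dump

  def dump_batch(offset: int) -> bool:
      """Start a fresh dumper instance for entities [offset, offset + BATCH_SIZE)."""
      cmd = [
          "php", "extensions/Wikibase/repo/maintenance/dumpJson.php",  # assumed path
          "--offset", str(offset),     # hypothetical option
          "--limit", str(BATCH_SIZE),  # hypothetical option
      ]
      return subprocess.run(cmd).returncode == 0

  def main() -> None:
      for offset in range(0, TOTAL_ENTITIES, BATCH_SIZE):
          for _ in range(MAX_RETRIES):
              if dump_batch(offset):
                  break
          else:
              sys.exit(f"batch at offset {offset} failed {MAX_RETRIES} times, aborting")

  if __name__ == "__main__":
      main()

Because every batch is a fresh PHP process, leaked memory is reclaimed at each batch boundary and DB/external storage replicas are re-selected per run, which is what gives us points 3 and 4 above.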


TASK DETAIL
https://phabricator.wikimedia.org/T177550
