hoo created this task. hoo added projects: Datasets-General-or-Unknown, Wikidata. Herald added a subscriber: Aklapper.
TASK DESCRIPTION
We should only dump up to N entities in each maintenance script run, and then start a new dumper instance at that offset.
This has several benefits:
- If a script fails, we only need to redo the last batch of N entities, not the whole dump.
- We can (given some grace time) react gracefully when a DB or similar backend goes down or changes (and even with no grace time, the first point limits the damage).
- All shards will be roughly equally fast (they switch DB replicas/external storage replicas throughout the run, so picking a slower one at some point has less of an effect).
- Memory leaks and other problems of long-running PHP/MediaWiki processes don't bite us as hard.
- …
I suggest picking N so that a dumper runs for about 15-30 minutes before exiting and handing over to the next runner.
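A minimal sketch of the proposed wrapper loop. The dumper is mocked here, and its interface (a start offset plus a batch size) is an assumption for illustration, not the real maintenance script's flags:

```shell
#!/bin/sh
# Sketch: dump entities in batches of N, restarting the dumper at the
# next offset each time. TOTAL and dump_batch are mocks standing in
# for the real entity count and maintenance script.

TOTAL=25   # mock: pretend the wiki has 25 entities
N=10       # entities per batch; pick so one run takes ~15-30 min

# Mock dumper: prints the entity ids in [first, first+count).
dump_batch() {
    first=$1; count=$2
    last=$((first + count - 1))
    [ "$last" -gt "$TOTAL" ] && last=$TOTAL
    [ "$first" -gt "$TOTAL" ] && return 0
    seq "$first" "$last"
}

: > dump.out
offset=1
while [ "$offset" -le "$TOTAL" ]; do
    # Each batch is a fresh dumper instance: a failure only costs this
    # batch (retry from the same offset), and memory leaks as well as
    # replica choices reset between runs.
    dump_batch "$offset" "$N" >> dump.out
    offset=$((offset + N))
done
wc -l < dump.out   # all 25 mock entities dumped across 3 runs
```

With N=10 the mock run hands over twice, matching the idea above: the orchestrating loop, not a single long-lived PHP process, owns the progress through the entity range.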
To: hoo
Cc: Lydia_Pintscher, daniel, ArielGlenn, Aklapper, hoo, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, Svick, Mbch331, jeremyb
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs