Mitar created this task. Mitar added projects: Wikidata, Dumps-Generation. Restricted Application added a project: wdwb-tech.
TASK DESCRIPTION
My understanding is that dumps are already produced by multiple shards and then combined into one file. Why not simply keep the multiple files? That would make it easier to process dumps in parallel, one file per worker. There are already no guarantees on the order of documents in dumps, so keeping separate files would not remove any ordering guarantee. Parallel processing is currently hard because a compressed file cannot easily be split into chunks without decompressing it first (and then potentially recompressing the chunks). Given that dump size has grown over time, maybe it is time to provide each dump as multiple files, each capped at some reasonable maximum size?

TASK DETAIL
https://phabricator.wikimedia.org/T278204

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
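To illustrate the parallel processing that the task description asks for, here is a minimal sketch in Python. It assumes a hypothetical layout that the task does not specify: each shard is a separate gzip file with one JSON document per line. The file names and the helper `write_example_shards` are made up for the example. Threads are used to keep the sketch simple; for CPU-bound parsing, a process pool would likely be the better choice.

```python
import gzip
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def count_documents(path):
    """Count the JSON documents in one gzip shard (assumed: one document per line)."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as shard:
        for line in shard:
            line = line.strip()
            if not line:
                continue
            json.loads(line)  # parse to confirm the line is a complete document
            count += 1
    return count


def count_documents_parallel(paths, max_workers=4):
    """Process every shard independently and sum the per-shard counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(count_documents, paths))


def write_example_shards(directory, documents, shard_count):
    """Write documents round-robin into shard_count gzip files (hypothetical layout)."""
    paths = [os.path.join(directory, f"shard-{i}.json.gz") for i in range(shard_count)]
    for i, path in enumerate(paths):
        with gzip.open(path, "wt", encoding="utf-8") as shard:
            for document in documents[i::shard_count]:
                shard.write(json.dumps(document) + "\n")
    return paths


if __name__ == "__main__":
    docs = [{"id": f"Q{n}"} for n in range(1, 11)]
    with tempfile.TemporaryDirectory() as tmp:
        shard_paths = write_example_shards(tmp, docs, shard_count=3)
        print(count_documents_parallel(shard_paths))
```

Because each worker opens its own file, no shard has to be decompressed and split before the work can be divided, which is the difficulty the description points out for a single combined file.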