ArielGlenn added a comment.
I've run some tests using the (NFS-mounted) filesystem to which our dumps are
written in production.
ariel@snapshot1008:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20190513$ time (zcat wikidata-20190513-all.json.gz | gzip > /mnt/dumpsdata/temp/ariel/wikidata-20190513-all.json.gz)
real 163m25.709s
user 240m14.524s
sys 8m42.344s
ariel@snapshot1008:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20190513$ time (zcat wikidata-20190513-all.json.gz | zstd -q > /mnt/dumpsdata/temp/ariel/wikidata-20190513-all.json.zst)
real 84m17.266s
user 91m34.532s
sys 9m23.196s
ariel@snapshot1008:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20190513$ time (zcat wikidata-20190513-all.json.gz | lbzip2 -n 1 > /mnt/dumpsdata/temp/ariel/wikidata-20190513-all.json.bz2)
real 554m59.818s
user 653m24.460s
sys 13m49.056s
ariel@snapshot1008:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20190513$ time (zcat wikidata-20190513-all.json.gz | lbzip2 -n 2 > /mnt/dumpsdata/temp/ariel/wikidata-20190513-all.json.bz2)
real 284m41.349s
user 643m26.664s
sys 14m5.700s
Summary:
- wall clock time:
- gzip: 2 hours 45 minutes
- zstd: 1 hour 25 minutes
- lbzip2 1 thread: 9 hours 15 minutes
- lbzip2 2 threads: 4 hours 45 minutes
- bzcat: (TBD soon, will update this when it's finished)
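For anyone who wants to check the figures, the `real` values printed by `time` can be converted to hours and minutes with a small shell helper. This is only a sketch: the function name `to_hm` is made up for illustration, and it truncates to whole minutes, whereas the summary rounds to roughly the nearest five minutes.

```shell
# Convert a `time`-style value such as "163m25.709s" into "2h 43m".
# to_hm is a hypothetical helper name; seconds are simply dropped.
to_hm() {
    mins="${1%%m*}"    # keep only the digits before the first "m"
    printf '%dh %dm\n' "$((mins / 60))" "$((mins % 60))"
}

to_hm 163m25.709s   # gzip             -> 2h 43m
to_hm 84m17.266s    # zstd             -> 1h 24m
to_hm 554m59.818s   # lbzip2, 1 thread -> 9h 14m
to_hm 284m41.349s   # lbzip2, 2 threads -> 4h 44m
```

The parameter expansion avoids any external tools, so the helper runs in a plain POSIX shell as well as in bash.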
I need to double-check memory usage, but barring issues with that, this looks
good. What do Wikidata folks think?
TASK DETAIL
https://phabricator.wikimedia.org/T222985