Mitar added a comment.
OK, so it seems the problem is in pbzip2. It cannot decompress in
parallel unless the file was also compressed with pbzip2. lbzip2, on the
other hand, can decompress both kinds of files in parallel.
See:
$ time bunzip2 -c -k latest-lexemes.json.bz2 > /dev/null
real 1m0.101s
user 0m59.912s
sys 0m0.180s
$ time pbzip2 -d -k -c latest-lexemes.json.bz2 > /dev/null
real 0m57.662s
user 0m57.792s
sys 0m0.180s
$ time lbunzip2 -c -k latest-lexemes.json.bz2 > /dev/null
real 0m13.346s
user 1m35.951s
sys 0m2.342s
$ lbunzip2 -c -k latest-lexemes.json.bz2 > serial.json
$ pbzip2 -z < serial.json > parallel.json.bz2
$ time lbunzip2 -c -k parallel.json.bz2 > /dev/null
real 0m16.270s
user 1m43.004s
sys 0m2.262s
$ time pbzip2 -d -c -k parallel.json.bz2 > /dev/null
real 0m17.324s
user 1m52.946s
sys 0m0.659s
The file sizes are very similar:
$ ll parallel.json.bz2 latest-lexemes.json.bz2
-rw-rw-r-- 1 mitar mitar 168657719 Jun 15 20:36 latest-lexemes.json.bz2
-rw-rw-r-- 1 mitar mitar 168840138 Jun 20 07:35 parallel.json.bz2
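A possible explanation for the timings above, as far as I understand the two tools: pbzip2 compresses each chunk of input as its own bzip2 stream and concatenates them, and it parallelizes decompression across those streams, while lbzip2 splits work at the block level and so does not care how the file was produced. Under that assumption, a file written by plain bzip2 should contain a single stream and a file written by pbzip2 should contain many. Here is a minimal Python sketch to check this by counting streams; the function name count_bz2_streams is made up for illustration, and the demo uses small in-memory data rather than the real dumps.

```python
import bz2
import io


def count_bz2_streams(fileobj, chunk_size=1 << 20):
    """Count the concatenated bzip2 streams readable from a binary file object.

    This decompresses everything serially and throws the output away, so on
    a large dump it takes about as long as a plain bunzip2 run.
    """
    streams = 0
    decomp = bz2.BZ2Decompressor()
    data = fileobj.read(chunk_size)
    while data:
        decomp.decompress(data)  # decompressed output is discarded
        if decomp.eof:
            # One complete stream ended; any leftover bytes belong to the next.
            streams += 1
            data = decomp.unused_data
            decomp = bz2.BZ2Decompressor()
            if not data:
                data = fileobj.read(chunk_size)
        else:
            data = fileobj.read(chunk_size)
    return streams


# In-memory demo: one stream versus two independently compressed pieces
# glued together (the layout assumed above for pbzip2 output).
single = bz2.compress(b"x" * 100_000)
multi = bz2.compress(b"x" * 50_000) + bz2.compress(b"x" * 50_000)
print(count_bz2_streams(io.BytesIO(single)))  # prints 1
print(count_bz2_streams(io.BytesIO(multi)))   # prints 2
```

To try it on the actual files, open them in binary mode, for example count_bz2_streams(open("latest-lexemes.json.bz2", "rb")). If the explanation holds, the original dump should report 1 and parallel.json.bz2 should report a much larger number.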
TASK DETAIL
https://phabricator.wikimedia.org/T222985