https://bugzilla.wikimedia.org/show_bug.cgi?id=68793
Bug ID: 68793
Summary: Wikidata JSON dump: better compression than gzip
Product: Datasets
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General/Unknown
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected], [email protected]
Depends on: 54369
Web browser: ---
Mobile Platform: ---
I converted 20140721.json.gz to 20140721.json.xz and 20140721.json.bz2: the gz
file is 2.9 GB, while the other two are 2.0 GB each (about 30% smaller). The
saved space seems worth the effort.
For decompression, which is what matters, xz took 4 min vs. 2 min for gz. All
three formats are supported natively by tar -af etc.; recent versions of xz
also support parallel operation. I'm quoting from memory, because I killed the
screen session by mistake, but it seems LZMA/xz may be the best choice.
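
Since the numbers above are quoted from memory, a comparison like this could be
re-run with a small script. The following is only a rough sketch using the
Python standard library (gzip, bz2, lzma), not the commands that were actually
used for the test; the function name compare_codecs and the sample data are
made up for illustration, and in-memory timings on a small sample will not
match timings on the full dump.

```python
import bz2
import gzip
import lzma
import time


def compare_codecs(data: bytes) -> dict:
    """Return compressed size and decompression time per codec for `data`."""
    # Each entry maps a file extension to its (compress, decompress) pair.
    # lzma.compress() writes the .xz container format by default.
    codecs = {
        "gz": (gzip.compress, gzip.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        # Time only decompression, since that is what dump consumers pay for.
        start = time.perf_counter()
        unpacked = decompress(packed)
        elapsed = time.perf_counter() - start
        if unpacked != data:
            raise ValueError(f"{name} round trip failed")
        results[name] = {"bytes": len(packed), "decompress_seconds": elapsed}
    return results


if __name__ == "__main__":
    # Hypothetical sample: repeated JSON-like lines, not real Wikidata entities.
    sample = b'{"id": "Q1", "type": "item", "labels": {"en": "example"}}\n' * 50000
    for name, stats in compare_codecs(sample).items():
        print(name, stats["bytes"], f'{stats["decompress_seconds"]:.4f}s')
```

For a realistic measurement, the sample would have to be replaced with a
representative slice of the real dump, because highly repetitive test data
compresses far better than actual entities do.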