https://bugzilla.wikimedia.org/show_bug.cgi?id=68793
Bug ID: 68793
Summary: Wikidata JSON dump: better compression than gzip
Product: Datasets
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General/Unknown
Assignee: ar...@wikimedia.org
Reporter: federicol...@tiscali.it
CC: gsv...@gmail.com, wikidata-b...@lists.wikimedia.org
Depends on: 54369
Web browser: ---
Mobile Platform: ---

I converted 20140721.json.gz to 20140721.json.xz and 20140721.json.bz2; the gz file is 2.9 GB, while the other two were 2.0 GB each. The space saved seems worth the effort.

For decompression, which is what matters, xz took 4 min vs. 2 min for gz. All the formats are supported natively by tar -af etc.; in recent versions, xz can also run in parallel.

I'm quoting from memory, because I killed the screen session by mistake, but it seems LZMA/xz may be the best choice.
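The size and decompression-time comparison described in this report can be approximated on a small scale with Python's standard library. The following is only a minimal sketch: the function name compare_codecs and the synthetic JSON sample are assumptions made for illustration, not part of the report, and results on a few megabytes of repeated data will not match the figures for the real dump.

```python
import bz2
import gzip
import lzma
import time

# Each entry maps a file extension to (compress, decompress) from the stdlib.
# lzma.compress() writes the .xz container format by default.
CODECS = {
    "gz": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def compare_codecs(data: bytes) -> dict:
    """Return compressed size and decompression time for each codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)

        # Time only decompression, since that is the cost readers of a dump pay.
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start

        if restored != data:
            raise ValueError(f"{name} round trip did not reproduce the input")

        results[name] = {
            "compressed_bytes": len(packed),
            "decompress_seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a slice of the JSON dump (one entity per line).
    sample = b'{"id": "Q42", "type": "item", "labels": {"en": "Douglas Adams"}}\n' * 20000
    for name, stats in compare_codecs(sample).items():
        print(f"{name}: {stats['compressed_bytes']} bytes, "
              f"{stats['decompress_seconds']:.4f} s to decompress")
```

To get numbers that are comparable to the ones quoted above, the sample would have to be replaced with a representative slice of the actual dump, read in chunks rather than held in memory.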