https://bugzilla.wikimedia.org/show_bug.cgi?id=68793

            Bug ID: 68793
           Summary: Wikidata JSON dump: better compression than gzip
           Product: Datasets
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: General/Unknown
          Assignee: ar...@wikimedia.org
          Reporter: federicol...@tiscali.it
                CC: gsv...@gmail.com, wikidata-b...@lists.wikimedia.org
        Depends on: 54369
       Web browser: ---
   Mobile Platform: ---

I converted 20140721.json.gz to 20140721.json.xz and 20140721.json.bz2: the gz
file is 2.9 GB, while the other two are 2.0 GB each. The saved space seems
worth the effort.
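The conversion above can be reproduced with a minimal sketch like the one
below. It streams the gzip dump through Python's standard gzip, lzma and bz2
modules, so the multi-GB file never has to fit in memory. The function name
recompress and the use of each module's default compression level are
assumptions for illustration only; the report does not say which tools or
settings were used, so sizes may differ from the numbers quoted.

```python
import bz2
import gzip
import lzma
import os
import shutil


def recompress(src_gz: str) -> dict:
    """Stream-decompress a .gz file and write .xz and .bz2 copies beside it.

    Hypothetical helper: returns a mapping of path -> size in bytes,
    including the original .gz file.
    """
    base = src_gz[:-3] if src_gz.endswith(".gz") else src_gz
    targets = {
        base + ".xz": lambda p: lzma.open(p, "wb"),  # lzma default preset (6)
        base + ".bz2": lambda p: bz2.open(p, "wb"),  # bz2 default level (9)
    }
    sizes = {src_gz: os.path.getsize(src_gz)}
    for path, opener in targets.items():
        # Re-read the gzip source for each target; copy in 1 MiB chunks
        # so memory use stays bounded regardless of the dump size.
        with gzip.open(src_gz, "rb") as fin, opener(path) as fout:
            shutil.copyfileobj(fin, fout, length=1 << 20)
        sizes[path] = os.path.getsize(path)
    return sizes
```

For example, recompress("20140721.json.gz") would write 20140721.json.xz and
20140721.json.bz2 and return the three file sizes for comparison.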

For decompression, which is what matters most, xz took 4 min vs. 2 min for
gz. All three formats are supported natively by tar -af etc., and recent
versions of xz can run in parallel. I'm quoting these numbers from memory
because I killed the screen session by mistake, but LZMA/xz seems to be the
best choice.
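Since the timings above were quoted from memory, a small benchmark like the
following could re-measure them. This is a sketch under stated assumptions:
the names OPENERS and time_decompression are hypothetical, and Python's lzma
module decompresses on a single thread, so it will not reflect any speedup
from a parallel xz build, and in-process timings can differ from the
command-line gzip, bzip2 and xz tools.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical mapping from file suffix to the stdlib opener for that format.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def time_decompression(path: str, chunk: int = 1 << 20) -> float:
    """Return wall-clock seconds to fully decompress `path`, discarding output."""
    for ext, opener in OPENERS.items():
        if path.endswith(ext):
            break
    else:
        raise ValueError(f"unsupported extension: {path}")
    start = time.perf_counter()
    with opener(path, "rb") as f:
        # Read and drop chunks until EOF; only decompression cost is measured.
        while f.read(chunk):
            pass
    return time.perf_counter() - start
```

Running it on each of the 20140721.json.* files in turn gives directly
comparable numbers, since all three formats are read through the same loop.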

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l