2012/5/23 Michael Pyne <[email protected]>:
> As an example, try:
>
> $ tar cf kdefoo-x.y.z.tar kdefoo-x.y.z/
> $ pixz kdefoo-x.y.z.tar
> # resulting in kdefoo-x.y.z.tar.xz
>
> Because pixz is parallelized it works on whole blocks of data at a time and as
> far as I can tell makes no special provision for the last bits of compressed
> data being smaller than the block size.
>
> With a normal tar file the decompressed data you get is:
>
> 0--------------------------------* (where * is end of data and end of file)
>
> With a pixz-encoded tar file the decompressed data you get is:
>
> 0--------------------------------*x$ (* is end of data, $ is end of file)
>
> When you run a command like "tar xfJ kdefoo-x.y.z.tar.xz" everything will
> still work fine: tar knows exactly where the data should really end and will
> stop decompressing when it needs to.
>
> When you run a pipeline like "xz --decompress kdefoo-x.y.z.tar.xz | tar xf -"
> though, there's no way to tell xz to stop decompressing early. It tries to
> write all the decompressed data to the pipe. tar still knows exactly where to
> stop, and does so at the '*', not the '$', and closes its input (a pipe!)
> early.
>
> When xz tries to write the 'x$' (garble data) of the decompressed output it
> gets sent to a now-broken pipe, which kills xz on SIGPIPE.
>
> Scripts trying to drive automated extraction of that data using a pipeline
> just see that an error occurred, and will therefore abort. This has affected a
> couple of distributions that are source-based, but is annoying even for those
> manually extracting to have to figure out that their tarball actually
> extracted correctly.
>
> So the problem is only parallelizing compressors that take advantage of the
> allowance to write garbled data past the end of a file and still have the
> decompressor "figure it out". It seems pretty implausible to me that a
> parallelizing compressor would always do this, perhaps this only occurs when
> the compressor is run with tar (e.g. tar cJf) instead of as a separate step?
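For anyone who wants to see the broken-pipe behaviour described in the quoted message without a pixz tarball at hand, here is a minimal Python sketch. It is not from the thread and involves neither xz nor tar: the WRITER script is a hypothetical stand-in for "xz --decompress" that keeps writing, and run_with_early_reader is a hypothetical helper that plays the part of tar by reading only what it wants and then closing the pipe. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for "xz --decompress": it restores the default SIGPIPE
# action (the Python interpreter ignores SIGPIPE at startup) and then tries to
# write far more data than the reader will ever consume.
WRITER = (
    "import signal, sys\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "chunk = b'x' * 65536\n"
    "for _ in range(256):\n"
    "    sys.stdout.buffer.write(chunk)\n"
)


def run_with_early_reader(bytes_wanted=1024):
    """Read only bytes_wanted bytes from the writer, then close the pipe early.

    Returns (number of bytes read, writer's return code).
    """
    proc = subprocess.Popen([sys.executable, "-c", WRITER],
                            stdout=subprocess.PIPE)
    data = proc.stdout.read(bytes_wanted)  # the reader takes what it needs...
    proc.stdout.close()                    # ...and closes its input early
    return len(data), proc.wait()


if __name__ == "__main__":
    got, status = run_with_early_reader()
    print("bytes read:", got)
    print("writer killed by SIGPIPE:", status == -signal.SIGPIPE)
```

A negative return code from subprocess means the child was terminated by that signal, which is the same condition an extraction script sees as a failed pipeline even though the reader got everything it needed.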

The "garbled data" has nothing to do with parallelization.

pixz stands for "parallel and indexed xz". Apart from being parallel, it stores
a custom-formatted index at the end of the tarball, apparently to allow random
access.

I also noticed that pixz produces larger results than standard xz, even when
ignoring the extra index data. See:
http://article.gmane.org/gmane.comp.kde.releases/5555

Please do not use pixz for KDE tarballs again...

-- 
Nicolás
_______________________________________________
release-team mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/release-team
