Re: [PATCH] maint: ship .xz, not .lzma
On 9/14/2010 2:04 AM, Gary V. Vaughan wrote:
> I'm curious to know what the history of lzma and xz is that makes this desirable though.

Here's some documentation I put together for the cygwin xz package:

xz

This package provides a data compression library and utilities supporting the .xz and .lzma file formats, which use the LZMA compression algorithm. LZMA provides high compression ratios and very fast decompression, with minimal memory requirements for decompression.

XZ Utils is the latest generation of this software, supplanting the older LZMA Utils. The cygwin xz package replaces and obsoletes the cygwin lzma package.

LZMA Utils (and its own antecedent, the LZMA SDK) provided the 'lzma' tool, which supported the 'LZMA-Alone' file format usually indicated by the extension '.lzma'. Internally, this file format used what is now called the LZMA1 compression algorithm.

XZ Utils provides the xz tool, which supports the new .xz file format usually indicated by the extension '.xz'. Internally, it uses a variation of the original LZMA compression algorithm, called LZMA2. However, the new xz tool also seamlessly supports the older .lzma files and LZMA1 compression.

History:

1. LZMA SDK

First there was the LZMA SDK. Upstream, it shipped no libraries; only a few executables such as 'lzma'. The source code was provided for public use (under a variety of licenses), but it was expected that developers would incorporate the source code directly into their own projects. This is not The Unix Way.

The LZMA SDK was tightly coupled with the 7zip compression program, and both were developed on and solely for the Windows platform. 7zip -- but not the LZMA SDK -- was ported to Unix under the auspices of the p7zip (Portable 7zip) project. (As an aside, p7zip was then ported to cygwin...to come full circle).

However, it should be clear that the file format used by 7zip (and p7zip) was completely different from the one supported by the LZMA SDK's 'lzma' tool.
The latter used what was called the 'LZMA-Alone' format, which consisted of 13 bytes of header information followed by a raw lzma-compressed byte-stream. 7zip, on the other hand, used a much more complicated file format capable of hosting multiple files, spanned archives, and other features. The only similarity is that the core data compression algorithm used by both is LZMA.

2. LZMA Utils

Eventually, a unix port of the LZMA SDK appeared, in the form of the LZMA Utils distribution, which reorganized the original source code, and provided the decompression code in library form (liblzmadec). It also provided a version of the 'lzma' program, but with a completely different command-line interface.

The LZMA Utils version consciously mimicked the command-line options of the familiar gzip and bzip2 tools, while the original LZMA SDK version was...different. Very different. This is because the LZMA SDK's tool was originally intended just as a test and development utility, to help refine the algorithm. So, it has a number of 'compression guru' options that no sane user cares to use, and very few of the 'normal user' options that they would.

LZMA Utils: (Lasse Collin)
    lzma -d foo.tar.lzma
        uncompress to (implied) foo.tar, and remove original compressed file.
    lzma foo.tar
        compress to (implied) foo.tar.lzma, and remove original uncompressed file.
    Supports familiar tuning options like -0 .. -9
    Sends output data to stdout using -c
    Could be invoked under alternate names (symlinks) for different behavior:
        unlzma == lzma -d   (uncompress)
        lzcat  == lzma -dc  (uncompress to stdout)

LZMA SDK: (Igor Pavlov)
    lzma d foo.tar.lzma foo.tar
    lzma e foo.tar foo.tar.lzma
    mode d/e is the required first non-option argument
    both input and output files must be specified
    stdout? what's that?
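As a rough illustration of the 13-byte LZMA-Alone header mentioned above, here is a minimal Python sketch. The field layout used here (one properties byte encoding lc/lp/pb, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size where all-ones means "unknown") is the layout as commonly documented for .lzma files, not something spelled out in this message, and the function name parse_lzma_alone_header is made up for the example.

```python
import lzma
import struct


def parse_lzma_alone_header(header: bytes) -> dict:
    """Decode the 13-byte header of an LZMA-Alone (.lzma) stream.

    Hypothetical helper for illustration; it only inspects the header
    and does not validate the compressed payload that follows.
    """
    if len(header) < 13:
        raise ValueError("an LZMA-Alone header is 13 bytes long")

    # "<BIQ": little-endian, no padding -> 1 + 4 + 8 = 13 bytes.
    props, dict_size, size = struct.unpack("<BIQ", header[:13])

    # The properties byte packs three values as (pb * 5 + lp) * 9 + lc.
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte: %d" % props)
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45

    # An all-ones size field means the size was not recorded.
    unknown = 0xFFFFFFFFFFFFFFFF
    return {
        "lc": lc,
        "lp": lp,
        "pb": pb,
        "dict_size": dict_size,
        "uncompressed_size": None if size == unknown else size,
    }


if __name__ == "__main__":
    # Python's standard lzma module can write the legacy container.
    raw = lzma.compress(b"hello, world\n" * 100, format=lzma.FORMAT_ALONE)
    print(parse_lzma_alone_header(raw[:13]))
    # FORMAT_AUTO (the default) accepts both .xz and .lzma input.
    print(len(lzma.decompress(raw)))
```

Note that nothing in these 13 bytes is a fixed magic number, so a tool can only guess that a file is LZMA-Alone by checking that the fields look plausible.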
Finally, LZMA Utils also shipped a number of helpful scripts similar to the familiar ones from gzip and bzip2: lzdiff/lzcmp, lzgrep/lzegrep/lzfgrep, lzless/lzmore

So, the LZMA SDK version was hardly suitable for replacing or augmenting the existing bzip2 and gzip compression programs on unix systems, especially as the most common use was in conjunction with tar. But tar expects compression programs to satisfy a common command-line argument format, and to be able to manipulate data on standard streams. Most linux distributions have standardized on LZMA Utils.

The lzma tools from both LZMA SDK and LZMA Utils support the LZMA-Alone (.lzma) file format, as does the liblzmadec library from LZMA Utils. However, the .lzma file format (i.e. LZMA-Alone) is not sufficient for modern needs, as (1) it had no 'signature bytes' so compressed files were difficult to automatically detect and verify, (2) it had no provision for
Re: [PATCH] maint: ship .xz, not .lzma
On 09/14/2010 07:58 AM, Eric Blake wrote:
> * configure.ac (AM_INIT_AUTOMAKE): Prefer better file format.
> * Makefile.maint (git-release, git-dist, prev-tarball)
> (new-tarball, diffs): Use correct extension.
> * HACKING: Update instructions.

Hmm - I mentioned it in ChangeLog, but hadn't yet saved the buffer when I did 'git commit'. I squashed this in before actually pushing (thank heavens for 'git push --dry-run' and double checking what I was about to do).

diff --git c/HACKING w/HACKING
index e9184f2..d36b7f0 100644
--- c/HACKING
+++ w/HACKING
@@ -602,7 +602,7 @@ or obtained by writing to the Free Software Foundation, Inc.,
 (esp. bug-libtool) for outstanding bug reports also in the list of
 pending moderation requests.
 
-* Make sure you have wget, lzma, and autobuild installed. aclocal should be
+* Make sure you have wget, xz, and autobuild installed. aclocal should be
 able to find autobuild.m4; or you can install it into the tree with
 aclocal -I libltdl/m4 --install

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Re: [PATCH] maint: ship .xz, not .lzma
On Tue, 14 Sep 2010, Gary V. Vaughan wrote:
> No objections. I'm curious to know what the history of lzma and xz is that makes this desirable though.

I am curious to know if XZ Utils has now achieved a proper stable release or if it will be perpetually in a prototype-like state. Its code is quite large and quite obtuse.

Also, I remain curious to know why 'lzip' has never been considered as a suitable replacement. Lzip accomplishes the same thing with 10 times less code, and better fits the traditions previously established by gzip and bzip2. Its only limitation is that it requires a C++ compiler. The claim is made that it is not portable because it does not come with a megabyte-sized configure script, but it does not need such a huge configure script because it only uses portable ANSI interfaces, similar to the way gzip only requires ANSI C.

This sort of decision-making results in people feeling that GNU software is excessively complex bloatware. Personal politics and status have become more important than proper technical analysis.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [PATCH] maint: ship .xz, not .lzma
On 9/14/2010 11:02 AM, Bob Friesenhahn wrote:
> On Tue, 14 Sep 2010, Gary V. Vaughan wrote:
>> No objections. I'm curious to know what the history of lzma and xz is that makes this desirable though.
>
> I am curious to know if XZ Utils has now achieved a proper stable release or if it will be perpetually in a prototype-like state.

Well, the 4.999.9beta is supposedly the final beta. However, it was released 2009-08-27 (i.e. a year ago) -- so, in order to keep that promise (!) the webpage now says:

    A snapshot from the git repository is available too, and is generally recommended over 4.999.9beta.
    xz-4.999.9beta-180-ge23e.tar.gz (1114 KiB)

How that differs from a new RC/beta I don't know, but there you go. Anyway, if you check the git logs, you'll see that most of the recent changes have been stabilization and documentation, so I think it is asymptotically converging on an actual release. Of course you know the problem with asymptotes...

> Its code is quite large and quite obtuse.

Meh. Most of that is for the alternate compression schemes (e.g. there are schemes tuned specifically for compressing mips binary code, and x86 binary code, etc). The core LZMA compression and XZ file format handling is maybe only 1.5x-2x bzip2. Take a look at the xz-embedded repo; it includes only the XZ and core LZMA stuff:

    git clone http://git.tukaani.org/xz-embedded.git

> Also, I remain curious to know why 'lzip' has never been considered as a suitable replacement. Lzip accomplishes the same thing with 10 times less code, and better fits the traditions previously established by gzip and bzip2.

I'm not sure that last clause (...better fits...) is true. Surely, the LZMA SDK code and utilities were...different. But the LZMA Utils and its successor XZ Utils were *specifically* written to follow the gzip/bzip2 traditions. When I added xz support to cygwin's setup.exe via liblzma, the glue code to do so was VERY similar to the corresponding .gz and .bz2 glue code...
Ditto when similar glue was added to BSD's libarchive.

> Its only limitation is that it requires a C++ compiler. The claim is made that it is not portable because it does not come with a megabyte-sized configure script, but it does not need such a huge configure script because it only uses portable ANSI interfaces, similar to the way gzip only requires ANSI C.
>
> This sort of decision-making results in people feeling that GNU software is excessively complex bloatware. Personal politics and status have become more important than proper technical analysis.

Err...I don't think I want to get into a religious war.

(I will say this, tho: requiring a 1MB C++ runtime library like libstdc++.so at *runtime* is not _my_ usual approach when trying to create non-bloated software, and hardly makes up for the savings of not having a 1MB configure script at *build* time. Sure, on real unix you'll already have that runtime lib installed, but lzma/xz was pitched on unix as usable on embedded systems and in-kernel too...the same can't be said for lzip)

The fact is, whether we @ libtool like it or not, .lzma compression had been adopted by most other GNU projects as the next great compression scheme (whether it really WAS or not is debatable, as all such assertions are). When the two primary forces behind lzma-on-unix (Igor Pavlov and Lasse Collin) got together to formulate the xz extension, the early .lzma adopters -- e.g. many GNU projects -- followed along.

As one of those GNU projects, automake added support for dist-lzma -- and later dist-xz, not dist-lzip. That's where we are. If you want to start an xz-vs-lzip fight, propose the appropriate support for dist-lzip on automake-patches and fight it there. :-)

--
Chuck