Re: [PATCH] maint: ship .xz, not .lzma

2010-09-14 Thread Charles Wilson
On 9/14/2010 2:04 AM, Gary V. Vaughan wrote:
> I'm curious to know what the history of lzma and xz is that makes this
> desirable though.

Here's some documentation I put together for the cygwin xz package:

xz

This package provides a data compression library and utilities
supporting the .xz and .lzma file formats, which use the LZMA
compression algorithm.  LZMA provides high compression ratios and very
fast decompression, with minimal memory requirements for decompression.
XZ Utils is the latest generation of this software, supplanting the
older LZMA Utils.

The cygwin xz package replaces and obsoletes the cygwin lzma package.

LZMA Utils (and its own antecedent, the LZMA SDK) provided the 'lzma'
tool, which supported the 'LZMA-Alone' file format usually indicated by
the extension '.lzma'.  Internally, this file format used what is now
called the LZMA1 compression algorithm.

XZ Utils provides the xz tool, which supports the new .xz file format
usually indicated by the extension '.xz'. Internally, it uses a
variation of the original LZMA compression algorithm, called LZMA2.
However, the new xz tool also seamlessly supports the older .lzma files
and LZMA1 compression.
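Here is a rough idea of how one tool can accept both formats, as a minimal Python sketch. It assumes the standard-library `lzma` module, which wraps liblzma from XZ Utils. Every .xz stream begins with the 6-byte magic `FD 37 7A 58 5A 00`. A .lzma file has no signature, so anything else is simply handed to the .lzma decoder, which rejects it if the header is invalid. `decompress_any` is a hypothetical helper and is not xz's actual detection logic.

```python
import lzma

# The 6-byte magic that begins every .xz stream: 0xFD, "7zXZ", 0x00.
XZ_MAGIC = b"\xfd7zXZ\x00"


def decompress_any(blob: bytes) -> bytes:
    """Hypothetical helper: decompress either .xz or legacy .lzma data."""
    if blob.startswith(XZ_MAGIC):
        # New format: self-identifying, so the choice is unambiguous.
        return lzma.decompress(blob, format=lzma.FORMAT_XZ)
    # LZMA-Alone has no signature; assume .lzma and let the decoder
    # raise lzma.LZMAError if the 13-byte header turns out to be bogus.
    return lzma.decompress(blob, format=lzma.FORMAT_ALONE)


data = b"hello, world\n" * 100
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)
alone_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)
```

In practice, Python's default `lzma.FORMAT_AUTO` already performs this kind of detection for both formats.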

History:


1. LZMA SDK
First there was the LZMA SDK. Upstream, it shipped no libraries, only a
few executables such as 'lzma'. The source code was provided for public
use (under a variety of licenses), but it was expected that developers
would incorporate the source code directly into their own projects.
This is not The Unix Way.

The LZMA SDK was tightly coupled with the 7zip compression program, and
both were developed on and solely for the Windows platform.  7zip -- but
not the LZMA SDK -- was ported to Unix under the auspices of the p7zip
(Portable 7zip) project. (As an aside, p7zip was then ported to
cygwin...to come full circle). However, it should be clear that the file
format used by 7zip (and p7zip) was completely different from the one
supported by the LZMA SDK's 'lzma' tool.  The latter used what was
called the 'LZMA-Alone' format, which consisted of 13 bytes of header
information followed by a raw lzma-compressed byte-stream.  7zip, on the
other hand, used a much more complicated file format capable of hosting
multiple files, spanned archives, and other features. The only
similarity is that the core data compression algorithm used by both is
LZMA.
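The 13-byte LZMA-Alone header has three fields:

- one properties byte, encoding the lc/lp/pb parameters as (pb*5 + lp)*9 + lc
- a 4-byte little-endian dictionary size
- an 8-byte little-endian uncompressed size, where all 0xFF bytes means "unknown"

Below is a minimal Python sketch that decodes the header. The standard `lzma` module is used only to produce a sample file. The names `parse_alone_header` and `AloneHeader` are hypothetical.

```python
import lzma
import struct
from typing import NamedTuple, Optional

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # size not stored; stream ends with an end marker


class AloneHeader(NamedTuple):
    lc: int  # literal context bits
    lp: int  # literal position bits
    pb: int  # position bits
    dict_size: int
    uncompressed_size: Optional[int]  # None when the header says "unknown"


def parse_alone_header(blob: bytes) -> AloneHeader:
    if len(blob) < 13:
        raise ValueError("too short for an LZMA-Alone header")
    # '<BIQ': props byte, uint32 dict size, uint64 size; little-endian, no padding.
    props, dict_size, size = struct.unpack_from("<BIQ", blob, 0)
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return AloneHeader(lc, lp, pb, dict_size,
                       None if size == UNKNOWN_SIZE else size)


sample = lzma.compress(b"example data", format=lzma.FORMAT_ALONE)
```

Notice that nothing in these 13 bytes identifies the file as LZMA data. Any byte below 225 is a "valid" properties byte, which is why such files are hard to recognize automatically.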

2. LZMA Utils
Eventually, a unix port of the LZMA SDK appeared, in the form of the
LZMA Utils distribution, which reorganized the original source code, and
provided the decompression code in library form (liblzmadec). It also
provided a version of the 'lzma' program, but with a completely
different command-line interface. The LZMA Utils version consciously
mimicked the command-line options of the familiar gzip and bzip2 tools,
while the original LZMA SDK version was...different. Very different.
This is because the LZMA SDK's tool was originally intended just as a
test and development utility, to help refine the algorithm. So, it has
a number of 'compression guru' options that no sane user cares to use,
and very few of the 'normal user' options that they would.

   LZMA Utils: (Lasse Collin)
     lzma -d foo.tar.lzma
         uncompress to (implied) foo.tar, and remove
         original compressed file.
     lzma foo.tar
         compress to (implied) foo.tar.lzma, and remove
         original uncompressed file.
     Supports familiar tuning options like -0 .. -9
     Sends output data to stdout using -c
     Could be invoked under alternate names (symlinks)
     for different behavior:
         unlzma == lzma -d  (uncompress)
         lzcat  == lzma -dc (uncompress to stdout)

   LZMA SDK: (Igor Pavlov)
     lzma d foo.tar.lzma foo.tar
     lzma e foo.tar foo.tar.lzma
         mode d/e is the required first non-option argument
         both input and output files must be specified
     stdout? what's that?
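The "alternate names" trick in the LZMA Utils column is ordinary argv[0] dispatch. A single binary checks the name it was invoked under and behaves as if the corresponding flags had been given. Here is a minimal Python sketch of the idea. The table mirrors the two aliases listed above, and `effective_args` is a hypothetical helper, not LZMA Utils' actual code.

```python
import os
import sys
from typing import List

# Invocation name -> implied options, per the aliases listed above.
ALIASES = {
    "unlzma": ["-d"],        # uncompress
    "lzcat": ["-d", "-c"],   # uncompress to stdout
}


def effective_args(argv: List[str]) -> List[str]:
    """Return the option list the tool should act on, given sys.argv."""
    prog = os.path.basename(argv[0])
    # Implied options go first so explicit ones on the command line follow them.
    return ALIASES.get(prog, []) + argv[1:]


if __name__ == "__main__":
    print(effective_args(sys.argv))
```

Installing `unlzma` and `lzcat` as symlinks to the same executable is then enough to get the alternate behaviours.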

Finally, LZMA Utils also shipped a number of helpful scripts similar to
the familiar ones from gzip and bzip2:
  lzdiff/lzcmp, lzgrep/lzegrep/lzfgrep, lzless/lzmore

So, the LZMA SDK version was hardly suitable for replacing or augmenting
the existing bzip2 and gzip compression programs on unix systems,
especially as the most common use was in conjunction with tar, and tar
expects compression programs to follow a common command-line argument
format and to be able to process data on standard streams.  Most
Linux distributions have standardized on LZMA Utils.
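tar relies on the filter convention: read uncompressed data on stdin and write compressed data on stdout (or the reverse with -d). This lets tar pipe through the compressor without temporary files. Here is a minimal Python sketch of such a filter, built on the incremental `lzma.LZMACompressor` and `lzma.LZMADecompressor` objects. The names `lzma_filter` and `roundtrip` and the 64 KiB chunk size are assumptions made for illustration.

```python
import io
import lzma
from typing import BinaryIO

CHUNK = 64 * 1024  # arbitrary read size


def lzma_filter(src: BinaryIO, dst: BinaryIO, decompress: bool = False) -> None:
    """Stream src to dst through LZMA-Alone (.lzma) compression or decompression."""
    if decompress:
        dec = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
        while chunk := src.read(CHUNK):
            dst.write(dec.decompress(chunk))
    else:
        enc = lzma.LZMACompressor(format=lzma.FORMAT_ALONE)
        while chunk := src.read(CHUNK):
            dst.write(enc.compress(chunk))
        dst.write(enc.flush())  # emit buffered data and the end-of-stream marker


def roundtrip(data: bytes) -> bytes:
    """Compress then decompress in memory; a quick self-check."""
    packed, unpacked = io.BytesIO(), io.BytesIO()
    lzma_filter(io.BytesIO(data), packed)
    lzma_filter(io.BytesIO(packed.getvalue()), unpacked, decompress=True)
    return unpacked.getvalue()
```

To wire this to the command line, call `lzma_filter(sys.stdin.buffer, sys.stdout.buffer, decompress=...)`. A pipeline like `tar cf - dir | filter > dir.tar.lzma` then works without temporary files.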

The lzma tools from both the LZMA SDK and LZMA Utils support the
LZMA-Alone (.lzma) file format, as does the liblzmadec library from
LZMA Utils.

However, the .lzma file format (i.e. LZMA-Alone) is not sufficient for
modern needs, as (1) it had no 'signature bytes', so compressed files
were difficult to automatically detect and verify, (2) it had no
provision for 

Re: [PATCH] maint: ship .xz, not .lzma

2010-09-14 Thread Eric Blake

On 09/14/2010 07:58 AM, Eric Blake wrote:

> * configure.ac (AM_INIT_AUTOMAKE): Prefer better file format.
> * Makefile.maint (git-release, git-dist, prev-tarball)
> (new-tarball, diffs): Use correct extension.
> * HACKING: Update instructions.


Hmm - I mentioned it in ChangeLog, but hadn't yet saved the buffer when 
I did 'git commit'.  I squashed this in before actually pushing (thank 
heavens for 'git push --dry-run' and double checking what I was about to 
do).


diff --git c/HACKING w/HACKING
index e9184f2..d36b7f0 100644
--- c/HACKING
+++ w/HACKING
@@ -602,7 +602,7 @@ or obtained by writing to the Free Software Foundation, Inc.,
   (esp. bug-libtool) for outstanding bug reports also in the list of
   pending moderation requests.
 
-* Make sure you have wget, lzma, and autobuild installed.  aclocal should be
+* Make sure you have wget, xz, and autobuild installed.  aclocal should be
   able to find autobuild.m4; or you can install it into the tree with
  aclocal -I libltdl/m4 --install



--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [PATCH] maint: ship .xz, not .lzma

2010-09-14 Thread Bob Friesenhahn

On Tue, 14 Sep 2010, Gary V. Vaughan wrote:

> No objections.
>
> I'm curious to know what the history of lzma and xz is that makes this
> desirable though.


I am curious to know if XZ Utils has now achieved a proper stable 
release or if it will be perpetually in a prototype-like state.  Its 
code is quite large and quite obtuse.


Also, I remain curious to know why 'lzip' has never been considered as 
a suitable replacement.  Lzip accomplishes the same thing with 10 
times less code, and better fits the traditions previously established 
by gzip and bzip2.  Its only limitation is that it requires a C++ 
compiler.  The claim is made that it is not portable because it does 
not come with a megabyte-sized configure script, but it does not need 
such a huge configure script because it only uses portable ANSI 
interfaces, similar to the way gzip only requires ANSI C.  This sort 
of decision-making results in people feeling that GNU software is 
excessively complex bloatware.  Personal politics and status have 
become more important than proper technical analysis.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [PATCH] maint: ship .xz, not .lzma

2010-09-14 Thread Charles Wilson
On 9/14/2010 11:02 AM, Bob Friesenhahn wrote:
> On Tue, 14 Sep 2010, Gary V. Vaughan wrote:
>
>> No objections.
>>
>> I'm curious to know what the history of lzma and xz is that makes this
>> desirable though.
>
> I am curious to know if XZ Utils has now achieved a proper stable
> release or if it will be perpetually in a prototype like state.

Well, the 4.999.9beta is supposedly the final beta.  However, it was
released 2009-08-27 (i.e. over a year ago) -- so, in order to keep that
promise (!) the webpage now says:

 A snapshot from the git repository is available too, and is generally
 recommended over 4.999.9beta.

 xz-4.999.9beta-180-ge23e.tar.gz (1114 KiB)

How that differs from a new RC/beta I don't know, but there you go.
Anyway, if you check the git logs, you'll see that most of the recent
changes have been stabilization and documentation, so I think it is
asymptotically converging on an actual release. Of course you know the
problem with asymptotes...

> Its code is quite large and quite obtuse.

Meh.  Most of that is for the alternate filters and compression schemes
(e.g. there are filters tuned specifically for compressing x86 binary
code, PowerPC binary code, etc).  The core LZMA compression and XZ file
format handling is maybe only 1.5x-2x the size of bzip2.
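In XZ Utils, the schemes tuned for machine code are BCJ (branch/call/jump) filters, not separate compressors. A BCJ filter rewrites relative branch targets into absolute addresses so that LZMA2 finds more repetition, and it is chained in front of LZMA2. Python's `lzma` module exposes these liblzma filter chains, so a minimal sketch looks like the following. The sample input is arbitrary bytes, not real x86 code, so this shows only the mechanics, not any size benefit.

```python
import lzma

# Filter chain: the x86 BCJ filter preprocesses the data, then LZMA2 compresses it.
# In an .xz chain the last filter must be the actual compressor (LZMA2 here).
X86_CHAIN = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def compress_x86(data: bytes) -> bytes:
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=X86_CHAIN)


def decompress_xz(blob: bytes) -> bytes:
    # The filter chain is recorded in the .xz block header, so it need not be repeated.
    return lzma.decompress(blob, format=lzma.FORMAT_XZ)


sample = bytes(range(256)) * 64  # stand-in data, not real x86 code
```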

Take a look at the xz-embedded repo; it includes only the XZ and core
LZMA stuff:
git clone http://git.tukaani.org/xz-embedded.git

> Also, I remain curious to know why 'lzip' has never been considered as a
> suitable replacement.  Lzip accomplishes the same thing with 10 times
> less code, and better fits the traditions previously established by gzip
> and bzip2.

I'm not sure that last clause (...better fits...) is true. Surely, the
LZMA SDK code and utilities were...different.  But the LZMA Utils and
its successor XZ Utils were *specifically* written to follow the
gzip/bzip2 traditions.

When I added xz support to cygwin's setup.exe via liblzma, the glue code
to do so was VERY similar to the corresponding .gz and .bz2 glue code...
Ditto when similar glue was added to BSD's libarchive.
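The "glue" similarity shows up in any binding that wraps zlib, libbzip2 and liblzma side by side. Python's standard library serves as a stand-in here; this is not the actual setup.exe or libarchive code. Its three incremental compressor types share one compress()/flush() shape, so a single generic function can drive all of them. `compress_chunks` and the extension table are hypothetical.

```python
import bz2
import lzma
import zlib
from typing import Any, Callable, Dict, Iterable

# Each factory returns an object with compress(bytes) and flush() methods.
COMPRESSORS: Dict[str, Callable[[], Any]] = {
    ".gz": lambda: zlib.compressobj(wbits=31),  # wbits=31 selects the gzip container
    ".bz2": bz2.BZ2Compressor,
    ".xz": lzma.LZMACompressor,                 # defaults to the .xz format
}


def compress_chunks(ext: str, chunks: Iterable[bytes]) -> bytes:
    codec = COMPRESSORS[ext]()
    out = [codec.compress(c) for c in chunks]
    out.append(codec.flush())  # finish the stream and write any trailer
    return b"".join(out)
```

Adding a new format then comes down to adding one table entry, which matches the experience described above.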

> Its only limitation is that it requires a C++ compiler.  The
> claim is made that it is not portable because it does not come with a
> megabyte-sized configure script, but it does not need such a huge
> configure script because it only uses portable ANSI interfaces, similar
> to the way gzip only requires ANSI C.  This sort of decision-making
> results in people feeling that GNU software is excessively complex
> bloatware.  Personal politics and status has become more important than
> proper technical analysis.

Err...I don't think I want to get into a religious war. (I will say
this, tho: requiring a 1MB C++ runtime library like libstdc++.so at
*runtime* is not _my_ usual approach when trying to create non-bloated
software, and hardly makes up for the savings of not having a 1MB
configure script at *build* time.  Sure, on real unix you'll already
have that runtime lib installed, but lzma/xz was pitched on unix as
usable on embedded systems and in-kernel too...the same can't be said
for lzip)

The fact is, whether we @ libtool like it or not, .lzma compression had
been adopted by most other GNU projects as the next great compression
scheme (whether it really WAS is debatable, as all such assertions
are).  When the two primary forces behind lzma-on-unix (Igor Pavlov and
Lasse Collin) got together to formulate the xz format, the early .lzma
adopters -- e.g. many GNU projects -- followed along.

As one of those GNU projects, automake added support for dist-lzma --
and later dist-xz, not dist-lzip.

That's where we are.

If you want to start an xz-vs-lzip fight, propose the appropriate
support for dist-lzip on automake-patches and fight it there. :-)

--
Chuck