On 4/20/09, Peter Tribble <peter.tribble at gmail.com> wrote:
> On Mon, Apr 20, 2009 at 6:39 PM, John Levon <john.levon at sun.com> wrote:
> > On Mon, Apr 20, 2009 at 09:25:51AM -0700, Rich Burridge wrote:
> >
> >> bzip2 is a free and open source block sorting lossless data
> >> compression algorithm with comparatively high compression
> >> efficiency.
> >>
> >> pbzip2 is a parallel implementation of the bzip2 algorithm using
> >> pthreads written in C++ by Jeff Gilchrist that retains file
> >> compatibility with the common bzip2(1) application included in
> >> Solaris and many other operating systems
> >
> > Is there a reason we're not delivering this version as the real bzip2,
> > then just providing a symlink? What is the advantage of the non-parallel
> > implementation exactly to mean it needs a new name?
> >
> I use pbzip2 and bzip2 a lot.
>
> They should be kept separate.
>
> For one thing, I would expect to get regular bzip2 if I called it by that
> name, likewise pbzip2.
>
> There are key behavioural differences that I've seen:
>
> - pbzip2 by default will use all the available cpus. You really don't want
> to make it that easy to saturate a machine - it can be very unpleasant on
> the other users of the machine.
So what? OpenOffice and Firefox spread their threads over all
available cpus, too.

> - pbzip2 requires memory equal to the size of the file to decompress a
> file compressed by bzip2, which may be extremely large and may not
> work at all

This can be circumvented by using mmap(). Mandrake had such a patch,
which removed the memory limitation.

IMO there is no reason why pbzip2 can't replace bzip2.

Irek
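
P.S. To make the mmap() point a bit more concrete, here is a rough
sketch in Python. It is not taken from pbzip2 or from the Mandrake
patch; it only illustrates the general idea of mapping the compressed
file read-only and feeding it to a streaming decompressor in chunks,
so the process never has to allocate a buffer as large as the input.
The function name decompress_mapped and the 64 KiB chunk size are
made up for the example.

```python
import bz2
import mmap

CHUNK = 64 * 1024  # hypothetical chunk size, chosen only for illustration


def decompress_mapped(src_path, dst_path, chunk=CHUNK):
    """Decompress a .bz2 file through a read-only memory map.

    Returns the number of decompressed bytes written. Files made of
    several concatenated bzip2 streams are handled as well.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Length 0 maps the whole file; the OS pages it in on demand.
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            decomp = bz2.BZ2Decompressor()
            for offset in range(0, len(mapped), chunk):
                data = mapped[offset:offset + chunk]
                while data:
                    out = decomp.decompress(data)
                    dst.write(out)
                    written += len(out)
                    if decomp.eof:
                        # End of one stream: carry leftover bytes into a
                        # fresh decompressor for the next stream.
                        data = decomp.unused_data
                        decomp = bz2.BZ2Decompressor()
                    else:
                        data = b""
    return written
```

Calling decompress_mapped("big.tar.bz2", "big.tar") would write the
decompressed data while keeping only one chunk of input and its output
in process memory at a time. An empty input file is not handled here,
since a zero-length file cannot be mapped on every platform.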