On <01 Nov 94 20:32> [EMAIL PROTECTED] wrote:

 >> Which is why you tar them up first (and
 >> there's no reason why it shouldn't
 >> be done in one go.  But there is a reason why it should be done first
 >> rather than second, which is that doing
 >> it first gives better compression).

A gzip'd tar archive requires complete decompression BEFORE you can even 
examine its directory to see if what you want is inside it! Real archivers 
allow random access to individual members, i.e. updating, deletion and 
extraction require minimal work!
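To put a number on that, here's a rough sketch of mine (modern Python with its tarfile/zipfile modules, purely for illustration; the member names and sizes are invented) counting how many bytes of each archive actually have to be read just to list the member names:

```python
import io
import os
import tarfile
import zipfile

class CountingReader:
    """File-like wrapper that counts how many bytes are actually read."""
    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.bytes_read = 0
    def read(self, n=-1):
        chunk = self._f.read(n)
        self.bytes_read += len(chunk)
        return chunk
    def seek(self, pos, whence=0):
        return self._f.seek(pos, whence)
    def tell(self):
        return self._f.tell()
    def seekable(self):
        return True
    def readable(self):
        return True

# Four members of incompressible (random) data.
members = {f"file{i}.bin": os.urandom(50_000) for i in range(4)}

tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
    for name, data in members.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in members.items():
        zf.writestr(name, data)

# Listing the .tar.gz decompresses (and therefore reads) essentially
# the whole file, because the tar headers live inside the gzip stream.
tgz_reader = CountingReader(tgz_buf.getvalue())
with tarfile.open(fileobj=tgz_reader, mode="r:gz") as tf:
    tgz_names = tf.getnames()

# Listing the zip only touches the central directory at the end.
zip_reader = CountingReader(zip_buf.getvalue())
with zipfile.ZipFile(zip_reader) as zf:
    zip_names = zf.namelist()

print(f"tar.gz: {tgz_reader.bytes_read} of {len(tgz_buf.getvalue())} bytes read to list")
print(f"zip:    {zip_reader.bytes_read} of {len(zip_buf.getvalue())} bytes read to list")
```

The zip listing touches a few hundred bytes at the tail; the tar.gz listing has to chew through nearly the whole compressed stream.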

 >> Anyway, why is a single compressed file any more lose-able than any other
 >> kind of file?

I presume that's a serious question! So I'll explain...
A gzip'd tar must first be decompressed in its entirety, so a fault anywhere 
in the bitstream will corrupt the output tar, which will normally be deleted 
when the CRC error is detected by gzip before tar even gets a look at it! 
Even if the bad tar is not deleted, ALL data after the corruption will be 
invalid as a consequence of the decompression technique!

BUT using a normal archiver, each compressed member carries its own data 
under a local header, which also locates the next member's header, so if one 
member's compressed data is corrupted it doesn't affect the surrounding 
members. In most cases even member-header corruption *could* be recoverable 
enough to extract the other good members!

At best a bad tar.gz can be recovered up to the corruption; at worst it's 
lost forever without a good backup copy!

At best with lha(rc), zip, arc etc. all but the corrupt members can be 
recovered; at worst just the corrupted members will be lost ;-)
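A quick sketch of those two failure modes (my illustration, in Python; the member names and the "line noise" offset are made up for the demo):

```python
import gzip
import io
import os
import tarfile
import zipfile
import zlib

members = {"a.bin": os.urandom(2000), "b.bin": os.urandom(2000)}

# Build a .tar.gz and a .zip holding the same two members.
tgz = io.BytesIO()
with tarfile.open(fileobj=tgz, mode="w:gz") as tf:
    for name, data in members.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in members.items():
        zf.writestr(name, data)

def corrupt(blob, offset, n=10):
    """Flip n bytes starting at offset (simulated line noise)."""
    out = bytearray(blob)
    for i in range(offset, offset + n):
        out[i] ^= 0xFF
    return bytes(out)

# Damage ten bytes inside the first member's data in each archive.
bad_tgz = corrupt(tgz.getvalue(), 200)
bad_zip = corrupt(zbuf.getvalue(), 200)

# The tar.gz is one gzip stream: the damage poisons everything.
try:
    gzip.decompress(bad_tgz)
    print("tar.gz survived?!")
except (OSError, EOFError, zlib.error) as e:
    print("tar.gz unusable:", type(e).__name__)

# The zip compresses each member separately: b.bin is untouched.
with zipfile.ZipFile(io.BytesIO(bad_zip)) as zf:
    try:
        zf.read("a.bin")
        print("a.bin survived?!")
    except (zipfile.BadZipFile, zlib.error) as e:
        print("a.bin lost:", type(e).__name__)
    assert zf.read("b.bin") == members["b.bin"]
    print("b.bin extracted intact")
```

Same ten damaged bytes in each: the whole tar.gz is rejected, while the zip loses only the member that was actually hit.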

 >>
 >> > LHarc format is a generic PD archive
 >> > envelope definition that is supported
 >> > from CP/M, BBC-B, IBM-CLONE, QL, and generic unix!
 >>
 >> That still doesn't tell me what kind of
 >> compression it uses.

 Fr> LZH....

Or to expand upon that a bit: it's based on LZ repeated-string encoding, but 
Huffman-encodes the lengths & positions as well as the literal data, using 
either dynamic or static Huffman coding depending on the particular method. 
These ARE ALL documented in various places on the nets, so those with direct 
access to all those net tools should be able to locate them themselves!
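For the curious, here's a toy sketch of the LZ repeated-string stage only (my Python illustration; the 4K window is a nominal choice, and the Huffman coding of the tokens, which the real LHarc methods add on top, is deliberately omitted):

```python
# Toy LZSS: emit (distance, length) tokens for repeated strings,
# literal bytes otherwise. LHarc-family methods then Huffman-code
# these tokens; that stage is left out here.

WINDOW = 4096     # nominal window size for this demo
MIN_MATCH = 3     # shorter matches aren't worth a token

def lzss_compress(data: bytes):
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        start = max(0, i - WINDOW)
        # Naive O(n^2) match search -- fine for a demo.
        for j in range(start, i):
            l = 0
            while i + l < len(data) and data[j + l] == data[i + l] and l < 255:
                l += 1
            if l > best_len:
                best_len, best_dist = l, i - j
        if best_len >= MIN_MATCH:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lzss_decompress(tokens):
    out = bytearray()
    for t in tokens:
        if t[0] == "lit":
            out.append(t[1])
        else:
            _, dist, length = t
            # Byte-by-byte copy so overlapping matches work.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)

msg = b"tar tar tar archives are archives"
toks = lzss_compress(msg)
print(toks[:6])
print(lzss_decompress(toks))
```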

 >> > NO LHarc method beats the ZIP deflate
 >> > alogarithm on achieved ratios. BUT they
 >> > ALL require much LESS working ram
 >> > space to compress and de-compress
 >>
 >> The thing about gzip is that it uses
 >> very little RAM to decompress (apart
 >> from having to store each block of the
 >> file, which I would have thought
 >> was a necessity for most systems to be
 >> reasonably efficient). It is rather
 >> resource-consuming during compression,
 >> but then I probably don't care that
 >> much because it's the decompression that matters most.

I don't know WHO wrote the above paragraph, imc maybe... but GET REAL! gzip 
and PKUNZIP 2.04g etc. are required to keep a running 32K ring buffer OR rely 
on flawless random file access to the output stream in order to get at the 
32K sliding dictionary! On a SAM in native mode, random access to a file 
being written is a kludge, so a 32K ring buffer IS required, as well as the 
rest of the decoding tables....
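A minimal sketch of what that ring buffer has to do (my illustration in Python; a real decoder feeds it from the Huffman stage, which is omitted here):

```python
# The 32 KiB ring buffer a deflate decoder keeps when it cannot
# randomly re-read its own output stream: every decoded byte goes
# in, and <distance, length> back-references are resolved from it.

WSIZE = 32 * 1024

class RingWindow:
    def __init__(self):
        self.buf = bytearray(WSIZE)
        self.pos = 0

    def put(self, byte):
        """Append one decoded byte, wrapping at the window size."""
        self.buf[self.pos] = byte
        self.pos = (self.pos + 1) % WSIZE

    def copy(self, dist, length):
        """Resolve a <distance, length> back-reference from the window.

        Copies byte-by-byte so that overlapping references
        (length > distance) work, just as deflate requires.
        """
        out = bytearray()
        for _ in range(length):
            b = self.buf[(self.pos - dist) % WSIZE]
            self.put(b)
            out.append(b)
        return bytes(out)

w = RingWindow()
for b in b"abcabc":
    w.put(b)
print(w.copy(3, 6))   # overlapped copy, deflate-style
```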

A simple answer for those who believe that gzip is perfect for the SAM is: 
WRITE IT! Don't sit there making unfounded claims about it! *PROVE US WRONG!*

 Fr> It does? It produces exactly the same
 Fr> table for decompression as for
 Fr> compression, otherwise it wouldn't be
 Fr> able to decompress...or?! Hmm...
 Fr> can't remember much now.

Spot on Frode! It is *possible* to decompress a deflated file on a Z80 using 
a few tricks, but the compression would be more impractical than the unix 
system I've considered implementing! There's no point in making a system that 
can't create archives using only a SAM into the SAM archiver standard!

I have the C source to LHA 1.0 for unix. It's in a PMA archive, but if a 
SAM owner has ProDos and the PMArc/ext suite they can extract it with that, 
transfer it to their unix workstation, compile it, and get almost up to date 
in their archiving methods! After all, gzip is just a stop-gap until the full 
unix Info-ZIP becomes widely accepted and obsoletes both tar and gzip for 
archival backups!

 >> imc
 >>

As if deflate were the BEST lossless compression method... Ever heard of RAR?
That uses a 64K sliding window and can produce *solid* archives a la gzip'd 
tars that lock the members together.... RAR beats deflate, BUT AFAIK it's not 
available on unix.
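To illustrate why solid archives win on similar members, here's a rough Python comparison (zlib standing in for deflate; the file contents are invented for the demo):

```python
import os
import zlib

# Eight files that share most of their content, plus a little
# unique (random) data each -- like a directory of similar documents.
files = [b"All work and no play makes Jack a dull boy.\n" * 50 + os.urandom(64)
         for _ in range(8)]

# zip-style: every member compressed on its own,
# dictionary reset for each one.
per_member = sum(len(zlib.compress(f)) for f in files)

# solid-style (RAR, gzip'd tar): one stream, so later files can
# reference matches in earlier ones while those stay in the window.
solid = len(zlib.compress(b"".join(files)))

print(f"per-member total: {per_member} bytes, solid: {solid} bytes")
```

The solid stream comes out noticeably smaller, at the cost of the all-or-nothing damage behaviour described above.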

Oh, btw, LZHuff can compress some LZW-encoded stuff a bit further!
Using JPEG doesn't compress the actual GIF file but the decoded graphics 
data within it, so of course it'll outdo the portable GIF compression or an 
LZHuffed GIF LZW file. Don't forget though that JPEG is a lossy method, and 
as a result it's only any good for non-critical data types, e.g. graphics, 
NOT program data.

Johnathan.


___ Olms 1.60 [Evaluation]

 
--
|Fidonet:  Johnathan Taylor 2:2501/307
|Internet: [EMAIL PROTECTED]
|
| Standard disclaimer: The views of this user are strictly his own.
