In a message to All <03 Nov 94 13:02> [EMAIL PROTECTED] wrote:

 Ia> On 03 Nov 94 04:35:00 +0000, Johnathan Taylor said:
..
 >> A gzip'd tar must first be decompressed in its entirety.

 Ia> Your answer is quite correct, but it
 Ia> is to a different question...  It
 Ia> is true that if a gzip file gets
 Ia> corrupted then that's it.  However I

I didn't understand the question, as it didn't appear relevant to what I 
said/meant in the text you quoted before it, so I guessed at what you wanted 
and obviously got it wrong ;-)

 Ia> think specially catering for corrupted
 Ia> files is not the main aim of a
 Ia> compression/archive system.

It's not; it's just a valuable advantage :-)

 Ia> Anyway, the explanation that I
 Ia> actually wanted was of the following
 Ia> sentence:

 >jet> Plus I'd want
 >jet> to be able combine assosiated files
 >jet> into a single archive not a seperate
 >jet> lose-able bunch of seperatly compressed files!

 Ia> So why is a single compressed file any
 Ia> more lose-able than any other kind of
 Ia> file?

What I meant by a lose-able bunch of separate files is when you have 1000 
readme files for 1000 different programs, or whats.new files, etc. Keeping 
all the files related to a particular program in one archive, with random 
access to individual members, makes it easy to locate a single program's 
associated files without having to stick each bunch of files in its own 
directory, or resort to obtuse 32-bit hex-digit names to uniquely ID 
individual compressed files, which would then need some form of database to 
recreate human-readable named access to individual files...
Does that explain what I meant? If not, don't worry about it!
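
To illustrate the random-access point, here's a minimal sketch using Python's 
standard zipfile module (the member names are made up for the example): one 
archive holds each program's related files under readable names, and any one 
member can be pulled out without touching the rest.

```python
import io
import zipfile

# Build a small archive in memory holding several related files
# (hypothetical names, purely to illustrate the idea).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("prog1/readme.txt", "docs for prog1")
    zf.writestr("prog1/whats.new", "changelog for prog1")
    zf.writestr("prog2/readme.txt", "docs for prog2")

# Random access: pull out a single member by its human-readable
# name, without decompressing the rest of the archive.
with zipfile.ZipFile(buf) as zf:
    text = zf.read("prog1/readme.txt").decode()
print(text)  # -> docs for prog1
```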

 Ia> How is this different from LZ77?

AFAIK LZ77 is a distinct two-stage process and requires the encoding tree to 
be added before the actual encoded bit-stream starts, whilst LZHUF builds its 
tree on the fly, so to speak. This also means that LZHUF can actually 
compress small files of less than 100 bytes, which LZ77 cannot!

Plus, dictionary size for dictionary size, LZHUF generally produces better 
compression than LZ77. Its only problem is that the encode algorithm is a bit 
more processor-intensive, which is probably why LZHUF -lh5- only uses an 8K 
dictionary whilst LZ77 has HAD to grow to 32K (ZIP) or 64K (RAR) to beat it 
on 99% of file types.
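
For anyone unfamiliar with the sliding-dictionary half of all this, here's a 
toy Python sketch of the LZ77 matching step only; the Huffman/tree coding 
stage discussed above is omitted, and this is nothing like production 
gzip/PKZIP/LHA code — just the back-reference idea.

```python
def lz77_compress(data: bytes, window: int = 32 * 1024) -> list:
    """Toy LZ77: emit (distance, length) back-references or literal bytes.

    A bare sketch of the sliding-dictionary idea only -- real
    implementations add Huffman coding on top of these tokens.
    """
    out = []
    i = 0
    while i < len(data):
        start = max(0, i - window)
        best_len, best_dist = 0, 0
        # Naive search of the window for the longest match.
        for j in range(start, i):
            length = 0
            while (i + length < len(data)
                   and data[j + length] == data[i + length]
                   and length < 255):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= 3:          # matches shorter than 3 don't pay
            out.append((best_dist, best_len))
            i += best_len
        else:
            out.append(data[i])    # literal byte
            i += 1
    return out

print(lz77_compress(b"abcabcabc"))  # -> [97, 98, 99, (3, 6)]
```

Note the match at position 3 reaches *into itself* (distance 3, length 6) — 
overlapping copies like this are what make LZ77 cheap on repetitive data.
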
 >> These ARE ALL Documented in various places on
 >> the nets, so those with direct access
 >> to all those net tools should be able
 >> to locate them themselves!

 Ia> There are quite a lot of things on the
 Ia> net actually, so you will have to
 Ia> narrow down the search a little more than that.

Heck, I dunno what the facilities of the online tools are! Isn't it called 
archie or something... ask it to look for LHA or LZHUF; that's one reason I 
capitalise those words.....

 >>  >> The thing about gzip is that it uses
 >>  >> very little RAM to decompress (apart
 >>  >> from having to store each block of the
 >>  >> file

 >> I don't know WHO wrote the above
 >> paragraph imc maybe... but GET REAL! gzip and
 >> PKUNZIP2.04g etc are required to keep a
 >> running 32K ring-buffer OR rely on
 >> flawless random file access of the
 >> output stream in order to get at the 32k
 >> sliding dictionary!

 Ia> Which is implied by "storing each
 Ia> block of the file" (if the blocks are 32K.

Didn't you know that it uses a 32K sliding window?

 Ia> Note: I did _not_ mean disk blocks.

So? Does this mean it uses any less RAM just because you chose not to name 
that 'block' size? 32K is half the available address space! Then you need to 
add the program code size, the main table sizes, and the input and output 
file-handling interface into Sam-Dos.... I know decompression *CAN* be done 
with a Z80, but why bother when the Sam cannot compress the archives and 
probably never will be able to do DeflatX compression! The only use I'd see 
for such a util is to inflate tar.z files from a unix box... If that was the 
requirement then I'd do what I do now: inflate them on the unix box, 
re-compress them with LHA, and use PMEXT.COM under prodos — or port LHA to 
sam-dos, which should be much easier than gzip!
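
To show why the decoder really does have to keep that 32K window addressable, 
here's a toy Python sketch of ring-buffer decoding. The token format 
(literal bytes and (distance, length) pairs) is made up for illustration and 
is not gzip's actual bitstream.

```python
WINDOW = 32 * 1024  # the 32K sliding dictionary discussed above

def lz77_decompress(tokens) -> bytes:
    """Decode literal/(distance, length) tokens with a ring buffer.

    Each back-reference copies from up to WINDOW bytes behind the
    current position, so those bytes must stay in RAM (or be
    re-read via flawless random access to the output stream).
    """
    ring = bytearray(WINDOW)
    pos = 0                  # total bytes written so far
    out = bytearray()

    def put(byte):
        nonlocal pos
        ring[pos % WINDOW] = byte
        pos += 1
        out.append(byte)

    for tok in tokens:
        if isinstance(tok, tuple):        # (distance, length) copy
            dist, length = tok
            for _ in range(length):
                put(ring[(pos - dist) % WINDOW])
        else:                             # literal byte
            put(tok)
    return bytes(out)

print(lz77_decompress([97, 98, 99, (3, 6)]))  # -> b'abcabcabc'
```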

 Ia> Do you remember how this started?  By
 Ia> me saying I might write one.

I do remember, but have you started yet? Well, assuming you were serious, may 
I suggest that you don't *AIM* for gzip, as it's only a partial product! 
Which is one reason why I am so against the waste of effort in writing it 
for the SAM.

The full product is a ZIP and UNZIP! Actually, IF you did that, then you'd 
only need to make the UNZIP portion handle deflated files, as the ZIP portion 
could still achieve reasonable ratios using the older implode algorithm, 
which is similar in spirit to deflate but uses a 4K or 8K sliding dictionary, 
and IS supported by the latest PKUNZIP and the latest portable UNZIP50p? 
infozip project.. and is more likely to fit reasonably into the sam scheme of 
things :-)
And I'm sure Si'll be very glad to hear of a ZIP/UNZIP for sam native mode if 
you can write it :-)
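
The mix-and-match above works because ZIP records a compression-method code 
per member, so an unzipper only needs to understand the methods it actually 
meets. A small sketch with Python's standard zipfile module (which can write 
stored and deflate, though not implode — method 6 in the ZIP spec):

```python
import io
import zipfile

# Write two members with different compression methods.
# ZIP method codes: 0 = stored, 6 = imploded, 8 = deflated.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", "x" * 1000, compress_type=zipfile.ZIP_STORED)
    zf.writestr("packed.txt", "x" * 1000, compress_type=zipfile.ZIP_DEFLATED)

# Each member's directory entry carries its own method code.
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, info.compress_type)
# -> plain.txt 0
#    packed.txt 8
```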

 >> As if deflate was the BEST lossless
 >> compression method... Ever heard of RAR?

 Ia> No I haven't.

Well, there's loads of stuff you seem to be missing info on.. If you've got 
live net access, I suggest you go digging through some of those internet 
sites and read up on what the real alternatives are for yourself. After all, 
if you don't believe me, go get the tech docs and read them for yourself :)

 >> Oh btw LZHuff can compress some LZW
 >> encoded stuff a bit further!

 Ia> Perhaps, but you are better off
 Ia> uncompressing the LZW first before trying
 Ia> another compression method.

Of course, but it proves that LZHuff is more robust when compressing awkward 
data than LZW. LZ77 gets around the situation of uncompressable stuff by 
temporarily giving up and simply storing the awkward 32K section; LZHUFF, 
generally speaking, doesn't run into this sort of situation often enough to 
warrant such a drastic change in format as was made when PKZIP204 was 
released.
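
That stored-section fallback is easy to demonstrate with Python's zlib (the 
file contents here are synthetic): incompressible input only grows by a few 
bytes of block framing, while repetitive input shrinks dramatically.

```python
import os
import zlib

# Deflate copes with incompressible input by emitting "stored"
# blocks, so output grows only by a little framing overhead
# instead of expanding badly.
random_data = os.urandom(32 * 1024)          # incompressible-like data
text_data = b"the quick brown fox " * 1600   # highly repetitive data

for name, data in [("random", random_data), ("text", text_data)]:
    packed = zlib.compress(data, 9)
    print(name, len(data), "->", len(packed))
# random stays roughly the same size; text shrinks to a tiny fraction.
```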

A reason I feel the LHARC format is better for the SAM than the ZIP format is 
that although both forms can hold all the sam-specific file attribute info, 
the LHARC format can also be implemented on the humble 48K speccy with disks 
and could contain speccy file info too! It just seems more logical to use a 
format that's free from licence restrictions and easy to implement as a 
file-serialisation utility that works without needing vast amounts of RAM to 
function.

Oh, btw, even the old -lh1- LZHUF method is amazingly more effective than the 
old 16-bit unix .Z compress method, both in compression and in resource 
requirements. When I receive tar.Z archives on my SAM I always zcat 'em and 
re-compress them straight away with PMARC (the Z80 LZHUFF derivative). That 
reduces the overall size of the compressed tar by at least another third 
compared with the unix-compressed tar size — until I can sort out a clean 
floppy to detar it and archive the tar members properly, so they're usable 
but still much smaller than the original tar.Z archive!

Would you mind, imc, if we let this thread taper off, please? I've got other 
things I need to be doing rather than arguing the toss about something as 
trivial as this, to save you wasted effort, when I want to get on with other 
more pressing stuff and you've got to re-acquaint yourself with your recently 
obtained SAM ;-)
Cheers
Johnathan.

___ Olms 1.60 [Evaluation]
+-------------------------------------------------------------------+
| Standard disclaimer:  The views of this user are strictly his own |
|  ===> Gated @ Centronics BBS [centron.com] +44-1473-273246] <===  |
+-------------------------------------------------------------------+
