Hello,

I have been looking into how the compression for
plucker could be improved, and here are some
numbers...

I have a plucker document with about 1600 HTML pages;
they range from 1k to 37k, with the vast majority
between 2k and 10k.

Various compression techniques:
1. Total raw bytes: 8MB.
2. Total gzipped: 3.1MB (each file compressed
individually).
3. Total tarred and gzipped: 2.3MB.

If I understand correctly, plucker compresses (zlib)
each file individually and then puts them all into one
big PDB file. (Note that when I pluck these HTML files
the PDB file is 3.1MB, the same size as option 2.)
This is good because you don't have to decompress the
entire 8MB of data in order to retrieve the file you
want, but bad because zlib doesn't compress 1k files
nearly as well as it compresses 8MB of data.
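To make that concrete, here is a small python sketch
(not plucker's actual code, just my approximation of
the effect) that compares compressing each file on its
own against compressing everything as one stream; the
"pages" directory name is made up:

# Sketch only: shows why per-file zlib compression loses out on many
# small files -- each file starts with an empty dictionary, so common
# HTML boilerplate has to be re-learned every time.
import zlib
from pathlib import Path

def total_compressed_individually(paths):
    """Compress each file on its own (what plucker appears to do)."""
    return sum(len(zlib.compress(p.read_bytes())) for p in paths)

def total_compressed_together(paths):
    """Compress the concatenation of all files as one stream."""
    return len(zlib.compress(b"".join(p.read_bytes() for p in paths)))

if __name__ == "__main__":
    pages = sorted(Path("pages").glob("*.html"))  # hypothetical directory
    print("individually:", total_compressed_individually(pages))
    print("one stream:  ", total_compressed_together(pages))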

Now I ran a little experiment where I took chunks of
those small files and tarred them into bigger files
(each of which was still smaller than 32k) and then
gzipped these slightly larger files. This resulted in
a total compressed size of 2.6MB, a fairly good
reduction from 3.1MB. I used a very simple algorithm
to decide which files to group together; a better bin
packing algorithm might do even better. A rough sketch
of the idea is below.
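Something along these lines is roughly what I did.
First-fit decreasing is just one easy choice of bin
packing, and the names and 32k limit are assumptions
of mine, not anything in plucker today:

# Group files into bundles whose *uncompressed* size stays under 32k,
# then compress each bundle as a single zlib stream.
import zlib
from pathlib import Path

LIMIT = 32 * 1024  # 32k uncompressed per bundle (assumed record limit)

def pack_files(paths, limit=LIMIT):
    """First-fit decreasing: each bundle is a list of (path, size)."""
    bundles = []
    for path in sorted(paths, key=lambda p: p.stat().st_size,
                       reverse=True):
        size = path.stat().st_size
        for bundle in bundles:
            if sum(s for _, s in bundle) + size <= limit:
                bundle.append((path, size))
                break
        else:
            bundles.append([(path, size)])
    return bundles

def compressed_total(bundles):
    """Total size after compressing each bundle as one stream."""
    return sum(
        len(zlib.compress(b"".join(p.read_bytes() for p, _ in bundle)))
        for bundle in bundles
    )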

Anyway, could this be used with plucker? Could the
file format be modified to put a number of small files
into one 32k record, and have plucker handle the
decompressing and extracting of the proper file? If
someone could tell me what they would like the file
format to look like, I think I could handle the python
side of things, but I am not sure about the Palm side
of things.
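
Just to make the idea concrete, here is one possible
(entirely made-up) record layout and the python to
pack and unpack it. I am not proposing this exact
format, only showing that the python side would be
straightforward:

# Strawman layout, not an existing plucker format: a 2-byte count, a
# table of 2-byte offsets, then the concatenated files, with the whole
# record zlib-compressed. The Palm side would decompress one record
# and slice out the document it needs.
import struct
import zlib

def pack_record(files):
    """files: list of byte strings (the small HTML documents)."""
    count = len(files)
    header_len = 2 + 2 * (count + 1)   # count + offset table
    offsets, pos = [], header_len
    for data in files:
        offsets.append(pos)
        pos += len(data)
    offsets.append(pos)                # end offset, for slicing
    header = struct.pack(">H", count) + b"".join(
        struct.pack(">H", o) for o in offsets)
    return zlib.compress(header + b"".join(files))

def unpack_record(record, index):
    """Return file number `index` from a packed record."""
    raw = zlib.decompress(record)
    (count,) = struct.unpack(">H", raw[:2])
    offsets = struct.unpack(">%dH" % (count + 1),
                            raw[2:2 + 2 * (count + 1)])
    return raw[offsets[index]:offsets[index + 1]]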

Nathan Bullock



