plucker file compression

Jewett, Jim J Wed, 03 Mar 2004 09:50:27 -0800

Nathan Bullock:

> I have a Plucker document with about 1600 HTML pages,
> ranging from 1 KB to 37 KB. The vast majority are
> between 2 KB and 10 KB.


> Various compression techniques:
> 1. Total raw bytes: 8 MB.
> 2. Total gzipped: 3.1 MB (each file compressed
>    individually).
> 3. Total tar-gzipped: 2.3 MB.

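The extra reduction comes from redundancy between
files.  For instance, they probably share similar
headers and footers.  When everything goes through one
tar stream, deflate can match those strings against
earlier files (within its 32 KB window); gzipping each
file separately starts from scratch every time.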
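A small sketch of that effect, using Python's zlib module on synthetic pages.  The HEADER/FOOTER strings and page bodies are made up for illustration; they are not taken from the document described above, so only the relative sizes matter.

```python
import zlib

# Hypothetical boilerplate shared by every page; these are NOT real Plucker pages.
HEADER = (b"<html><head><title>Plucker page</title>"
          b"<link rel='stylesheet' href='style.css'></head><body>")
FOOTER = (b"<hr><p><a href='index.html'>Home</a> | "
          b"<a href='toc.html'>Contents</a></p></body></html>")


def make_pages(n):
    """Build n small synthetic pages that differ only in their body text."""
    return [HEADER + b"<p>" + (b"body text %d " % i) * 20 + b"</p>" + FOOTER
            for i in range(n)]


def per_file_size(pages):
    """Total size when each page is compressed on its own (like gzipping each file)."""
    return sum(len(zlib.compress(p, 9)) for p in pages)


def solid_size(pages):
    """Size when all pages go through one deflate stream (like tar + gzip)."""
    return len(zlib.compress(b"".join(pages), 9))


pages = make_pages(50)
print("raw:     ", sum(len(p) for p in pages))
print("per-file:", per_file_size(pages))
print("solid:   ", solid_size(pages))
```

The solid stream wins because every page after the first can refer back to the header and footer already seen; the per-file numbers pay for them 50 times.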

You could get a similar reduction by using a custom
(preset) dictionary.  Then you would only need to load
this dictionary (once) plus the desired record, instead
of decompressing everything up to the record, as you
would with one solid stream.
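A minimal sketch of the preset-dictionary approach, assuming Python's zlib module (its `compressobj`/`decompressobj` accept a `zdict` argument).  The dictionary contents and the compress_record/decompress_record helpers are hypothetical; a real Plucker dictionary would have to be built from the actual documents.  Each record is still compressed on its own, so any single record can be decompressed with just the dictionary.

```python
import zlib

# Hypothetical page boilerplate; a real dictionary would come from the actual documents.
HEADER = (b"<html><head><title>Plucker page</title>"
          b"<link rel='stylesheet' href='style.css'></head><body>")
FOOTER = (b"<hr><p><a href='index.html'>Home</a> | "
          b"<a href='toc.html'>Contents</a></p></body></html>")

# The zlib manual suggests putting the most commonly used strings toward the
# END of the dictionary, where matches are cheapest to encode.
DICTIONARY = FOOTER + HEADER


def compress_record(data, zdict):
    """Compress one record independently, primed with the preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_record(blob, zdict):
    """Decompress one record; only the dictionary and this record are needed."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


pages = [HEADER + (b"<p>Record %d text</p>" % i) * 3 + FOOTER for i in range(100)]
with_dict = [compress_record(p, DICTIONARY) for p in pages]
without_dict = [zlib.compress(p, 9) for p in pages]
print("no dictionary:  ", sum(map(len, without_dict)))
print("with dictionary:", sum(map(len, with_dict)))

# Random access: record 37 is recovered without touching records 0..36.
print(decompress_record(with_dict[37], DICTIONARY)[:60])
```

The design trade-off is that the dictionary must be stored once and kept in memory while reading, in exchange for losing the "decompress everything before it" cost of a single stream.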

The zlib spec does allow for a custom dictionary, but
(last I checked) this didn't seem to be implemented
in the standard open source zlib.[1]  The dictionary
contents are "application-specific".  We would also
have to decide whether to use one (or several?)
Plucker-wide custom dictionaries, a per-PDB dictionary
stored under a special magic record number, or both.
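If more than one dictionary were allowed, the zlib format can already tell a reader which one a record needs: per RFC 1950, a stream compressed with a preset dictionary sets the FDICT bit in its header and stores the Adler-32 checksum of that dictionary (DICTID) in the next four bytes.  The sketch below assumes Python's zlib module; dictionary_id, pick_dictionary, and both candidate dictionaries are hypothetical names for illustration, not anything Plucker defines.

```python
import zlib


def dictionary_id(blob):
    """Return the DICTID from a zlib (RFC 1950) header, or None if FDICT is not set."""
    cmf, flg = blob[0], blob[1]
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("not a zlib stream")
    if not flg & 0x20:  # FDICT bit: no preset dictionary was used
        return None
    return int.from_bytes(blob[2:6], "big")  # Adler-32 of the preset dictionary


def pick_dictionary(blob, candidates):
    """Choose which of several candidate dictionaries a compressed record needs."""
    wanted = dictionary_id(blob)
    if wanted is None:
        return None
    for d in candidates:
        if zlib.adler32(d) == wanted:
            return d
    raise LookupError("no candidate dictionary matches DICTID %08x" % wanted)


# Hypothetical Plucker-wide and per-PDB dictionaries.
site_dict = b"<html><head><title></title></head><body></body></html>"
doc_dict = b"<a href='chapter"

c = zlib.compressobj(zdict=doc_dict)
blob = c.compress(b"<a href='chapter1.html'>One</a>") + c.flush()

chosen = pick_dictionary(blob, [site_dict, doc_dict])
d = zlib.decompressobj(zdict=chosen)
print(d.decompress(blob) + d.flush())
```

Under this scheme a reader would not need a separate field saying which dictionary a record uses; the 4-byte DICTID in each record's header already identifies it.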

[1] http://www.gzip.org/zlib/ suggests that it is present
as of 1.1.3 (which we use), but has been improved since.

-jJ
_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
