Hello, I have been looking into how the compression for plucker could be improved, and here are some numbers...
I have a plucker document with about 1600 HTML pages, ranging from 1k to 37k; the vast majority are between 2k and 10k. Comparing compression techniques:

1. Total raw bytes: 8 MB.
2. Each file gzipped individually: 3.1 MB total.
3. All files tarred together, then gzipped: 2.3 MB total.

If I understand correctly, plucker gzips (zlib) each file individually and then puts them all into one big PDB file. (Note that when I pluck these HTML files the PDB comes out at 3.1 MB, the same size as option 2.) This is good because you don't have to decompress the entire 8 MB of data to retrieve the one file you want; it is bad because gzip doesn't compress 1k files nearly as well as it compresses 8 MB of data.

I ran a little experiment where I took the small files, tarred them together into bigger files (each still smaller than 32k), and then gzipped these slightly larger files. This brought the total compressed size down to 2.6 MB, a fairly good reduction from 3.1 MB. I used a very simple algorithm to decide which files to group together; a better bin-packing algorithm might improve on this further. A sketch of the grouping step is included below.

Could this be used with plucker? Could the file format be modified to put a number of small files into one 32k file, and have plucker handle the unzipping and extraction of the proper file?

If someone could tell me what they would like the file format to look like, I think I could handle the Python side of things, but I am not sure about the Palm side.
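To make the grouping step concrete, here is a rough sketch of one way to do it in Python. The function names, the first-fit-decreasing strategy, and the use of zlib (instead of tar + gzip) are illustrative assumptions, not the exact script I ran:

import os
import zlib

CHUNK_LIMIT = 32 * 1024  # keep each uncompressed chunk under 32k

def group_files(paths, limit=CHUNK_LIMIT):
    # Greedy first-fit-decreasing grouping: pack small files into
    # chunks whose total uncompressed size stays under `limit`.
    chunks = []  # each entry is [list_of_paths, total_uncompressed_size]
    for path in sorted(paths, key=os.path.getsize, reverse=True):
        size = os.path.getsize(path)
        for chunk in chunks:
            if chunk[1] + size <= limit:
                chunk[0].append(path)
                chunk[1] += size
                break
        else:
            chunks.append([[path], size])
    return [members for members, _ in chunks]

def compress_chunks(chunks):
    # Concatenate each chunk's files and zlib-compress the result,
    # recording each member's (offset, length) so a single file can
    # be sliced back out after decompressing only its own chunk.
    out = []
    for members in chunks:
        index, pieces, offset = [], [], 0
        for path in members:
            with open(path, "rb") as f:
                raw = f.read()
            index.append((path, offset, len(raw)))
            pieces.append(raw)
            offset += len(raw)
        out.append((index, zlib.compress(b"".join(pieces))))
    return out

With something like this, the Palm side would only need the (offset, length) entry for the file it wants, decompress that one chunk, and slice the file out, so it never has to handle more than one ~32k chunk at a time instead of the whole document.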
Nathan Bullock