On 30/01/16 05:36, Pierre-Luc Samuel wrote:
> Hum, yeah RunLengthDecode doesn't seem to be the best algorithm for this
> kind of image. Well, it's not really a good compression algorithm at
> all from what I see!
>
> An interesting fact I found was that if I pass my 27 MB file to ps2ps
> (ghostscript ps2write device), I end up with a 1.7 MB file that is
> "/ASCII85Decode filter /LZWDecode filter". I don't know much about
> these decoding algorithms, but it would be really nice if that kind of
> post-compression happened directly in poppler's pdftops.
>
> I'd be willing to help if someone helped me figure it out. I see
> poppler already has a LZWStream class, would it simply be a matter of
> plugging it in somewhere in PSOutputDev.cc, in place or in addition to
> RunLengthDecode?
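
To illustrate why RunLength does so badly on this kind of image, here is a rough Python sketch. It is not poppler code: the helper names are made up for the example, and the encoder only follows the RunLengthEncode byte format as I understand it from the PostScript reference (length byte 0-127 = copy the next length+1 bytes, 129-255 = repeat the next byte 257-length times, 128 = end of data). A solid white background is a long run of the byte 255, but a solid yellow one is the triple 255 194 14 repeated, so no two neighbouring bytes are ever equal and there is nothing to run-length encode. Flate matches the repeating pattern instead of single bytes.

```python
import zlib


def run_length_encode(data: bytes) -> bytes:
    """Hypothetical helper: encode bytes in the PostScript RunLengthEncode format."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # 129..255: repeat the next byte 257-length times
            out.append(data[i])
            i += run
        else:
            # Collect literal bytes until a run of 2+ starts (at most 128).
            start = i
            i += 1
            while (i < n and i - start < 128
                   and not (i + 1 < n and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)  # 0..127: copy the next length+1 bytes
            out += data[start:i]
    out.append(128)  # end-of-data marker
    return bytes(out)


def run_length_decode(data: bytes) -> bytes:
    """Hypothetical helper: inverse of run_length_encode, used as a sanity check."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:
            break
        if length < 128:
            out += data[i + 1:i + 2 + length]
            i += 2 + length
        else:
            out += bytes([data[i + 1]]) * (257 - length)
            i += 2
    return bytes(out)


# 10000 RGB pixels of each background colour (sample size chosen arbitrarily).
white = bytes([255, 255, 255]) * 10000
yellow = bytes([255, 194, 14]) * 10000

for name, pixels in (("white", white), ("yellow", yellow)):
    print(name,
          "raw:", len(pixels),
          "runlength:", len(run_length_encode(pixels)),
          "flate:", len(zlib.compress(pixels)))
```

With RunLength the yellow data comes out slightly larger than the raw input, because every 128 literal bytes cost one extra length byte, while Flate shrinks both colours to a tiny fraction of the raw size. The real pages also contain the tux image, so the actual file sizes will not follow these ratios exactly.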

"pdftocairo -ps tux-yellow.pdf" creates a 112 KB file (PS level 3, the default).
"pdftocairo -ps -level2 tux-yellow.pdf" creates a 345 KB file.

So you should be able to get significantly better compression out of pdftops by using the /FlateDecode filter for PS level 3 and falling back to /LZWDecode for level 2.

>
> Pierre-Luc
>
> On 01/27/2016 01:55 PM, William Bader wrote:
>> tux-yellow and tux-white both convert to a 2549x3299 RGB bitmap that
>> is RunLength compressed and ASCII85 encoded.
>>
>> The yellow file is larger than the white file because "255 194 14"
>> does not compress as well as "255 255 255".
>>
>> The original tux image was Flate encoded with /DecodeParms of
>> <</Predictor 15/Columns 512>>
>>
>> I am not a poppler maintainer, but I think that it should be possible
>> to add an option to do Flate compression.
>>
>> If you want to look at the code, open poppler/PSOutputDev.cc and
>> search for occurrences of /RunLengthDecode
>>
>> The "nothing" files are small because they paint the background by
>> drawing a box instead of by copying a bitmapped image.
>>
>> I think that when a PDF has several images on top of each other,
>> pdftops needs to convert the entire area to a bitmap even if some of
>> the parts were originally drawn with vector commands. The original
>> images have a bitmapped tux over a vector background, but pdftops
>> can't separate them and has to rasterize the entire page.
>>
>> Regards,
>>
>> William
>>
>>
>> To: [email protected]
>> From: [email protected]
>> Date: Tue, 26 Jan 2016 14:19:17 -0500
>> Subject: [poppler] pdftops creates huge file with simple color
>> background (attached examples)
>>
>> Hi poppler team,
>>
>> I have an issue with pdftops version 0.39.0 with conversion of some
>> specific templates to postscript. I have created very simple use cases
>> so that you can understand the issue.
>>
>> pdftops tux-white.pdf
>> pdftops tux-yellow.pdf
>> ls -al *.ps
>> -rw-r--r-- 1 2816703 Jan 26 11:53 tux-white.ps
>> -rw-r--r-- 1 27576263 Jan 26 11:53 tux-yellow.ps
>>
>> The size of the second PS is 27MB, but only the background color has
>> changed. This seems related to the fact that there is an image on the
>> template, because if I remove the image, there is no significant size
>> difference:
>>
>> pdftops nothing-white.pdf
>> pdftops nothing-yellow.pdf
>> ls -al *.ps
>> -rw-r--r-- 1 11129 Jan 26 10:34 nothing-white.ps
>> -rw-r--r-- 1 11167 Jan 26 10:34 nothing-yellow.ps
>>
>> Is this a known issue?
>>
>> Thanks!
>> Pierre-Luc
>>
>> _______________________________________________
>> poppler mailing list
>> [email protected]
>> http://lists.freedesktop.org/mailman/listinfo/poppler
