On 4/12/12, Albert Astals Cid wrote:
> On Saturday, 31 March 2012, at 01:23:17, Ihar `Philips` Filipau wrote:
>
> Committed.

Thanks.

>> - invert-mask-001.diff
>> Implement inversion of the mask, if that is required by the decode
>> array or background/foreground colors appear to be swapped. The
>> heuristic is just 4 lines, probably unreliable but "works for me" -
>> and thus I will not object for the 4 lines to be removed.
>
> Don't think it makes sense to do this, a mask is a mask, not an image, and
> like a mask shall be extracted imho. Or just don't extract it, but try to
> guess stuff will result in problems.

I had this in mind when I posted the patches.

I disagree with the "try to guess stuff" comment, but that's OK.
(In short: pdftotext/pdftohtml/friends are all about guessing stuff -
guessing words, guessing lines, guessing colors, guessing fonts. Heck,
TextOutputDev is part of poppler (not poppler-utils!) and it guesses
at hyphenation - a much, much worse offense, as it modifies the text
being extracted from the PDF.)

Of all the PDFs I have gone through (more than 50 now), not a single
one had a mask used as a mask - all mask images were used exclusively
to represent a monochrome image: a gothic chapter delimiter, a diagram
or a logo.

But never mind, at least now we extract the images, and they can be
postprocessed manually later: ImageMagick's `convert -negate` does the
job. It's not the worst I have seen in a PDF.

best regards, happy weekend and, of course, have a nice release! ;)
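
For illustration only, here is a minimal Python sketch of the
decode-array half of the inversion discussed above. It is not the code
from invert-mask-001.diff (which is C++ inside poppler); the function
names are hypothetical, the mask is assumed to be already unpacked
into rows of 0/1 samples, and the foreground/background color
heuristic is deliberately left out. The only PDF fact it relies on is
that an image mask's Decode array is either [0 1] or [1 0], with
[1 0] reversing the meaning of the samples.

```python
def needs_inversion(decode):
    """Return True when the Decode array reverses the sample meaning ([1 0])."""
    return list(decode) == [1, 0]


def invert_mask(rows):
    """Flip every 1-bit sample; the same effect as `convert -negate` on a bilevel image."""
    return [[1 - bit for bit in row] for row in rows]


def extract_mask(rows, decode=(0, 1)):
    """Return a copy of the mask samples, inverted only if Decode asks for it."""
    if needs_inversion(decode):
        return invert_mask(rows)
    return [list(row) for row in rows]


if __name__ == "__main__":
    mask = [[0, 1, 1],
            [1, 0, 1]]
    for row in extract_mask(mask, decode=(1, 0)):
        print("".join(str(bit) for bit in row))
```

Without such a step, the same result can be obtained after extraction
with `convert -negate`, as noted above.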
