Rendering Glitches

Christopher Mason Tue, 20 Apr 2010 17:14:22 -0700

I'm investigating libraries for rendering and extracting text from PDF. Across the half dozen I've looked at, both commercial and open source, I think pdfbox is the cleanest.

However, I've run across a number of PDFs that pdfbox does not render properly. One I'm particularly concerned about is:


http://www.cmason.com/tmp/Sowa.pdf

It looks to have encoding or char -> glyph issues in pdfbox, but looks okay in every other reader/library I've tried. I've tried with both pdfbox-1.1.0 and the trunk. Here's how it looks in pdfbox trunk versus Preview:


http://www.cmason.com/tmp/Sowa.png

Any help or suggestions would be most appreciated.

-c

java -cp ~/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:pdfbox-1.1.0.jar:fontbox-1.1.0.jar org.apache.pdfbox.PDFToImage -color rgba -startPage 1 -endPage 1 -resolution 100 -imageType png -outputPrefix Sowa ~/Sites/docs/Sowa.pdf
