Hi,
Christopher Mason wrote:
I'm investigating libraries for rendering and extracting text from PDF.
Across the half dozen I've looked at, both commercial and open source,
I think pdfbox is the cleanest.
Oh, interesting. :-)
However, I've run across a number of pdfs that pdfbox does not render
properly. One I'm particularly concerned about is:
http://www.cmason.com/tmp/Sowa.pdf
It appears to have encoding or char -> glyph mapping issues in pdfbox, but looks
okay in every other reader/library I've tried. I've tried with both
pdfbox-1.1.0 and with the trunk. Here's how it looks in pdfbox trunk
versus Preview:
http://www.cmason.com/tmp/Sowa.png
Any help or suggestions would be most appreciated.
I've had a quick look at the PDF. It uses embedded subsets of TrueType fonts,
which is a known problem; see PDFBOX-490 [1] for further details.
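
If you want to check other PDFs for the same kind of font, a rough sketch follows. It relies only on the PDF naming convention that a subset font's BaseFont name starts with a tag of six uppercase letters followed by a plus sign. The class and method names are made up for illustration, the sample font names are hypothetical (not taken from Sowa.pdf), and it does not call pdfbox at all; you would feed it the font names you read from the document yourself.

```java
import java.util.regex.Pattern;

public class SubsetFontName {
    // PDF convention: a subset font name is prefixed with a tag of
    // six uppercase letters followed by '+', e.g. "ABCDEF+SomeFont".
    private static final Pattern SUBSET_TAG = Pattern.compile("[A-Z]{6}\\+.+");

    // Returns true if the given BaseFont name carries a subset tag.
    public static boolean isSubset(String baseFontName) {
        return baseFontName != null && SUBSET_TAG.matcher(baseFontName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical names for illustration only.
        String[] names = {"ABCDEF+TimesNewRoman", "Helvetica"};
        for (String name : names) {
            System.out.println(name + " -> subset: " + isSubset(name));
        }
    }
}
```

A name with the tag only tells you the font is a subset; whether it is TrueType still has to be read from the font's Subtype entry.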
BR
Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX-490