I have a PDF:
http://dl.dropbox.com/u/28209500/Pages%20from%2011-12%20CA%20Apps.pdf

which appears to use a custom, font-internal encoding, so I am unable to 
extract the text with PDFBox. (In fact, I have a large number of forms with 
this same problem.)
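
To sort through that many forms, a rough check like the one below might help 
flag which extractions came out garbled. It is only a sketch: the class name, 
the method names and the 0.85 threshold are my own assumptions and not part of 
PDFBox. It works on a String that has already been extracted, and it will miss 
fonts whose codes happen to map to ordinary-looking letters.

```java
public final class GarbledTextCheck {

    // Hypothetical threshold: if fewer than this share of characters look
    // "ordinary", treat the extracted text as garbled. Tune on real forms.
    private static final double MIN_ORDINARY_RATIO = 0.85;

    private GarbledTextCheck() {}

    /** Share of code points that are letters, digits, whitespace or printable ASCII. */
    static double ordinaryRatio(String text) {
        if (text.isEmpty()) {
            return 0.0; // nothing extracted at all is scored as the worst case
        }
        int ordinary = 0;
        int total = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            total++;
            boolean printableAscii = cp >= 0x21 && cp <= 0x7E;
            if (Character.isLetterOrDigit(cp) || Character.isWhitespace(cp) || printableAscii) {
                ordinary++;
            }
        }
        return (double) ordinary / total;
    }

    /** True when the text is empty or mostly control and other unusual characters. */
    static boolean looksGarbled(String text) {
        return ordinaryRatio(text) < MIN_ORDINARY_RATIO;
    }

    public static void main(String[] args) {
        System.out.println(looksGarbled("Applicant name: John Smith"));
        System.out.println(looksGarbled("\u0001\u0002\u0003\u0004 \u0005\u0006"));
    }
}
```

Forms for which looksGarbled returns true could then be set aside for OCR or 
for another tool, while the rest go through normal text extraction.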

As an alternative, I thought I could use PDFBox to convert the form to an 
image and then run OCR on that image. However, when I convert to an image 
with PDFBox, the text comes out as gibberish as well.

So, am I correct in assuming that for PDFs which use internal fonts, there is 
no way to get at the actual text using PDFBox, even by rendering to an image? 
If I know that it's impossible, I can start looking at alternatives to PDFBox.

Patrick Nichols
