I have a PDF: http://dl.dropbox.com/u/28209500/Pages%20from%2011-12%20CA%20Apps.pdf
It appears to use an internal font encoding, so I am unable to extract the text with PDFBox. (Actually, I have lots and lots of forms with this same problem.) As an alternative, I thought I could use PDFBox to convert the form to an image and then run OCR on that image. However, when I convert to an image with PDFBox, the text comes out as gibberish as well.

So, am I correct in assuming that for PDFs which have internal fonts, there is no way to get at the actual text using PDFBox, even as an image? If I know that it's impossible, I can start looking at alternatives to PDFBox...

Patrick Nichols

