I'm trying to extract text from documents like http://tinyurl.com/nljnnyk, starting from the code in the ExtractText tool. Unfortunately, only a small portion of the text is extracted, and which portion seems to depend on the fonts used. I've read the FAQ entry on failed text extraction; however, I've also found that other tools, such as PDFMiner and Xpdf, can extract all of the text from the same documents.
For my application I would much prefer a Java-based solution to those tools, so I'm wondering whether there is any way to solve this. I'm hoping for either a configuration change or a small modification to the sample code.

Thanks for any help,
trey