I'm trying to extract text from documents such as http://tinyurl.com/nljnnyk, 
starting from the code in the ExtractText tool.  Unfortunately, only a small 
portion of the text is extracted, and which portion comes out seems to depend 
on the fonts used.  I've read the FAQ entry on failed text extraction; however, 
other tools such as PDFMiner and Xpdf are able to harvest all of the text from 
the same documents.

For my application I would much prefer a Java-based solution to those other 
tools, so I'm wondering whether there is any way to solve this.  I'm hoping for 
either a configuration change or a small modification to the sample code.

Thanks for any help,
trey
