question on what text is extractable (in comparison to other tools)

Trey Matteson Fri, 23 May 2014 14:00:25 -0700

I'm trying to extract text from documents like http://tinyurl.com/nljnnyk, 
starting from the code in the ExtractText tool.  Unfortunately, only a small 
portion of the text is extracted, and which parts are lost seems to depend on 
which fonts are used.  I've seen the FAQ entry on failed text extraction; 
however, other tools such as PDFMiner and Xpdf are able to harvest all of the 
text from these documents.


For my application I would much prefer a Java-based solution over those other 
tools, so I'm wondering whether there is any way to solve this.  I'm hoping for 
either a configuration change or a small modification to the sample code.

Thanks for any help,
trey
