Re: Tika and invisible text from pdf

Dave Meikle Sat, 09 Mar 2013 15:53:50 -0800

Hi Brad,

On 21 Feb 2013, at 11:28, Brad Stallion <[email protected]> wrote:


> I'm extracting text from PDF files using my own SAX handler. The problem is
> that I get both visible and invisible text, i.e. text contained in invisible
> parts of the layout.
> How can I identify the invisible parts?

We use PDFBox under the hood in Tika.  Have you tried asking on their user list?

Cheers,
Dave
