Re: Tika and invisible text from pdf

samir pendharkar Thu, 21 Feb 2013 04:22:08 -0800

In such cases, what works best is to look at the "Structured Text" view in
the Tika GUI.
You might be able to skip the tags that you don't want in the output
(assuming the invisible part ends up in a different tag).
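
To illustrate the tag-skipping idea, here is a minimal sketch of a SAX handler
that collects text but ignores everything inside a configurable set of
elements. It uses only the JDK's SAX classes. The class name
SkippingTextHandler, the extract helper and the choice of which tags to skip
are hypothetical, and it only helps if the invisible text really does show up
under a distinguishable tag in the structured output, which is an assumption
you would need to check for your PDFs.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects character data, except inside skipped elements.
public class SkippingTextHandler extends DefaultHandler {
    private final Set<String> skippedTags;   // element names to ignore (assumed known)
    private final StringBuilder text = new StringBuilder();
    private int skipDepth = 0;                // > 0 while inside a skipped element

    public SkippingTextHandler(Set<String> skippedTags) {
        this.skippedTags = skippedTags;
    }

    // localName is empty when the parser is not namespace-aware, so fall back to qName.
    private static String nameOf(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Once inside a skipped element, count every nested element as well.
        if (skipDepth > 0 || skippedTags.contains(nameOf(localName, qName))) {
            skipDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    public String getText() {
        return text.toString();
    }

    // Convenience helper for trying the handler on an XHTML string.
    public static String extract(String xhtml, Set<String> skippedTags) throws Exception {
        SkippingTextHandler handler = new SkippingTextHandler(skippedTags);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }
}
```

Since this is an ordinary ContentHandler, it could be passed to Tika's
Parser.parse(stream, handler, metadata, context) in place of your current
handler. Which tag names to skip, if any, is something to read off the
"Structured Text" view.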



On Thu, Feb 21, 2013 at 4:58 PM, Brad Stallion <[email protected]> wrote:

> Hi all,
>
> I'm extracting text from PDF files using my own SAX handler. The problem
> is that I get both visible and invisible text, i.e. text contained in
> invisible parts of the layout.
> How can I identify the invisible parts?
>
> I've asked on Stack Overflow as well:
>
>
> http://stackoverflow.com/questions/14956556/tika-and-invisible-text-from-pdf
>
> Thanks a lot for your help!
>
> bye
>
