In such cases, what works best is to look at the "Structured Text" view in the Tika GUI. You might be able to skip the tags that you don't want in the output (assuming the invisible part ends up in a different tag).
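
Here is a rough sketch of what I mean by skipping tags, assuming the invisible text really does land in its own element. The class name `TagSkippingHandler`, the helper `extract`, and the choice of `div` as the tag to drop are all made up for illustration; check the "Structured Text" view to see which tag, if any, wraps the text you want to lose. It uses only the JDK's SAX classes, so it runs without Tika on the classpath.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Collects character data, ignoring everything inside the given tags. */
public class TagSkippingHandler extends DefaultHandler {
    // Hypothetical tag names to drop; pick them from the Structured Text view.
    private final Set<String> skippedTags;
    private final StringBuilder text = new StringBuilder();
    // Greater than zero while we are inside a skipped element (or its children).
    private int skipDepth = 0;

    public TagSkippingHandler(Set<String> skippedTags) {
        this.skippedTags = skippedTags;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Matching on qName is an assumption: it is set by the default
        // (non-namespace-aware) JDK parser used in extract() below.
        if (skipDepth > 0 || skippedTags.contains(qName)) {
            skipDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Keep text only when no skipped element is currently open.
        if (skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    public String getText() {
        return text.toString();
    }

    /** Convenience helper for trying the handler on an XHTML string. */
    public static String extract(String xhtml, Set<String> skippedTags) throws Exception {
        TagSkippingHandler handler = new TagSkippingHandler(skippedTags);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }
}
```

With Tika you would pass an instance of this handler to the parser's `parse(stream, handler, metadata, context)` call instead of using `extract`. Whether it helps depends entirely on the PDF parser emitting the invisible text under a distinct tag, which I have not verified for your files.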

On Thu, Feb 21, 2013 at 4:58 PM, Brad Stallion <[email protected]> wrote:

> Hi all,
>
> I'm extracting text from PDF files using my own sax handler. The problem
> is that I get both visible and invisible text, i.e. text contained in
> invisible parts of the layout.
> How can I identify the invisible parts?
>
> I've asked to stack overflow as well:
>
> http://stackoverflow.com/questions/14956556/tika-and-invisible-text-from-pdf
>
> Thanks a lot for your help!
>
> bye
