Hi Brad,

On 21 Feb 2013, at 11:28, Brad Stallion <[email protected]> wrote:
> I'm extracting text from PDF files using my own sax handler. The problem is
> that I get both visible and invisible text, i.e. text contained in invisible
> parts of the layout.
> How can I identify the invisible parts?

We use PDFBox under the hood in Tika. Have you tried asking on their user list?

Cheers,
Dave
