All,
I just noticed this in PDFTextStripper's processPages():
    if (page.hasContents())
    {
        processPage(page);
    }
If a page has an embedded file, inline images, annotations, etc., but no
text content, does this mean we're skipping the page by accident? In
short, do we need to override processPages() in Tika to process every
page?
Or does hasContents() return true for any content, whether or not it is
text-based?
Thank you.
Best,
Tim