On 02.04.19 at 13:32, Tim Allison wrote:
All,
I just noticed this in PDFTextStripper's processPages():
    if (page.hasContents())
    {
        processPage(page);
    }
If a page has an embedded file, inline images, annotations, etc., but no
text content, does this mean we're skipping the page by accident? In
short, do we need to override processPages in Tika to process every
page?
Or, does "hasContents()" include anything... whether or not it is text-based?
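The checks in "hasContents()" are limited to the page's content stream(s).
A page without contents is not necessarily an empty page; it can still
carry annotations, for example.
And yes, you have to override processPages() if you want to include
additional material from such pages.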
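
To illustrate the override idea, here is a minimal, self-contained sketch.
Note that Page, Stripper and EveryPageStripper are hypothetical stand-ins
invented for this example; they are not PDFBox classes, and the real
PDFTextStripper does more bookkeeping inside processPages(), so check its
source before overriding it in Tika.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types only: nothing here is PDFBox API. The classes model the
// relationship between processPages() and hasContents() discussed above.
class ProcessPagesSketch {

    /** Hypothetical stand-in for a page. */
    static class Page {
        final String name;
        final boolean hasContents;   // true if the page has a content stream
        final int annotationCount;   // data stored outside the content stream

        Page(String name, boolean hasContents, int annotationCount) {
            this.name = name;
            this.hasContents = hasContents;
            this.annotationCount = annotationCount;
        }
    }

    /** Models the quoted default: pages without contents are skipped. */
    static class Stripper {
        final List<String> log = new ArrayList<>();

        void processPages(List<Page> pages) {
            for (Page page : pages) {
                if (page.hasContents) {
                    processPage(page);
                }
            }
        }

        void processPage(Page page) {
            log.add("text:" + page.name);
        }
    }

    /** Override that looks at every page, with or without a content stream. */
    static class EveryPageStripper extends Stripper {
        @Override
        void processPages(List<Page> pages) {
            for (Page page : pages) {
                if (page.hasContents) {
                    processPage(page);
                }
                // Assumed extra step: handle data outside the content stream.
                if (page.annotationCount > 0) {
                    log.add("annotations:" + page.name);
                }
            }
        }
    }

    /** Runs a stripper over one page with text and one with annotations only. */
    static List<String> run(Stripper stripper) {
        List<Page> pages = Arrays.asList(
                new Page("p1", true, 0),
                new Page("p2", false, 2));
        stripper.processPages(pages);
        return stripper.log;
    }

    public static void main(String[] args) {
        System.out.println(run(new Stripper()));          // [text:p1]
        System.out.println(run(new EveryPageStripper())); // [text:p1, annotations:p2]
    }
}
```

In this model the default stripper never sees p2, while the overriding one
records its annotations without trying to extract text from it.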
Thank you.
Best,
Tim
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]