Thank you! And, right, I see that Text is in the class name :D

On Tue, Apr 2, 2019 at 1:26 PM Andreas Lehmkuehler <[email protected]> wrote:

> On 02.04.19 at 13:32, Tim Allison wrote:
> > All,
> >    I just noticed this in PDFTextStripper's processPages():
> >
> > if (page.hasContents())
> > {
> >      processPage(page);
> > }
> >
> > If a page has an embedded file, inline images, annotations, etc., but no
> > text content, does this mean we're skipping the page by accident?  In
> > short, do we need to override processPages in Tika to process every
> > page?
> >
> > Or, does "hasContents()" include anything... whether or not it is
> > text-based?
> The checks in "hasContents()" are limited to the content stream(s). A
> false result doesn't mean the page is empty.
>
> And yes, you have to override processPages if you want to include some
> additional stuff.
>
> >
> > Thank you.
> >
> >           Best,
> >
> >                   Tim
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
>
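
For anyone reading this thread later, here is a rough sketch of the pattern discussed above: the base class skips pages whose hasContents() is false, and a subclass overrides processPages to visit every page. To keep it runnable on its own it does not use PDFBox at all. SketchPage, SketchStripper, EveryPageStripper and processExtras are hypothetical stand-ins invented for illustration, and the real PDFTextStripper carries extra state (page numbering, start/end page) that an actual override would have to respect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a PDF page; this is NOT PDFBox's PDPage.
class SketchPage {
    final String contentText;        // null models a page with no content stream
    final List<String> annotations;  // models non-text items attached to the page

    SketchPage(String contentText, List<String> annotations) {
        this.contentText = contentText;
        this.annotations = annotations;
    }

    // Assumption for this sketch: only the content stream is checked,
    // so a page with annotations alone still reports false.
    boolean hasContents() {
        return contentText != null && !contentText.isEmpty();
    }
}

// Hypothetical stand-in for the text stripper, reduced to the loop quoted above.
class SketchStripper {
    final List<String> output = new ArrayList<>();

    void processPages(List<SketchPage> pages) {
        for (SketchPage page : pages) {
            if (page.hasContents()) {
                processPage(page);
            }
        }
    }

    void processPage(SketchPage page) {
        output.add("text:" + page.contentText);
    }
}

// Subclass that still extracts text only when there is a content stream,
// but looks at every page for the additional items.
class EveryPageStripper extends SketchStripper {
    @Override
    void processPages(List<SketchPage> pages) {
        for (SketchPage page : pages) {
            if (page.hasContents()) {
                processPage(page);
            }
            processExtras(page);
        }
    }

    // Hypothetical hook for annotations, embedded files and similar items.
    void processExtras(SketchPage page) {
        for (String annotation : page.annotations) {
            output.add("annotation:" + annotation);
        }
    }
}
```

With a two-page input where the second page has an annotation but no content stream, SketchStripper only records the first page's text, while EveryPageStripper also records the annotation from the second page.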
