This makes sense, and I think the ContentHandlerDecorator in your code may
actually help me improve my processing elsewhere.
Thank you for the detailed reply, it's appreciated.
Mike
On Fri, Jan 3, 2020 at 9:17 AM Nick Burch wrote:
> On Fri, 3 Jan 2020, Mike Dalrymple wrote:
> > I've just start
On Fri, 3 Jan 2020, Mike Dalrymple wrote:
> I've just started using Tika to process PDFs with embedded images. I'm
> getting fantastic results but I'm having to post-process the generated
> XHTML to correct the value of the src attribute on the img elements.
That is expected. A simple SAX handler sh
Hello,
I've just started using Tika to process PDFs with embedded images. I'm
getting fantastic results but I'm having to post-process the generated
XHTML to correct the value of the src attribute on the img elements. The
generated XHTML has elements like:
My EmbeddedDocumentExtractor is savi
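
For anyone reading this thread later: the approach discussed above (fixing the src attribute while the SAX events stream past, rather than post-processing the finished XHTML) might look roughly like the sketch below. It is not the code from Nick's reply. It uses only the JDK's XMLFilterImpl so that it runs without Tika on the classpath; in a Tika pipeline, a ContentHandlerDecorator subclass overriding startElement would play the same role. The class name ImgSrcRewriter, the rewrite helper, and the prefix values are all hypothetical, chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: rewrites the src attribute of img elements as the
// SAX events pass through, then forwards everything to the next handler.
final class ImgSrcRewriter extends XMLFilterImpl {
    private final String oldPrefix; // assumed prefix on the generated src values
    private final String newPrefix; // where the extracted images were actually saved

    ImgSrcRewriter(String oldPrefix, String newPrefix) {
        this.oldPrefix = oldPrefix;
        this.newPrefix = newPrefix;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        Attributes out = atts;
        if ("img".equals(localName)) {
            int i = atts.getIndex("src");
            if (i >= 0 && atts.getValue(i).startsWith(oldPrefix)) {
                // Copy the attributes, since the incoming object is read-only.
                AttributesImpl copy = new AttributesImpl(atts);
                copy.setValue(i, newPrefix + atts.getValue(i).substring(oldPrefix.length()));
                out = copy;
            }
        }
        super.startElement(uri, localName, qName, out);
    }

    // Convenience helper for the example: parse a string, filter it, and
    // serialize the result back to a string.
    static String rewrite(String xhtml, String oldPrefix, String newPrefix) {
        try {
            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true);
            XMLReader reader = parserFactory.newSAXParser().getXMLReader();

            // An identity TransformerHandler acts as the serializing ContentHandler.
            SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = tf.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            serializer.setResult(new StreamResult(writer));

            ImgSrcRewriter filter = new ImgSrcRewriter(oldPrefix, newPrefix);
            filter.setParent(reader);
            filter.setContentHandler(serializer);
            filter.parse(new InputSource(new StringReader(xhtml)));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("rewrite failed", e);
        }
    }
}
```

Calling ImgSrcRewriter.rewrite(xhtml, "embedded:", "images/") would then replace an assumed "embedded:" prefix with a relative directory. Both values are placeholders; substitute whatever your generated XHTML and your EmbeddedDocumentExtractor actually use.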