Re: Setting PDF2XHTML img src

2020-01-03 Thread Mike Dalrymple
This makes sense, and I think that the ContentHandlerDecorator in your code may actually help me improve my processing elsewhere. Thank you for the detailed reply; it's appreciated.

Mike

On Fri, Jan 3, 2020 at 9:17 AM Nick Burch wrote:
> On Fri, 3 Jan 2020, Mike Dalrymple wrote:
> > I've just start

Re: Setting PDF2XHTML img src

2020-01-03 Thread Nick Burch
On Fri, 3 Jan 2020, Mike Dalrymple wrote:
> I've just started using Tika to process PDFs with embedded images. I'm getting fantastic results, but I'm having to post-process the generated XHTML to correct the value of the src attribute on the img elements.

That is expected. A simple SAX handler sh
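A minimal sketch of the kind of SAX post-processing suggested in the reply, assuming the only change needed is a string mapping applied to each img src value. Inside a Tika pipeline, Tika's ContentHandlerDecorator would be the natural base class. To keep this example runnable with only the JDK, it swaps in `org.xml.sax.helpers.XMLFilterImpl`, which plays the same pass-everything-through-and-override-one-method role. The class name `ImgSrcRewriter`, the `embedded:` prefix and the `images/` target path are illustrative assumptions, not taken from the thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class ImgSrcRewriter {

    /** SAX filter that forwards every event unchanged, except that img src values are remapped. */
    static final class SrcFilter extends XMLFilterImpl {
        private final UnaryOperator<String> mapSrc;

        SrcFilter(XMLReader parent, UnaryOperator<String> mapSrc) {
            super(parent);
            this.mapSrc = mapSrc;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if ("img".equals(localName)) {
                // Attributes passed to startElement are read-only, so copy before editing.
                AttributesImpl copy = new AttributesImpl(atts);
                int idx = copy.getIndex("", "src");
                if (idx >= 0) {
                    copy.setValue(idx, mapSrc.apply(copy.getValue(idx)));
                }
                atts = copy;
            }
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Parses an XHTML string, rewrites img src values, and serializes the result. */
    public static String rewrite(String xhtml, UnaryOperator<String> mapSrc) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // so localName is "img" for XHTML elements
            XMLReader parser = spf.newSAXParser().getXMLReader();
            SrcFilter filter = new SrcFilter(parser, mapSrc);
            // Identity transform: the filter's output events are written back out as XML.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xhtml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XHTML rewrite failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input; the "embedded:" prefix is an assumed src format, not confirmed by the thread.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<img src=\"embedded:image0.jpg\" alt=\"image0.jpg\"/></body></html>";
        String fixed = rewrite(xhtml, src -> src.startsWith("embedded:")
                ? "images/" + src.substring("embedded:".length())
                : src);
        System.out.println(fixed);
    }
}
```

In a Tika setup, the same `startElement` override could instead live on a ContentHandlerDecorator subclass wrapping the handler handed to the parser, so the src values are corrected while parsing rather than in a second pass over the saved XHTML.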

Setting PDF2XHTML img src

2020-01-03 Thread Mike Dalrymple
Hello,

I've just started using Tika to process PDFs with embedded images. I'm getting fantastic results, but I'm having to post-process the generated XHTML to correct the value of the src attribute on the img elements.

The generated XHTML has elements like:

My EmbeddedDocumentExtractor is savi