Yes, this looks like a bug. Please open an issue on our JIRA: https://issues.apache.org/jira/projects/TIKA/summary
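When you file it, it may help to attach a small check showing that the link really is in content.xml, independent of Tika. Below is a minimal sketch using only the JDK, assuming the .odt is a regular ODF zip package with content.xml at its root. The class name `OdtLinkCheck` and the method `drawLinks` are made up for illustration and are not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the xlink:href of every <draw:a> in content.xml.
public class OdtLinkCheck {
    // Namespace URIs from the OpenDocument and XLink specifications.
    static final String DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    static List<String> drawLinks(InputStream odt) throws Exception {
        List<String> hrefs = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the entry so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml));
                NodeList anchors = doc.getElementsByTagNameNS(DRAW_NS, "a");
                for (int i = 0; i < anchors.getLength(); i++) {
                    Element anchor = (Element) anchors.item(i);
                    hrefs.add(anchor.getAttributeNS(XLINK_NS, "href"));
                }
                break;
            }
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java OdtLinkCheck path/to/file.odt
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            drawLinks(in).forEach(System.out::println);
        }
    }
}
```

If this prints the URL while LinkContentHandler reports nothing for the same file, that output together with the sample document should make the issue easy to reproduce.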
On Wed, Aug 5, 2020 at 8:27 PM Robert Kaulbach <[email protected]> wrote:

> The attached file was created in Google Docs with an image inside and
> saved as an .odt file. After saving, I opened the file with LibreOffice and
> added a hyperlink to the image.
>
> When I parse the file with Tika, neither LinkContentHandler nor
> ToXMLContentHandler shows any trace of the hyperlink.
>
> The link is clickable when I open the document, and I can see it inside
> content.xml after extracting the document with 7zip:
> <draw:a xlink:type="simple" xlink:href="http://example.test/">
>
> I tried enabling all options in OfficeParserConfig and OOXMLParser but it
> hasn't made a difference so far. The X-Parsed-By header shows it is being
> parsed with org.apache.tika.parser.odf.OpenDocumentParser.
>
> Could this be a bug with the org.apache.tika.parser.odf.OpenDocumentParser?
