The attached file was created in Google Docs with an image inside and saved
as an .odt file. After saving, I opened the file with LibreOffice and added
a hyperlink to the image.

When I parse the file with Tika, neither LinkContentHandler nor
ToXMLContentHandler shows any trace of the hyperlink.

The link is clickable when I open the document, and I can see it inside
content.xml after extracting the document with 7zip:
<draw:a xlink:type="simple" xlink:href="http://example.test/">
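
For anyone who wants to reproduce that check without 7zip, here is a
minimal sketch that uses only the JDK (Java 9+, no Tika involved). It
opens the .odt as a zip archive, parses content.xml, and collects the
xlink:href of every draw:a element. The class and method names
(OdtLinkLister, extractDrawLinks) are made up for illustration and are
not part of Tika or any other library.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: lists the targets of <draw:a> links in an ODF file.
public class OdtLinkLister {
    // Namespace URIs that ODF binds to the draw: and xlink: prefixes.
    static final String DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    public static List<String> extractDrawLinks(InputStream odt) throws Exception {
        List<String> links = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the entry so the SAX parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                factory.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        // Match draw:a by namespace, not by prefix.
                        if (DRAW_NS.equals(uri) && "a".equals(localName)) {
                            String href = atts.getValue(XLINK_NS, "href");
                            if (href != null) {
                                links.add(href);
                            }
                        }
                    }
                });
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java OdtLinkLister link-gdocs.odt
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            extractDrawLinks(in).forEach(System.out::println);
        }
    }
}
```

This only shows that the link is present in the raw XML; it says nothing
about how OpenDocumentParser handles draw:a.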

I tried enabling all options in OfficeParserConfig and OOXMLParser but it
hasn't made a difference so far. The X-Parsed-By header shows it is being
parsed with org.apache.tika.parser.odf.OpenDocumentParser.

Could this be a bug in org.apache.tika.parser.odf.OpenDocumentParser?


Attachment: link-gdocs.odt
Description: application/vnd.oasis.opendocument.text
