[ https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837886#action_12837886 ]
Uwe Schindler commented on TIKA-365:
------------------------------------

The problem at the moment is that not all OpenDocument file extensions are in the mime.types, so such a file is detected only as a ZIP file. We need a generic OpenDocument matcher pattern that is more specific than the ZIP one, like the ones for the MSOffice formats (old and -x formats). One idea is to look for content.xml or meta.xml in the pattern.

> Extract more OpenDocument metadata
> ----------------------------------
>
>                 Key: TIKA-365
>                 URL: https://issues.apache.org/jira/browse/TIKA-365
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>    Affects Versions: 0.6
>            Reporter: Nick Burch
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: oo-metadata.patch, testOpenOffice2.odf
>
>
> The attached patch adds support for a few more kinds of OpenDocument
> metadata. These are added to the metadata object much like the existing ones.
> There's also support for user-defined metadata. (Custom metadata is
> stored in lines like <meta:user-defined meta:name="Info 1">Text
> 1</meta:user-defined>). There's a new MetadataHandler,
> AttributeDependantMetadataHandler, which can use the value of an attribute on
> the node to decide what to call the metadata when done with the node.
> Also included are several more tests for the OpenDocument parser, and one
> more test file to go with this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
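
For illustration only, here is a rough sketch of the detection idea from the comment: open the ZIP container and look for OpenDocument markers instead of relying on the file extension. It is not Tika code. The class OdfZipSniffer and its method are hypothetical names, and it uses only java.util.zip from the JDK (Java 9 or later for readAllBytes). It assumes that an OpenDocument package stores its main content in content.xml and its metadata in meta.xml, and that it normally also carries a "mimetype" entry declaring an application/vnd.oasis.opendocument.* type, which is the stronger signal when present.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not part of Tika): sniffs a ZIP stream for OpenDocument markers. */
final class OdfZipSniffer {

    // Common prefix of the OpenDocument media types (text, spreadsheet, presentation, ...).
    private static final String ODF_MIME_PREFIX = "application/vnd.oasis.opendocument.";

    private OdfZipSniffer() {
    }

    /**
     * Returns true if the ZIP stream looks like an OpenDocument package:
     * either its "mimetype" entry declares an OpenDocument media type, or it
     * contains a top-level content.xml or meta.xml entry. The name-only check
     * is a weak heuristic, since other ZIP-based formats may use the same names.
     */
    static boolean looksLikeOpenDocument(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("mimetype")) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    String declared =
                            new String(zip.readAllBytes(), StandardCharsets.US_ASCII).trim();
                    if (declared.startsWith(ODF_MIME_PREFIX)) {
                        return true;
                    }
                } else if (name.equals("content.xml") || name.equals("meta.xml")) {
                    return true;
                }
            }
        }
        // No marker found: treat it as a plain ZIP file.
        return false;
    }
}
```

A caller would pass a fresh stream, for example OdfZipSniffer.looksLikeOpenDocument(new FileInputStream(file)), and fall back to plain ZIP handling when the result is false. A magic-based pattern in mime.types would have to express the same check as byte offsets, so this only shows the logic, not the pattern syntax.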