[ 
https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837886#action_12837886
 ] 

Uwe Schindler commented on TIKA-365:
------------------------------------

The current problem is that not all OpenDocument file extensions are listed 
in the mime.types, so such a file is detected only as a ZIP file. We need a 
generic OpenDocument matcher pattern that is more specific than the ZIP file 
one, as for the MSOffice formats (old and -x formats). One idea is to look for 
contents.xml or metadata.xml in the pattern.
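
A minimal sketch of that idea, not Tika's actual detector: walk the ZIP entries and report a match as soon as a marker entry turns up. The class name OdfSniffer and the method looksLikeOpenDocument are hypothetical. The entry names content.xml and meta.xml are the ones OpenDocument packages normally carry, and the sketch assumes those are the files meant above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, for illustration only; not part of Tika.
final class OdfSniffer {
    /**
     * Returns true if the ZIP stream contains an entry named content.xml
     * or meta.xml at the top level of the archive. The stream is closed
     * when the scan finishes.
     */
    static boolean looksLikeOpenDocument(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Assumed marker entries; a plain ZIP would rarely have both
                // names, but either one is treated as a hint here.
                if (name.equals("content.xml") || name.equals("meta.xml")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Scanning entries is more expensive than a magic-byte match, so a check like this would probably only run after the generic ZIP pattern has already matched.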

> Extract more OpenDocument metadata
> ----------------------------------
>
>                 Key: TIKA-365
>                 URL: https://issues.apache.org/jira/browse/TIKA-365
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>    Affects Versions: 0.6
>            Reporter: Nick Burch
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: oo-metadata.patch, testOpenOffice2.odf
>
>
> The attached patch adds support for a few more kinds of OpenDocument 
> metadata. These are added to the metadata object much like the existing ones.
> There's also support for user-defined metadata. (Custom metadata is stored 
> in lines like 
> <meta:user-defined meta:name="Info 1">Text 1</meta:user-defined>.) 
> There's a new MetadataHandler, AttributeDependantMetadataHandler, which can 
> use the value of an attribute on the node to decide what to call the 
> metadata when done with the node.
> Also included are several more tests for the OpenDocument parser, and one 
> more test file to go with this.
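
The attribute-driven naming described above can be sketched with a plain SAX handler. This is an illustration under assumptions, not the code from the attached patch: the class name UserDefinedMetaSketch and its parse helper are hypothetical, a Map stands in for the metadata object, and the parser is left non-namespace-aware so that elements are matched on their qualified name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for an attribute-dependent metadata handler.
final class UserDefinedMetaSketch extends DefaultHandler {
    private final Map<String, String> metadata = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String currentName; // non-null only inside a named meta:user-defined

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("meta:user-defined".equals(qName)) {
            // The attribute value decides the metadata key.
            currentName = attrs.getValue("meta:name");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (currentName != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Store the value only once the node is complete.
        if ("meta:user-defined".equals(qName) && currentName != null) {
            metadata.put(currentName, text.toString());
            currentName = null;
        }
    }

    /** Parses the given XML and returns the collected user-defined entries. */
    static Map<String, String> parse(String xml) throws Exception {
        UserDefinedMetaSketch handler = new UserDefinedMetaSketch();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.metadata;
    }
}
```

Elements without a meta:name attribute are skipped in this sketch, since there would be no key to store them under.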

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
