[ 
https://issues.apache.org/jira/browse/TIKA-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated TIKA-46:
------------------------------

    Attachment: TIKA-46-part2.patch

I committed the first patch (with improvements, thanks Chris!) in revisions 
582674 and 582678.

The second half of the required changes is in TIKA-46-part2.patch: it drops 
the Content configuration from the parse() method.

The patch removes the Content class entirely and simplifies the 
tika-config.xml file considerably by hardcoding the available metadata in the 
Parser classes themselves. As discussed on the mailing list, this makes 
sense because in many cases a parser can only support a fixed set of metadata 
regardless of configuration. We will probably still need some 
configuration mechanism for parsers that could support extensible metadata 
extraction.
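
To illustrate the idea, here is a rough, self-contained sketch of a parser 
that takes metadata in and out through a single object and hardcodes the 
metadata it can supply. This is not the code in the patch: SimpleMetadata, 
SketchParser, PlainTextParser and the "resourceName" key are hypothetical 
stand-ins invented for this example, and the real Parser and Metadata 
signatures may well differ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataParserSketch {

    /** Hypothetical stand-in for a metadata container: a plain string map. */
    static final class SimpleMetadata {
        private final Map<String, String> values = new LinkedHashMap<>();

        void set(String name, String value) { values.put(name, value); }

        String get(String name) { return values.get(name); }
    }

    /** Hypothetical parser contract: no Content configuration is passed in. */
    interface SketchParser {
        String parse(InputStream stream, SimpleMetadata metadata) throws IOException;
    }

    /** Toy plain-text parser that hardcodes the metadata it knows how to produce. */
    static final class PlainTextParser implements SketchParser {
        @Override
        public String parse(InputStream stream, SimpleMetadata metadata) throws IOException {
            byte[] bytes = stream.readAllBytes();
            // Metadata supplied by the caller (input) is left untouched.
            // Metadata produced by the parser (output) is a fixed, hardcoded set.
            metadata.set("Content-Type", "text/plain");
            metadata.set("Content-Length", String.valueOf(bytes.length));
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        SimpleMetadata metadata = new SimpleMetadata();
        metadata.set("resourceName", "hello.txt"); // input hint from the caller

        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        String text = new PlainTextParser().parse(in, metadata);

        System.out.println(text);
        System.out.println(metadata.get("Content-Type"));
    }
}
```

The point of the sketch is only that the set of output keys lives in the 
parser class rather than in tika-config.xml. A parser that could extract an 
open-ended set of metadata would need some extra configuration hook, which 
this sketch does not attempt.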

> Use Metadata in Parser
> ----------------------
>
>                 Key: TIKA-46
>                 URL: https://issues.apache.org/jira/browse/TIKA-46
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>         Attachments: TIKA-46-part1.mattmann.100707.patch.txt, 
> TIKA-46-part1.patch, TIKA-46-part2.patch
>
>
> The Parser interface should use the Metadata framework to pass document 
> metadata in and out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
