[ https://issues.apache.org/jira/browse/TIKA-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting updated TIKA-46:
------------------------------
Attachment: TIKA-46-part2.patch
I committed the first patch (with improvements, thanks Chris!) in revisions
582674 and 582678.
The attached TIKA-46-part2.patch is the second half of the required changes,
i.e. dropping the Content configuration from the parse() method.
The patch removes the Content class entirely and greatly simplifies the
tika-config.xml file by hardcoding the available metadata in the Parser
classes themselves. As discussed on the mailing list, this makes sense because
in many cases a parser can only support a fixed set of metadata regardless of
configuration. We will probably still need some configuration mechanism for
parsers that could support extensible metadata extraction.
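To illustrate the idea, here is a minimal sketch of a parser that hardcodes
the metadata it can produce instead of receiving a Content configuration. It
is not the code from the patch: SketchMetadata, SketchParser,
PlainTextSketchParser, the parse() signature and the metadata key names are
all hypothetical stand-ins chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata container: a simple name/value store.
class SketchMetadata {
    private final Map<String, String> values = new LinkedHashMap<>();

    void set(String name, String value) {
        values.put(name, value);
    }

    String get(String name) {
        return values.get(name);
    }
}

// Hypothetical parser contract: no Content configuration is passed in.
// The metadata object carries hints in and extracted values out.
interface SketchParser {
    String parse(InputStream stream, SketchMetadata metadata) throws IOException;
}

// A plain-text parser that knows up front which metadata it can supply.
class PlainTextSketchParser implements SketchParser {
    // Assumed key names, hardcoded in the parser rather than in a config file.
    static final String CONTENT_TYPE = "Content-Type";
    static final String CONTENT_ENCODING = "Content-Encoding";
    static final String CHARACTER_COUNT = "Character-Count";

    @Override
    public String parse(InputStream stream, SketchMetadata metadata) throws IOException {
        // Metadata in: honor a caller-supplied encoding hint, else assume UTF-8.
        String hint = metadata.get(CONTENT_ENCODING);
        Charset charset = (hint != null) ? Charset.forName(hint) : StandardCharsets.UTF_8;

        String text = new String(stream.readAllBytes(), charset);

        // Metadata out: the fixed set of values this parser supports.
        metadata.set(CONTENT_TYPE, "text/plain");
        metadata.set(CONTENT_ENCODING, charset.name());
        metadata.set(CHARACTER_COUNT, Integer.toString(text.length()));
        return text;
    }
}
```

A caller would create a SketchMetadata, optionally set hints on it, call
parse(), and then read the extracted values back from the same object. A
parser with extensible metadata would need some additional configuration
hook, which this sketch does not attempt to model.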
> Use Metadata in Parser
> ----------------------
>
> Key: TIKA-46
> URL: https://issues.apache.org/jira/browse/TIKA-46
> Project: Tika
> Issue Type: Improvement
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Attachments: TIKA-46-part1.mattmann.100707.patch.txt,
> TIKA-46-part1.patch, TIKA-46-part2.patch
>
>
> The Parser interface should use the Metadata framework to pass document
> metadata in and out.