Jukka - Do you want to revisit the architecture regarding the information we are currently keeping in the Content object (and will be moving shortly)? Specifically, the text, xml, and regexp values? Wouldn't there be cases where different parsers need their own strings to identify a property such as title? Should we support overriding the existing keys with parser implementation-specific keys? Maybe something like this:

defaultText="title"
defaultXML=...
defaultRegExp=...
org.xyz.FooParser=...
...

so perhaps the parser would look up its own class name, and fall back to the default if it doesn't find it?

- Keith

JIRA [EMAIL PROTECTED] wrote:
>
> [ https://issues.apache.org/jira/browse/TIKA-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Jukka Zitting updated TIKA-46:
> ------------------------------
>
>     Attachment: TIKA-46-part1.patch
>
> Attached a patch (TIKA-46-part1.patch) for introducing a Metadata object
> to the Parser interface. This is just the first half of the complete
> solution, as we still need to find a way to pass the configuration
> information currently contained in the Content collection.
>
>> Use Metadata in Parser
>> ----------------------
>>
>>                 Key: TIKA-46
>>                 URL: https://issues.apache.org/jira/browse/TIKA-46
>>             Project: Tika
>>          Issue Type: Improvement
>>            Reporter: Jukka Zitting
>>            Assignee: Jukka Zitting
>>         Attachments: TIKA-46-part1.patch
>>
>>
>> The Parser interface should use the Metadata framework to pass document
>> metadata in and out.
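
To make the lookup-with-fallback idea from my question concrete, here is a rough sketch. It assumes the per-property settings end up in a plain map of strings; PropertyKeyResolver, FooParser, BarParser, and the "foo-title" value are all made up for illustration and are not existing Tika classes or configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: picks the key a parser should use
// for one property (e.g. title) from a flat map of settings.
class PropertyKeyResolver {
    // Stand-ins for real parser implementations (assumed names).
    static class FooParser {}
    static class BarParser {}

    private final Map<String, String> settings;

    PropertyKeyResolver(Map<String, String> settings) {
        this.settings = new HashMap<>(settings);
    }

    // Returns the override stored under the parser's fully qualified class
    // name if there is one, otherwise the value stored under defaultName
    // (e.g. "defaultText"). Returns null if neither entry exists.
    String resolve(Class<?> parserClass, String defaultName) {
        String override = settings.get(parserClass.getName());
        return override != null ? override : settings.get(defaultName);
    }

    public static void main(String[] args) {
        Map<String, String> title = new HashMap<>();
        title.put("defaultText", "title");
        title.put(FooParser.class.getName(), "foo-title"); // parser-specific override

        PropertyKeyResolver resolver = new PropertyKeyResolver(title);
        System.out.println(resolver.resolve(FooParser.class, "defaultText"));
        System.out.println(resolver.resolve(BarParser.class, "defaultText"));
    }
}
```

With this shape a parser only needs an entry of its own when it disagrees with the default, so most parsers would need no extra configuration at all.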
