[ https://issues.apache.org/jira/browse/TIKA-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-113:
-------------------------------

    Affects Version/s:     (was: 0.2-incubating)
        Fix Version/s: 0.2-incubating
           Issue Type: Improvement  (was: Wish)

I think the SAX event stream should still contain selected metadata in the <head/> section. For example the current XHTMLContentHandler outputs the TITLE metadata field (if available) as the <title/> of the generated XML document.

Instead of changing that pattern, we should probably either change WriteOutContentHandler to only output content of the <body/> element or add a new ContentHandler utility class with that feature.

> Metadata (such as title) should not be part of content
> ------------------------------------------------------
>
>                 Key: TIKA-113
>                 URL: https://issues.apache.org/jira/browse/TIKA-113
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Rida Benjelloun
>             Fix For: 0.2-incubating
>
>
> Metadata (such as title) is added in the content. In my opinion it would be
> preferable that the toString () on the writer return only the content of the
> document and not metadata. The metadata are already stored in the metadata
> object
> Rida.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
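
As a rough illustration of the second option in the comment above (a separate ContentHandler utility that only outputs the content of the <body/> element), here is a minimal sketch built on the JDK's SAX classes only. The class name BodyTextHandler and the helper bodyText are hypothetical and are not part of Tika; the sketch does not show how WriteOutContentHandler is or should be implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler (not a Tika class): collects character data
 * only while the SAX event stream is inside a body element, so that
 * head metadata such as the title never reaches the text output.
 */
class BodyTextHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();

    // Greater than zero while inside <body>; counts nested elements too.
    private int bodyDepth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (bodyDepth > 0 || isBody(localName, qName)) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (bodyDepth > 0) {
            bodyDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Ignore everything outside <body>, including the <title> text.
        if (bodyDepth > 0) {
            text.append(ch, start, length);
        }
    }

    private static boolean isBody(String localName, String qName) {
        return "body".equals(localName) || "body".equals(qName);
    }

    /** Returns only the text collected from inside the body element. */
    @Override
    public String toString() {
        return text.toString();
    }

    /** Assumed convenience method for the example: parse an XHTML string. */
    static String bodyText(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        BodyTextHandler handler = new BodyTextHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }
}
```

With a handler along these lines the <head/> section, including <title/>, stays in the SAX event stream for consumers that want it, while toString() on the text-collecting handler returns only the document content, which is what the original report asks for.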