Michael Wechner wrote:

I would suggest using BodyContentHandler instead of
WriteOutContentHandler. You can use it just like
WriteOutContentHandler, but it only outputs the contents of the
<body/> section. See the --text option in TikaCLI or the ParsingReader
class for good examples.

Yes, I have seen the BodyContentHandler, but it means I have to explicitly concatenate the title (and the other metadata), which is not that much
effort,

I am now using the BodyContentHandler and aggregating the rest of the metadata (title, keywords, description, etc.) myself, and it works well. But as pointed out below, I think the WriteOutContentHandler is misleading, and I think its behaviour should either be changed or the class should be deprecated (with a note that one should use the BodyContentHandler instead).
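
To illustrate the idea of keeping only the <body/> text and prepending the title by hand, here is a minimal sketch. It does not use Tika itself: BodyAndTitleHandler and indexableText are hypothetical names, built only on the JDK's SAX classes, and the sample XHTML string is made up. With Tika, the title and the other fields would presumably come from the metadata rather than from the <title> element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BodyTextSketch {

    /**
     * Hypothetical stand-in for a body-only handler (not a Tika class):
     * collects text inside <body> and, separately, the text of <title>.
     */
    static class BodyAndTitleHandler extends DefaultHandler {
        final StringBuilder body = new StringBuilder();
        final StringBuilder title = new StringBuilder();
        private int bodyDepth = 0;      // > 0 while we are inside <body>
        private boolean inTitle = false;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (bodyDepth > 0 || "body".equals(qName)) {
                bodyDepth++;
            } else if ("title".equals(qName)) {
                inTitle = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (bodyDepth > 0) {
                bodyDepth--;
            } else if ("title".equals(qName)) {
                inTitle = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (bodyDepth > 0) {
                body.append(ch, start, length);
            } else if (inTitle) {
                title.append(ch, start, length);
            }
        }
    }

    /** Concatenates the title and the body text, one per line. */
    static String indexableText(String xhtml) throws Exception {
        BodyAndTitleHandler handler = new BodyAndTitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        String title = handler.title.toString().trim();
        String body = handler.body.toString().trim();
        return (title + "\n" + body).trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, for illustration only.
        String xhtml = "<html><head><title>Report</title></head>"
                + "<body><p>Hello world</p></body></html>";
        System.out.println(indexableText(xhtml));
    }
}
```

The explicit concatenation is the one extra step compared to a handler that writes out everything, which is the "not that much effort" part.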

Cheers

Michael
but as I said, I think it defeats the purpose of the WriteOutContentHandler ;-)

Thanks for your explanations

Michael
BR,

Jukka Zitting

