Torsten,

I am trying to do the same thing: manipulating the content of a document parsed with Tika using HTMLParseFilter. I am having trouble identifying the proper API call in the filter implementation class. Would you be willing to share your code, since you said you had that part working?

Thx
Dietrich
Torsten Krah wrote:
>
> On Friday, 23 July 2010, at 11:12:28, Julien Nioche wrote:
>
> For HTML this is OK and already works.
> But for non-HTML content (PDF, DOC etc.) I did not find any filter API
> like the HTML one (e.g. BinaryParseFilter or something else).
> How can this be done there (with a filter-like approach)?
>
> thx
>
> Torsten
>

--
View this message in context: http://lucene.472066.n3.nabble.com/Customize-Tika-Parser-How-to-access-nutch-Content-object-or-is-it-possible-to-stack-Parsers-tp987281p3191544.html
Sent from the Nutch - User mailing list archive at Nabble.com.

