Torsten,
I am trying to do the same thing: manipulating the content of a document
parsed with Tika using HTMLParseFilter. I am having trouble identifying the
proper API call in the filter implementation class. Would you be willing to
share your code, since you said you had that part working?
Thx
Dietrich
 

Torsten Krah wrote:
> 
> On Friday, 23 July 2010, at 11:12:28, Julien Nioche wrote:
> 
> For HTML this is ok and works already.
> But for non-HTML content (PDF, DOC, etc.) I did not find any filter API
> like the HTML one (e.g. BinaryParseFilter or something similar).
> How can this be done there (with a filter-like approach)?
> 
> thx
> 
> Torsten
> 
> 


