Customize Tika Parser - How to access nutch Content object or is it possible to stack Parsers

Torsten Krah Thu, 22 Jul 2010 07:53:01 -0700

Hi,

using the standard Nutch parsers, I am able to get access to the

org.apache.nutch.protocol.Content

object, which lets me pull some data to index from the original URI if it is
not already present in the Metadata object.
With Nutch 1.1 I want to use the Tika parsers and wonder whether this can
still be done; the API does not look like it allows it.
So maybe I am missing the glue where I can do such things, perhaps via my own
Tika parser (where would I register it with Nutch?).
Or is it possible to stack parsers, e.g. let Tika do its "standard" work and
then let the next Nutch parser run to do this stuff?

Any hints appreciated.

thx

Torsten

-- 
Please do not send me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.de.html

"Really, I'm not out to destroy Microsoft. That will just be a 
completely unintentional side effect."
        -- Linus Torvalds
