Hi,

In Nutch, a `synthetic token` maps to a field/value pair. You need an indexing
filter that reads the key/value pair from the parse metadata and adds it as a
field/value pair to the NutchDocument. You may also need a custom parse filter
that extracts the data from somewhere (e.g. the URL or the page content) and
stores it in the parse metadata as a key/value pair, which your indexing filter
then processes further.
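A rough illustration of that two-stage flow follows. It is a minimal sketch, not Nutch code: plain maps stand in for Nutch's parse metadata and the NutchDocument so it runs without Nutch on the classpath. The metadata key `synthetic.doctype`, the field name `doctype`, and the rules that derive the token are all hypothetical examples. In a real plugin, stage 1 would live in a parse filter and stage 2 in an indexing filter. See index-basic and index-more for the actual interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Simplified model of the parse filter -> indexing filter flow.
 * Map<String, String> stands in for the parse metadata and
 * Map<String, List<String>> stands in for the NutchDocument (hypothetical stand-ins).
 */
public class SyntheticTokenSketch {

    // Hypothetical metadata key shared by both stages.
    static final String META_KEY = "synthetic.doctype";

    // Stage 1 (parse filter): derive a synthetic token from the page
    // and store it in the parse metadata as a key/value pair.
    static void parseFilter(String url, String text, Map<String, String> parseMeta) {
        if (url.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            parseMeta.put(META_KEY, "pdf");
        } else if (text.toLowerCase(Locale.ROOT).contains("press release")) {
            parseMeta.put(META_KEY, "pressrelease");
        }
        // No rule matched: nothing is stored, so no field gets indexed.
    }

    // Stage 2 (indexing filter): copy the key/value pair from the parse
    // metadata into the document as a field/value pair.
    static Map<String, List<String>> indexingFilter(Map<String, List<String>> doc,
                                                    Map<String, String> parseMeta) {
        String value = parseMeta.get(META_KEY);
        if (value != null) {
            // "doctype" is a hypothetical field name; fields can be multi-valued.
            doc.computeIfAbsent("doctype", k -> new ArrayList<>()).add(value);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> parseMeta = new HashMap<>();
        parseFilter("http://example.com/news/1.html", "Press release: ...", parseMeta);
        Map<String, List<String>> doc = indexingFilter(new HashMap<>(), parseMeta);
        System.out.println(doc); // prints {doctype=[pressrelease]}
    }
}
```

Keeping the two stages apart mirrors the advice above: the parse filter decides *what* the token is, and the indexing filter only decides *which field* it lands in. This answers the "new field vs. metadata" question: the metadata is the hand-off between stages, and the field is what ends up in the index.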

Check out the index-basic and index-more plugins for examples.

Cheers, 
 
-----Original message-----
> From:Jakub Moskal <[email protected]>
> Sent: Mon 21-Jan-2013 04:58
> To: [email protected]
> Subject: Synthetic Tokens
> 
> Hi,
> 
> I would like to develop a plugin that creates synthetic tokens for
> some documents that are crawled by Nutch (as described here:
> http://www.ideaeng.com/synthetic-tokens-need-p2-0604). How can this be
> done in Nutch? Should I create a new field for every new synthetic
> token, or should I add them to metadata? I'm not quite sure how
> fields/metadata relate to the tokens described in the article.
> 
> Thanks!
> Jakub
> 
