Hi.

I want to add some new fields to each page. I found some articles saying
this can be done with plugins, but there are still some things I don't
understand how to do.

For example, I have a list of sites from DMOZ, stored in a text file. Each
line has the format: [url] [category1] [category2] [category3], that is,
the URL of a page followed by the list of categories in which the site is
listed. A site can be listed in one, two, or more categories at the same
time. I want Nutch to crawl these URLs and to attach the category
information to each URL: a field "category" containing the list of
categories, so that a search can be restricted to sites from a given
category. How can this be done?
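
To make the input format concrete, here is a minimal sketch (in Python, only
to illustrate the data, not Nutch code) of how such a file could be read into
a URL-to-categories mapping. The function name parse_dmoz_lines and the
sample URLs and categories are made up for illustration, and it assumes that
fields are separated by whitespace and that category names contain no spaces.

```python
from typing import Dict, Iterable, List


def parse_dmoz_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each URL to the list of categories it is listed in.

    Assumes one record per line: the URL first, then one or more
    whitespace-separated categories (hypothetical format details).
    """
    categories_by_url: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        url, categories = parts[0], parts[1:]
        # A URL may appear on several lines; merge without duplicates.
        known = categories_by_url.setdefault(url, [])
        for category in categories:
            if category not in known:
                known.append(category)
    return categories_by_url


# Illustrative sample data, not taken from a real DMOZ dump.
sample = [
    "http://example.com/ Computers/Internet Science/Math",
    "http://example.org/ Arts/Music",
]
result = parse_dmoz_lines(sample)
```

The list in each entry is what I would like to end up in the "category"
field of the corresponding page.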

All the articles I found apply when the data for the new custom field is
retrieved from the web page during crawling (for example, metadata from
HTML tags). But how can I add custom field data before the crawl starts?

Thanks
