Hi.

I want to add some new fields to each page. I found some articles saying
that this can be done with plugins, but there are still some things I don't
understand how to do.

For example, I have a list of sites from DMOZ, stored in a text file. Each
line contains data in the format: [url] [category1] [category2] [category3],
that is, the URL of a page followed by the list of categories in which the
site is listed. A site can be listed in one, two, or more categories at the
same time. I want Nutch to crawl these URLs and attach the category
information to each URL: a field "category" containing the list of
categories, so that it would be possible to search only sites from a given
category. How can this be done?
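
To make the file format concrete, here is a minimal sketch of reading it
into a URL-to-categories map. It assumes the fields on each line are
separated by whitespace and that category names contain no spaces. The
class and method names (SeedCategoryParser, parseLines) are hypothetical
and are not part of Nutch; this only shows the data I have, not how to get
it into the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: parses lines of the form
// "[url] [category1] [category2] ..." into a map of URL -> categories.
class SeedCategoryParser {

    // Assumption: fields are whitespace-separated and category names
    // contain no spaces. Blank lines are skipped.
    static Map<String, List<String>> parseLines(List<String> lines) {
        Map<String, List<String>> categoriesByUrl = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            String url = fields[0];
            // Everything after the first field is a category name.
            List<String> categories =
                    Arrays.asList(fields).subList(1, fields.length);
            // If the same URL appears on several lines, merge its categories.
            categoriesByUrl
                    .computeIfAbsent(url, k -> new ArrayList<>())
                    .addAll(categories);
        }
        return categoriesByUrl;
    }
}
```

For a line such as "http://example.com/ Arts Music" (a made-up entry), the
map would hold the key "http://example.com/" with the categories Arts and
Music.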

All the articles I found apply when the data for the new custom field is
retrieved from the web page during crawling (for example, metadata from HTML
tags). But how do I add custom field data before the crawling process
starts?

Thanks
