I'm not sure this is the easiest solution, but you might want to
have a look at
http://wiki.apache.org/nutch/WritingPluginExample-0.9
I have used it as template code to add fields to my index.
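For the URL-to-ID matching itself, here is a rough sketch of the kind of lookup I have in mind. It is plain Java and does not touch any Nutch API; the class name BusinessIdLookup and its methods are made up for illustration. It assumes each business has its own host, which should hold if the crawl never leaves the original domain. You could call something like this from the indexing code in the plugin example, or run it over the URLs after the index is generated.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch: maps a crawled page URL
// back to a business ID by comparing hosts.
final class BusinessIdLookup {
    // Normalized host of the seed URL -> business ID from the database.
    private final Map<String, String> idByHost = new HashMap<>();

    // Returns the lower-cased host with a leading "www." removed,
    // or null if the URL cannot be parsed or has no host.
    static String hostOf(String url) {
        try {
            String host = new URI(url.trim()).getHost();
            if (host == null) {
                return null;
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Record the seed URL of one business together with its ID.
    void register(String seedUrl, String businessId) {
        String host = hostOf(seedUrl);
        if (host != null) {
            idByHost.put(host, businessId);
        }
    }

    // Look up the business ID for any page or sub-page URL;
    // returns null when the host was never registered.
    String businessIdFor(String pageUrl) {
        String host = hostOf(pageUrl);
        return host == null ? null : idByHost.get(host);
    }
}
```

Register every seed URL with its ID once, then call businessIdFor() for each crawled URL. Two businesses that share a host, or a business spread over several subdomains, would need a finer key than the host alone.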
Hope this helps,
Jasper
On Aug 10, 2007, at 11:46 AM, Vince Filby wrote:
I have a list of businesses, URLs and extra information that I need to
crawl. I have used Nutch to crawl this list without following external
links and it seems to be working well, but I need to relate the crawled
web text data (including all pages and sub-pages within the original
domain) with the original business record in the database. I need to
add an ID field to each document in the generated index that references
the business ID.
How can I do this with Nutch? Can it be done at inject/fetch time, or
will I have to try to match URLs to IDs after the index is generated?
Cheers,
Vince