I am using Nutch to crawl and index an intranet, starting from a fixed
set of URLs (approx. 3000). For my application I need to reference some
metadata (stored in a database) for each of the original 3000 URLs.

Does Nutch assign a unique integer ID to each starting URL in the
crawldb? If so, does the API allow me to get it? When a search is
performed, is this ID returned (or can it be) for each 'hit'?

I want my 'display search results' page to show the Nutch results for
each 'hit', plus the metadata for the hit URL if it is one of the
original 3000. I'd rather use an integer ID than have to match on the URL
string itself.
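
For reference, the URL-matching fallback I would like to avoid looks
roughly like the sketch below. SeedUrlIndex and its methods are my own
hypothetical names, not anything from the Nutch API, and the
normalization step is only an assumption about how a hit URL might differ
from the entry in my seed list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper, not part of the Nutch API: maps each seed URL
// to the integer key used for its row in the metadata database.
public class SeedUrlIndex {
    private final Map<String, Integer> idByUrl = new HashMap<>();

    // Register a seed URL together with its database ID.
    public void add(String url, int id) {
        idByUrl.put(normalize(url), id);
    }

    // Look up the ID for a hit URL; empty if the hit is not a seed URL.
    public OptionalInt idFor(String hitUrl) {
        Integer id = idByUrl.get(normalize(hitUrl));
        return id == null ? OptionalInt.empty() : OptionalInt.of(id);
    }

    // Assumed normalization: trim whitespace and drop one trailing slash.
    static String normalize(String url) {
        String u = url.trim();
        return u.endsWith("/") ? u.substring(0, u.length() - 1) : u;
    }
}
```

The index would be loaded once from the database, and the results page
would call idFor() with the URL of each hit.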


Marco Rondelli.

