I am using Nutch to crawl and index an intranet, starting from a fixed set of approximately 3000 URLs. For my application I need to look up some metadata (stored in a database) for each of those original 3000 URLs.
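To make the setup concrete, here is a minimal sketch of the lookup I have in mind on my side. It uses no Nutch API at all: the class name `SeedIdLookup`, the method `idFor`, and the idea of numbering the seed URLs in list order are my own assumptions for illustration, and the integer IDs are ones I assign myself, not anything taken from the crawldb.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper (not part of Nutch): assigns my own integer IDs
// to the seed URLs and resolves a hit URL back to its ID.
final class SeedIdLookup {
    private final Map<String, Integer> idByUrl = new LinkedHashMap<>();

    SeedIdLookup(List<String> seedUrls) {
        int nextId = 1;
        for (String url : seedUrls) {
            // putIfAbsent keeps the first ID if the seed list contains duplicates
            if (idByUrl.putIfAbsent(url, nextId) == null) {
                nextId++;
            }
        }
    }

    // Returns the ID for a hit URL, or empty if the URL is not one of the seeds.
    // The match is on the exact URL string.
    OptionalInt idFor(String hitUrl) {
        Integer id = idByUrl.get(hitUrl);
        return id == null ? OptionalInt.empty() : OptionalInt.of(id);
    }
}
```

The ID returned by `idFor` would then be the key for the metadata row in my database. Because the match is on the exact string, it only works if the URL in the search hit is spelled the same way as in my seed list.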
Does Nutch assign a unique integer ID to each starting URL in the crawldb? If so, does the API allow me to get it? When a search is performed, is this ID returned for each hit, or can it be? I want my "display search results" page to show the Nutch results for each hit as well as the metadata for the hit URL if it is one of the original 3000. I'd rather use an integer ID than have to match on the URL string itself.

Marco Rondelli
