I am using nutch to crawl & index an intranet consisting of an initial fixed set of urls (approx. 3000). For my application I need to reference some metadata (stored in a database) for each of the original 3000 urls.
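To make this concrete, the lookup I need looks roughly like the sketch below: each of the original urls gets an integer id from its position in the seed list, and that id would key the metadata row in the database. The names (SeedIndex, idFor) are made up for illustration, and nothing here uses the Nutch API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: maps each seed url to its
// position in the seed list so the position can serve as an integer id.
class SeedIndex {
    private final Map<String, Integer> idByUrl = new HashMap<>();

    SeedIndex(List<String> seedUrls) {
        for (int i = 0; i < seedUrls.size(); i++) {
            // putIfAbsent keeps the first id if a url is listed twice
            idByUrl.putIfAbsent(seedUrls.get(i), i);
        }
    }

    // Returns the integer id of a seed url, or -1 if the url is not a seed.
    int idFor(String url) {
        Integer id = idByUrl.get(url);
        return id == null ? -1 : id;
    }
}
```

A url discovered during the crawl that is not in the seed list comes back as -1, so the results page would know to skip the metadata lookup for it.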
Does nutch assign a unique integer id for each starting url in the crawldb? If so, does the API allow me to get it? When a search is performed can/is this id returned for each 'hit'? I want my 'display search results' page to return the nutch results for each 'hit' as well as the metadata for the hit url if it is one of the original 3000. I'd rather use an integer ID than have to match on the url string itself.

Marco Rondelli
