I am using Nutch to crawl and index an intranet consisting of a fixed
initial set of URLs (approx. 3,000). For my application I need to reference
some metadata (stored in a database) for each of the original 3,000 URLs.

Does Nutch assign a unique integer ID to each starting URL in the
crawldb? If so, does the API let me retrieve it? When a search is
performed, is this ID (or can it be) returned for each hit?

I want my 'display search results' page to show the Nutch results for
each hit, plus the metadata for the hit's URL if it is one of the
original 3,000. I'd rather use an integer ID than have to match on the URL
string itself.
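If Nutch turns out not to expose a stable per-URL integer ID, one fallback is to keep the mapping on the application side. The sketch below assumes each metadata row in the database already has an integer primary key alongside its seed URL. It builds an in-memory dictionary from normalized URL to that key once at startup, so each search hit costs one dictionary lookup, and the integer ID is then used to fetch the metadata. The names `normalize_url`, `build_url_index`, `lookup_seed_id` and `attach_metadata` are hypothetical. The normalization shown is a simple illustration and does not reproduce Nutch's own URL normalizers; it would need to match whatever form the URLs take in the search results.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Hypothetical normalization: lowercase scheme and host, drop the
    fragment, and use '/' for an empty path. Not Nutch's normalizer."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def build_url_index(rows):
    """Build {normalized_url: db_id} from (db_id, url) rows.

    Assumes the metadata table already has an integer key per seed URL.
    If two rows normalize to the same URL, the first one wins.
    """
    index = {}
    for db_id, url in rows:
        index.setdefault(normalize_url(url), db_id)
    return index


def lookup_seed_id(index, url):
    """Return the integer ID for a hit URL, or None if it is not a seed."""
    return index.get(normalize_url(url))


def attach_metadata(hit_urls, index, metadata_by_id):
    """Pair each hit URL with its seed metadata (None for non-seed hits)."""
    results = []
    for url in hit_urls:
        seed_id = lookup_seed_id(index, url)
        meta = metadata_by_id.get(seed_id) if seed_id is not None else None
        results.append((url, meta))
    return results
```

With this approach the URL string is only touched once per hit as a hash key; all later joins against the metadata use the integer ID.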


Marco Rondelli.



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
