Hi all,

Can someone share their experience of designing Nutch as an intranet search engine? The part I am most interested in is how to design continuous indexing.
What I have done so far:
1) Wrote a crawl scheduler that crawls the whole web every month.
2) Wrote a crawl listener that deletes duplicates when the crawl scheduler finishes.

Regards,
/Jack
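The two-part setup described above (a monthly crawl, then a listener that removes duplicates once the crawl finishes) can be sketched roughly as follows. This is a minimal, language-neutral illustration and does not use Nutch's actual APIs. `CrawlScheduler`, `DedupListener`, `run_if_due`, `on_crawl_finished` and the 30-day interval are all hypothetical names and assumptions, and a crawl is modelled simply as a function that returns a URL-to-content mapping. Duplicates are detected by hashing page content.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

# Hypothetical model: a crawl returns a mapping of URL -> page content.
Pages = Dict[str, str]


class DedupListener:
    """Hypothetical listener: drops pages whose content duplicates another page."""

    def on_crawl_finished(self, pages: Pages) -> Pages:
        seen = set()
        unique: Pages = {}
        # Iterate in sorted URL order so the kept copy of a duplicate is deterministic.
        for url in sorted(pages):
            digest = hashlib.md5(pages[url].encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                unique[url] = pages[url]
        return unique


class CrawlScheduler:
    """Hypothetical scheduler: re-runs the crawl once the interval has elapsed."""

    def __init__(self, crawl: Callable[[], Pages],
                 interval: timedelta = timedelta(days=30)) -> None:
        self.crawl = crawl
        self.interval = interval
        self.last_run: Optional[datetime] = None
        self.listeners: List[DedupListener] = []

    def run_if_due(self, now: datetime) -> Optional[Pages]:
        # Skip if the previous crawl is more recent than the interval.
        if self.last_run is not None and now - self.last_run < self.interval:
            return None
        pages = self.crawl()
        # Notify each listener after the crawl has finished.
        for listener in self.listeners:
            pages = listener.on_crawl_finished(pages)
        self.last_run = now
        return pages


if __name__ == "__main__":
    def fake_crawl() -> Pages:
        return {"http://intranet/a": "hello", "http://intranet/b": "hello",
                "http://intranet/c": "world"}

    scheduler = CrawlScheduler(fake_crawl)
    scheduler.listeners.append(DedupListener())
    print(scheduler.run_if_due(datetime(2024, 1, 1)))
```

Polling `run_if_due` from a timer or cron job keeps the scheduling logic testable. A real deployment would hand the crawl and deduplication steps to Nutch's own tooling rather than doing them in memory.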
