Marco,

We use a search caching system at Filangy. It uses Lucene to store the search string, hit count, date, and top 20 page IDs for each query. So all you have to do is search for those IDs.
Yes, it still involves a search, but we have a distributed system that uses the ID as the hash key to determine which server holds the details of the page, which makes the parallel search more efficient. This search is about 60-75% faster than a regular search.

You should be able to put a similar implementation together. I'm willing to release this code to the open domain, PROVIDED that you, or anyone else who's interested, makes it generic and releases it as open source to others in the Nutch community.

CC-

--------------------------------------------
Chirag Chaman | Filangy, Inc.

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 05, 2006 8:20 AM
To: [email protected]
Subject: Re: Caching the search results

Marco Vanossi wrote:
> Hi,
>
> Anybody knows how can I set Nutch to cache the results of the searches?
> I've heard about this feature but I am not finding the information....

Trivial web-level caching is easy to implement - just download OSCache and modify your web application settings according to its documentation. Smart caching on the level of indexes is more difficult to implement, and Nutch doesn't include anything like that.

You may find this paper of interest: http://www2005.org/cdrom/docs/p257.pdf

--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
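
For anyone wanting to experiment before any code is released, the scheme Chirag describes (store the query string, hit count, date and top 20 IDs, then use each ID as a hash key to pick the server holding the page details) might look roughly like the sketch below. This is not Filangy's code. It is plain Python with an in-memory dict standing in for the Lucene-backed store, and every name in it (`QueryCache`, `server_for`, `group_by_server`, the `max_age_seconds` expiry policy, the choice of MD5 as the hash) is an assumption made for illustration.

```python
import hashlib
import time
from dataclasses import dataclass

TOP_N = 20  # the email says the top 20 page IDs are kept per query


@dataclass
class CachedQuery:
    query: str        # the search string
    total_hits: int   # the hit count
    cached_at: float  # the date, as a Unix timestamp
    top_ids: list     # the top page IDs, best first


class QueryCache:
    """In-memory stand-in for the Lucene-backed cache (an assumption)."""

    def __init__(self, max_age_seconds=3600.0):
        # Hypothetical expiry policy; the email only says a date is stored.
        self.max_age_seconds = max_age_seconds
        self._entries = {}

    def put(self, query, total_hits, ids, now=None):
        now = time.time() if now is None else now
        self._entries[query] = CachedQuery(query, total_hits, now, list(ids[:TOP_N]))

    def get(self, query, now=None):
        """Return the cached entry, or None if it is missing or too old."""
        now = time.time() if now is None else now
        entry = self._entries.get(query)
        if entry is None or now - entry.cached_at > self.max_age_seconds:
            return None
        return entry


def server_for(page_id, num_servers):
    """Map a page ID to a server index with a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(page_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def group_by_server(page_ids, num_servers):
    """Group IDs by server so each server receives a single lookup request."""
    groups = {}
    for page_id in page_ids:
        groups.setdefault(server_for(page_id, num_servers), []).append(page_id)
    return groups
```

On a cache hit, the front end would call `group_by_server(entry.top_ids, num_servers)` and send one ID lookup per server in parallel; on a miss it would run the regular search and `put` the result. A stable hash is used here rather than Python's built-in `hash()`, since the latter is randomized per process for strings and would route the same ID to different servers across restarts.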
