Marco,

We use a search caching system at Filangy. It uses Lucene to store the search
string, count, date, and top 20 page IDs for each query, so on a cache hit all
you have to do is search for those IDs.
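
To make the shape of such a cache entry concrete, here is a minimal sketch in
Python. It is not the Filangy code: a plain in-memory dict stands in for the
Lucene index, and the names SearchCache, CachedSearch, and max_age are
hypothetical. Using the stored date to expire old entries is also an
assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

TOP_N = 20  # number of page IDs kept per cached query


@dataclass
class CachedSearch:
    query: str           # the search string as typed
    count: int           # count stored alongside the query
    cached_at: datetime  # date the entry was stored
    top_ids: List[int]   # IDs of the top-ranked pages


class SearchCache:
    """Hypothetical in-memory stand-in for a Lucene-backed query cache."""

    def __init__(self, max_age: timedelta = timedelta(hours=1)):
        self._entries: Dict[str, CachedSearch] = {}
        self._max_age = max_age  # assumed expiry policy, not from the source

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so equivalent queries share an entry.
        return " ".join(query.lower().split())

    def put(self, query: str, count: int, ranked_ids: List[int],
            now: Optional[datetime] = None) -> CachedSearch:
        entry = CachedSearch(query, count, now or datetime.now(),
                             list(ranked_ids[:TOP_N]))
        self._entries[self._key(query)] = entry
        return entry

    def get(self, query: str,
            now: Optional[datetime] = None) -> Optional[CachedSearch]:
        now = now or datetime.now()
        entry = self._entries.get(self._key(query))
        if entry is None or now - entry.cached_at > self._max_age:
            return None  # miss or stale: fall back to a regular search
        return entry
```

On a hit, only the IDs in top_ids need to be fetched; on a miss, you run the
regular search and call put() with its results.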

Yes, it still involves a search, but we have a distributed system that uses
the ID as the hash key to determine which server holds the details of a page,
which makes the parallel search more efficient. This search is about 60-75%
faster than a regular search.
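
As a rough illustration of routing by ID, the sketch below groups cached page
IDs by the server that would hold them. It assumes integer IDs and a simple
modulo hash; the actual hash function is not specified here, and the names
server_for and group_by_server are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def server_for(page_id: int, num_servers: int) -> int:
    # Assumed hash: integer ID modulo the number of servers.
    return page_id % num_servers


def group_by_server(page_ids: List[int],
                    num_servers: int) -> Dict[int, List[int]]:
    """Group page IDs by target server so each server is queried once."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for page_id in page_ids:
        groups[server_for(page_id, num_servers)].append(page_id)
    return dict(groups)
```

Each server then receives only the IDs it owns, and the per-server lookups
can run in parallel.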

You should be able to put a similar implementation together. I'm willing to
release this code publicly, PROVIDED that you or anyone else who's interested
changes it to make it generic and releases it as open source to others in the
Nutch community.

CC-
--------------------------------------------
Chirag Chaman | Filangy, Inc.

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 05, 2006 8:20 AM
To: [email protected]
Subject: Re: Caching the search results

Marco Vanossi wrote:
> Hi,
>
> Anybody knows how can I set Nutch to cache the results of the searches?
> I've heard about this feature but I am not finding the information....

Trivial web-level caching is easy to implement - just download osCache and
modify your web application settings according to its documentation.

Smart caching on the level of indexes is more difficult to implement, and
Nutch doesn't include anything like that. You may find this paper of
interest:

    http://www2005.org/cdrom/docs/p257.pdf

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



