. . wrote:
Are there any plans to cache search result pages, like, say, Gigablast does? This would speed up the engine so much; I feel that at mozdex I sometimes have to wait for the results to come back.

I've never found that caching search results helps much.

First, the data used to resolve and display frequently queried terms and frequently returned documents will be cached by the filesystem, so queries using these will not perform disk I/O. This index data is compressed, which makes the filesystem's cache a very efficient use of RAM.

Second, unlike individual terms, complete queries do not repeat often enough for a large cache of results to noticeably help overall performance. I don't recall the exact numbers, but when we computed them at Excite, we found that caching the top hits of thousands of queries would yield a cache hit rate of less than 10%. That is not much reward for the amount of memory such a cache would consume.
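One way to check a hit-rate figure like this against your own traffic is to replay a query log through a simulated result cache and count hits. The sketch below assumes an LRU eviction policy and a log given as a list of already-normalized query strings; both are assumptions for illustration, and `lru_hit_rate` is a hypothetical helper, not part of Nutch.

```python
from collections import OrderedDict


def lru_hit_rate(queries, capacity):
    """Replay a query stream through an LRU cache holding at most
    `capacity` result pages and return the fraction of lookups that hit.

    Assumes `queries` is a list of already-normalized query strings.
    """
    if capacity <= 0 or not queries:
        return 0.0
    cache = OrderedDict()  # query string -> placeholder for its result page
    hits = 0
    for q in queries:
        if q in cache:
            hits += 1
            cache.move_to_end(q)  # mark as most recently used
        else:
            cache[q] = None  # a real cache would store the rendered results here
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(queries)


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a", "b"]
    print(lru_hit_rate(log, capacity=2))
```

Plotting the hit rate against `capacity` shows how much memory each extra point of hit rate costs, which is the trade-off the paragraph above describes.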

If someone has a large query log, they can evaluate this for themselves. How many distinct queries does it take to account for, e.g., 40% of all queries? One must be careful not to "overfit" here. As a methodology, you might split your log in two, then take the most frequent queries in the first half and find out what percentage of the second half of the log they account for.
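The two measurements above can be sketched as follows, assuming the log is a chronologically ordered list of normalized query strings. The function names are hypothetical. `queries_for_coverage` answers the "how many queries account for X%" question on the whole log, while `held_out_coverage` applies the split-half check so that the frequent queries are chosen from one half and scored on the other.

```python
from collections import Counter


def queries_for_coverage(queries, fraction):
    """Return how many distinct queries, most frequent first, are needed
    to account for `fraction` of all query occurrences in the log."""
    counts = Counter(queries)
    target = fraction * len(queries)
    if target <= 0:
        return 0
    covered = 0
    for n, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered >= target:
            return n
    return len(counts)


def held_out_coverage(queries, top_k):
    """Split the log in two (keeping time order), take the `top_k` most
    frequent queries from the first half, and return the fraction of the
    second half they account for."""
    mid = len(queries) // 2
    first, second = queries[:mid], queries[mid:]
    if not second:
        return 0.0
    top = {q for q, _ in Counter(first).most_common(top_k)}
    return sum(1 for q in second if q in top) / len(second)


if __name__ == "__main__":
    log = ["a", "a", "b", "c", "a", "b", "d", "a"]
    print(queries_for_coverage(log, 0.75))
    print(held_out_coverage(log, top_k=1))
```

Choosing the frequent queries on one half and measuring on the other avoids the overfitting mentioned above: measuring on the same data used to pick the queries would overstate the hit rate a real cache could achieve.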

Doug


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
