Re: [Nutch-dev] Search Results Clustering extension proposal (and sample implementation).

Doug Cutting Mon, 30 Aug 2004 12:36:00 -0700

Antonio Gulli wrote:

2) The price: clustering is usually resource-consuming, so for high-load services (dozens of queries per second) it is probably not an option (at least with the implementation I am going to provide in a minute). Also, clustering usually needs to be performed on a larger "window" of results than the user actually requested... 10 results is not much to cluster. I've set the 'default' to a hundred snippets, you can adjust it to your needs.
I should say that this is not that much of a problem. In our experiments, SnakeT clusters 200+ snippets taken from ~16 different sources in ~2-3 seconds.

Simply accessing 100-200 snippets can also be quite costly. In most deployments document text will not fit in memory, so fetching 100-200 snippets requires 100-200 disk seeks, or around 1-2 seconds.
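
As a rough sketch of that arithmetic, assuming one disk seek per snippet and about 10 ms per seek (a value inferred from the 1-2 second figure above, not a measurement), the cost could be estimated as follows. The class and method names are hypothetical and are not part of Nutch.

```java
// Back-of-the-envelope estimate of snippet fetch latency.
// Hypothetical helper for illustration only; not part of Nutch.
class SnippetSeekEstimate {

    // Assumed average cost of one random disk seek, in milliseconds.
    // Inferred from "100-200 seeks, or around 1-2 seconds"; not measured.
    static final double ASSUMED_SEEK_MILLIS = 10.0;

    // Estimated seconds to fetch snippetCount snippets when each snippet
    // costs exactly one seek (document text does not fit in memory).
    static double estimateSeconds(int snippetCount, double seekMillis) {
        if (snippetCount < 0 || seekMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return snippetCount * seekMillis / 1000.0;
    }

    public static void main(String[] args) {
        // Compare a normal 10-hit page with the clustering windows discussed here.
        for (int n : new int[] {10, 100, 200}) {
            System.out.printf("%d snippets: ~%.1f s%n",
                    n, estimateSeconds(n, ASSUMED_SEEK_MILLIS));
        }
    }
}
```

Under these assumptions the seek cost grows linearly with the window size, so the snippet fetch alone may cost about as much as the clustering step.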

Doug


