2) The price: clustering is usually resource-intensive, so for high-load services (dozens of queries per second) it is probably not an option (at least with the implementation I am going to provide in a minute). Also, clustering usually needs to be performed on a larger "window" of results than the user actually requested... 10 results is not much to cluster. I've set the 'default' to a hundred snippets; you can adjust it to your needs.
I should say that this is not that much of a problem. In our experiments, SnakeT clusters 200+ snippets, taken from ~16 different [...], in ~2-3 seconds.
Simply accessing 100-200 snippets can also be quite costly. In most deployments the document text will not fit in memory, so fetching 100-200 snippets requires 100-200 disk seeks. At about 10 ms per seek, that is around 1-2 seconds.
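As a rough back-of-the-envelope check of that estimate, here is a minimal sketch. It assumes one disk seek per snippet and an average seek time of 10 ms; both the 10 ms figure and the function name are illustrative assumptions, not measurements from any Nutch deployment.

```python
# Back-of-the-envelope estimate only; 10 ms per seek is an assumed figure.
SEEK_TIME_MS = 10.0  # assumed average disk seek latency, in milliseconds


def snippet_fetch_seconds(num_snippets, seek_time_ms=SEEK_TIME_MS):
    """Estimate the time to fetch snippets when each one costs one disk seek."""
    return num_snippets * seek_time_ms / 1000.0


if __name__ == "__main__":
    # Compare a normal result page with the larger clustering window.
    for n in (10, 100, 200):
        print(f"{n} snippets: ~{snippet_fetch_seconds(n):.1f} s")
```

Under these assumptions the cost grows linearly with the window size, so the clustering window, not the page of 10 results the user asked for, dominates the fetch time.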
Doug
