Doug,

There was a question in one of my last mails about document ids when there is more than one segment index.
If you can answer that question and suggest a way to get a unique document id, we can greatly improve the speed.


Stefan

On 26.07.2004 at 21:51, Doug Cutting wrote:

Michael Rosset wrote:
Attached is a patch for search.jsp adding support for grouping by host.

I just tried this on a test index with 160k pages. It gets really slow when there are lots of duplicates. I haven't looked too closely, but I assume this is because it has to look at lots of hit details.


I think we can accelerate this. We index the hostname in the "site" field. When re-querying we could add a clause to the query which prohibits sites we don't want to see any more hits from. This could be done with something like:

   query.addProhibitedTerm("site", host);

The query should be cloned first, which means that Query needs to be made cloneable.
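A minimal sketch of the clone-then-prohibit idea in Java. SketchQuery is a hypothetical stand-in written only for this illustration; it is not the real Nutch Query class, and the only name taken from the mail above is addProhibitedTerm("site", host). The helper excludeHosts and the toString output format are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query class; NOT the real Nutch Query API.
class SketchQuery implements Cloneable {
    private final String text;
    private List<String> prohibited = new ArrayList<>();

    SketchQuery(String text) {
        this.text = text;
    }

    // Mirrors the call suggested in the mail: exclude hits whose field has this value.
    void addProhibitedTerm(String field, String value) {
        prohibited.add("-" + field + ":" + value);
    }

    // Copy the clause list too, so clauses added to the clone never leak into the original.
    @Override
    public SketchQuery clone() {
        try {
            SketchQuery copy = (SketchQuery) super.clone();
            copy.prohibited = new ArrayList<>(prohibited);
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class implements Cloneable
        }
    }

    // Illustrative rendering only: the query text followed by its prohibited clauses.
    @Override
    public String toString() {
        return prohibited.isEmpty() ? text : text + " " + String.join(" ", prohibited);
    }
}

class HostGrouping {
    // Builds the re-query: the user's query plus one prohibited "site" clause
    // for every host we do not want to see any more hits from.
    static SketchQuery excludeHosts(SketchQuery original, List<String> hosts) {
        SketchQuery requery = original.clone();
        for (String host : hosts) {
            requery.addProhibitedTerm("site", host);
        }
        return requery;
    }
}
```

Because the clone gets its own clause list, the user's original query stays untouched and can be reused for the next page of results. Whether this is actually faster than inspecting hit details would still have to be measured on a real index.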

Does this sound like a good approach to accelerating this? If so, Stefan or Andrzej, do you want to look into implementing this?

Doug


-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers


---------------------------------------------------------------
enterprise information technology consulting
open technology:        http://www.media-style.com
open discussion:        http://www.text-mining.org
open thoughts:          http://www.find23.net



