On Mon, 2004-07-26 at 12:51, Doug Cutting wrote:
> Michael Rosset wrote:
> > Attached is a patch for search.jsp adding support for grouping by host.
> 
> I just tried this on a test index with 160k pages.  It gets really slow 
> when there are lots of duplicates.  I haven't looked too closely, but I 
> assume this is because it has to look at lots of hit details.
> 
> I think we can accelerate this.  We index the hostname in the "site" 
> field.  When re-querying we could add a clause to the query which 
> prohibits sites we don't want to see any more hits from.  This could be 
> done with something like:
> 
>     query.addProhibitedTerm("site", host);
> 
> The query should be cloned first, which means that Query needs to be 
> made cloneable.
> 
> Does this sound like a good approach to accelerating this?  If so, 
> Stefan or Andrzej, do you want to look into implementing this?
> 
> Doug
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
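
A rough sketch of the clone-then-prohibit idea described above, in case it
helps the discussion. SketchQuery and excludeHosts are hypothetical stand-ins
written for this example; they are not the actual Nutch Query class, and the
"-site:host" rendering in toString() is only there to make the result
visible. The one call taken from the message is the
addProhibitedTerm("site", host) shape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a query object; NOT the real Nutch Query class.
final class SketchQuery implements Cloneable {
    private final String text;
    private List<String[]> prohibited = new ArrayList<>();

    SketchQuery(String text) {
        this.text = text;
    }

    // Same call shape as in the message: (field, value).
    void addProhibitedTerm(String field, String value) {
        prohibited.add(new String[] {field, value});
    }

    @Override
    public SketchQuery clone() {
        try {
            SketchQuery copy = (SketchQuery) super.clone();
            // Copy the list so clauses added to the clone never leak
            // back into the original query.
            copy.prohibited = new ArrayList<>(prohibited);
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(text);
        for (String[] p : prohibited) {
            sb.append(" -").append(p[0]).append(':').append(p[1]);
        }
        return sb.toString();
    }
}

final class HostGroupingSketch {
    // Build the follow-up query: a clone of the original that prohibits
    // every host we do not want further hits from.
    static SketchQuery excludeHosts(SketchQuery original, Iterable<String> hosts) {
        SketchQuery requery = original.clone();
        for (String host : hosts) {
            requery.addProhibitedTerm("site", host);
        }
        return requery;
    }

    public static void main(String[] args) {
        SketchQuery q = new SketchQuery("nutch");
        SketchQuery r = excludeHosts(q, Arrays.asList("example.com", "example.org"));
        System.out.println(q); // original query is left untouched
        System.out.println(r); // clone carries the prohibited site clauses
    }
}
```

Cloning first matters because search.jsp would presumably still need the
original query for display and for paging links; only the re-query should
carry the extra prohibited clauses.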
