Re: clustering results

2004-04-10 Thread Erik Hatcher
On Apr 9, 2004, at 8:16 PM, Michael A. Schoen wrote: I have an index of urls, and need to display the top 10 results for a given query, but want to display only 1 result per domain. It seems that using either Hits or a HitCollector, I'll need to access the doc, grab the domain field (I'll have
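A minimal sketch of the brute-force approach the question describes. It assumes the hits have already been reduced to their URL strings in relevance order (best first); in Lucene that means reading the stored URL or domain field of each hit, which is exactly the per-document cost being asked about. The class and method names are hypothetical, and the host part of the URL stands in for the "domain".

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OnePerDomain {
    /**
     * Walks URLs that are already in relevance order (best first) and keeps
     * only the first URL seen for each host, stopping after maxResults.
     */
    public static List<String> topPerDomain(List<String> rankedUrls, int maxResults) {
        List<String> kept = new ArrayList<>();
        Set<String> seenHosts = new HashSet<>();
        for (String url : rankedUrls) {
            if (kept.size() >= maxResults) {
                break; // page is full; stop touching further hits
            }
            String host = URI.create(url).getHost();
            // Set.add returns false if this host already has a result
            if (host != null && seenHosts.add(host)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ranked = List.of(
                "http://www.b.com/1.html",
                "http://www.a.com/1.html",
                "http://www.b.com/2.html",
                "http://www.a.com/2.html");
        System.out.println(topPerDomain(ranked, 10));
    }
}
```

Because it stops as soon as the page is full, it only reads as many hits as it needs, but in the worst case (many hits from few domains) it still walks a long way down the list.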

Re: ValueListHandler pattern with Lucene

2004-04-10 Thread lucene
On Friday 09 April 2004 23:59, Ype Kingma wrote: When you need 3000 hits and their stored fields, you might consider using the lower level search API with your own HitCollector. I apologize for the stupid question but ... where's the actual result in HitCollector? :-) collect(int doc,
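Put plainly, a HitCollector holds no result object. The searcher calls `collect(int doc, float score)` once per matching document, passing the document number and its raw score, and the collector has to store whatever it needs itself. The sketch below uses a stand-in abstract class with that same callback shape so it runs without Lucene on the classpath; `HitCollectorLike` and `RecordingCollector` are hypothetical names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {
    /** Stand-in with the same callback shape as Lucene's HitCollector. */
    public static abstract class HitCollectorLike {
        public abstract void collect(int doc, float score);
    }

    /** Keeps every doc number and raw score handed to collect(). */
    public static class RecordingCollector extends HitCollectorLike {
        public final List<Integer> docs = new ArrayList<>();
        public final List<Float> scores = new ArrayList<>();

        @Override
        public void collect(int doc, float score) {
            docs.add(doc);
            scores.add(score);
        }
    }

    public static void main(String[] args) {
        RecordingCollector c = new RecordingCollector();
        // Simulates a searcher reporting three matches; a real search calls
        // collect() once per matching document, not in score order.
        c.collect(7, 0.8f);
        c.collect(2, 1.3f);
        c.collect(11, 0.4f);
        System.out.println(c.docs + " " + c.scores);
    }
}
```

The recorded doc numbers are what you would later use to load each hit's stored fields; nothing here assumes the calls arrive sorted by score.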

Re: Highlighter package v2 RC1

2004-04-10 Thread markharw00d
Can I customize the way it highlights terms? Right now it does so by surrounding them with <b>. That's the job of a formatter class. You can pass one in the constructor, e.g.: Formatter myFormatter = new SimpleHTMLFormatter("<i>", "</i>"); Highlighter h = new Highlighter(myFormatter, new QueryScorer(query)); If

Re: clustering results

2004-04-10 Thread Venu Durgam
Erik, Thanks for the pointer. I am not sure how sort can filter out results. Sort will just sort the results, right? Let's say I had the results below: http://www.b.com/1.html http://www.a.com/1.html http://www.b.com/2.html http://www.a.com/2.html if you sort by domain name, results might be

Re: ValueListHandler pattern with Lucene

2004-04-10 Thread Erik Hatcher
On Apr 10, 2004, at 5:08 AM, [EMAIL PROTECTED] wrote: On Friday 09 April 2004 23:59, Ype Kingma wrote: When you need 3000 hits and their stored fields, you might consider using the lower level search API with your own HitCollector. I apologize for the stupid question but ... where's the actual

Re: clustering results

2004-04-10 Thread Erik Hatcher
On Apr 10, 2004, at 9:47 AM, Venu Durgam wrote: I am not sure how sort can filter out results. Sort will just sort the results, right? Right, no filtering using Sort. Let's say I had the results below: http://www.b.com/1.html http://www.a.com/1.html http://www.b.com/2.html http://www.a.com/2.html
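Venu's point, which Erik confirms, can be seen with the example URLs from the thread: sorting by domain only groups hits from the same host together. All four hits remain, and relevance order is replaced by host order. A minimal stdlib sketch (hypothetical class name; the URL host stands in for the domain field):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortIsNotFilter {
    private static String host(String url) {
        return URI.create(url).getHost();
    }

    /** Returns a copy of the URLs ordered by host; nothing is removed. */
    public static List<String> sortByDomain(List<String> urls) {
        List<String> sorted = new ArrayList<>(urls);
        // List.sort is stable, so URLs sharing a host keep their relative order
        sorted.sort(Comparator.comparing(SortIsNotFilter::host));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> results = List.of(
                "http://www.b.com/1.html",
                "http://www.a.com/1.html",
                "http://www.b.com/2.html",
                "http://www.a.com/2.html");
        System.out.println(sortByDomain(results));
    }
}
```

Getting one result per domain still needs a separate filtering pass on top of the sort.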

Re: clustering results

2004-04-10 Thread Michael A. Schoen
So as Venu pointed out, sorting doesn't seem to help the problem. If we have to walk the result set, access docs and dedupe using brute force, we're better off w/ the standard order by relevance. If you've got an example of this type of clustering done in a more efficient way, that'd be great.
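One possible single-pass variant, offered as a sketch rather than a proven faster solution: instead of walking hits in relevance order, a collector keeps only the best-scoring hit per domain as hits arrive in any order, then returns the top n of those. It assumes a hypothetical precomputed array `domainOfDoc` mapping each document number to its domain, built once per index, so no stored document is loaded per hit; how that table would be built from a Lucene index is left open here. It still visits every match, so it remains brute force in the sense Michael describes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BestPerDomain {
    /** One kept hit: a document number and its score. */
    public static final class Hit {
        public final int doc;
        public final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final String[] domainOfDoc; // hypothetical doc number -> domain table
    private final Map<String, Hit> best = new HashMap<>();

    public BestPerDomain(String[] domainOfDoc) {
        this.domainOfDoc = domainOfDoc;
    }

    /** Same shape as a HitCollector callback; calls may arrive in any order. */
    public void collect(int doc, float score) {
        String domain = domainOfDoc[doc];
        Hit current = best.get(domain);
        // keep only the highest-scoring hit seen so far for this domain
        if (current == null || score > current.score) {
            best.put(domain, new Hit(doc, score));
        }
    }

    /** The n best hits, at most one per domain, highest score first. */
    public List<Hit> top(int n) {
        List<Hit> hits = new ArrayList<>(best.values());
        hits.sort((x, y) -> Float.compare(y.score, x.score));
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    public static void main(String[] args) {
        // doc numbers 0..3 map to these domains (illustrative data)
        String[] domains = {"www.b.com", "www.a.com", "www.b.com", "www.a.com"};
        BestPerDomain c = new BestPerDomain(domains);
        c.collect(2, 0.9f);
        c.collect(0, 0.5f);
        c.collect(3, 0.7f);
        c.collect(1, 0.6f);
        for (Hit h : c.top(10)) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

Memory grows with the number of distinct domains among the matches, not with the total number of hits.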