Re: Sort by relevance+distance

2005-09-20 Thread markharw00d
To avoid caching 10,025 docs when you only want to see results 10,000 to 10,025 (assuming the user is paging through results), you might have to remember the lowest score shown on the previous page of results, so that the 10,000 docs with score > lastLowScore are not added to the HitQueue again.
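A minimal sketch of the paging idea in plain Java, with no Lucene dependency. Hits are simulated as (doc id, score) pairs, and the names `PagedCollector`, `Hit` and `nextPage` are hypothetical. The collector remembers the last hit of the previous page, skips every hit that ranks at or before it, and keeps only `pageSize` candidates in a bounded min-heap. The doc-id tie-breaker is an assumption added here: with the score alone, hits that share `lastLowScore` would be either repeated or lost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class PagedCollector {
    // A simulated hit: Lucene would hand the collector a doc id and a score.
    record Hit(int doc, float score) {}

    // Rank order: higher score first, then lower doc id (hypothetical tie-break).
    static final Comparator<Hit> RANK =
        Comparator.comparingDouble((Hit h) -> -h.score()).thenComparingInt(Hit::doc);

    // Returns the page that follows `last` (pass null for the first page).
    static List<Hit> nextPage(List<Hit> allHits, Hit last, int pageSize) {
        // Reversed rank makes the head the worst-ranked hit currently kept.
        PriorityQueue<Hit> queue = new PriorityQueue<>(RANK.reversed());
        for (Hit h : allHits) {
            // Skip hits already shown on earlier pages (rank at or before `last`).
            if (last != null && RANK.compare(h, last) <= 0) continue;
            queue.offer(h);
            if (queue.size() > pageSize) queue.poll(); // drop the worst, memory stays at pageSize
        }
        List<Hit> page = new ArrayList<>(queue);
        page.sort(RANK); // best first
        return page;
    }
}
```

Only `pageSize` entries are held however deep the user pages, though every matching doc is still visited; what goes away is the memory and heap churn, not the scan itself.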

Re: Sort by relevance+distance

2005-09-19 Thread James Huang
Cool! Just one question: if we have class RelevanceAndDistanceCollector extends HitCollector { public ScoreDoc[] getMatches(int start, int size) { ... } } and a call of getMatches(10000, 25), it would not cache as many as 10000+ docs, would it? Remember this is the whole point o

Re: Sort by relevance+distance

2005-09-19 Thread markharw00d
Here's an example I put together to illustrate the point. package distance; import java.io.IOException; import java.util.ArrayList; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.WhitespaceAnalyzer; import org.apache.lucene.document.Document; import org.apache.lu

Re: Sort by relevance+distance

2005-09-19 Thread Jeff Rodenburg
This is an interesting approach, one I had not considered. Mark, are there any code samples that implement it, or something similar? Thanks, jeff On 9/19/05, mark harwood <[EMAIL PROTECTED]> wrote: > I think the HitCollector approach was fine but needed a couple of changes

Re: Sort by relevance+distance

2005-09-19 Thread James Huang
I think this is probably the closest thing I'd like to (and am able to) do now. If I ever get to do this, I'll share the idea/code and ask for review and suggestions. Thank you very much, Mark, and all others who have helped! -James mark harwood <[EMAIL PROTECTED]> wrote: I think the HitCollector appro

Re: Sort by relevance+distance

2005-09-19 Thread mark harwood
I think the HitCollector approach was fine but needed a couple of changes: 1) use a PriorityQueue subclass in place of the SortedSet to keep only the top n scoring docs 2) multiply the Lucene score by a distance measurement based on the current doc's location (doc location being read from a cached arra
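The two changes can be sketched without Lucene as follows. Mark's message is cut off, so the exact distance measure he had in mind is unknown; everything below is an assumption. Doc locations sit in plain `float[]` arrays indexed by doc id (standing in for the cached array), distance is planar Euclidean in miles, and the hypothetical distance factor `1 / (1 + d / halfDistance)` halves the score at `halfDistance` miles. Lucene's own `PriorityQueue` is replaced by `java.util.PriorityQueue` bounded to n entries, and the class does not extend `HitCollector`, although `collect` has the same `(int doc, float score)` shape.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class DistanceBoostCollector {
    record Match(int doc, double combined) {}

    private final float[] xs, ys;        // cached doc locations, indexed by doc id
    private final double myX, myY;       // the user's location
    private final double halfDistance;   // hypothetical: distance at which the score is halved
    private final int n;                 // how many top docs to keep
    // Min-heap on combined score: the head is the weakest match kept so far.
    private final PriorityQueue<Match> top =
        new PriorityQueue<>(Comparator.comparingDouble(Match::combined));

    DistanceBoostCollector(float[] xs, float[] ys, double myX, double myY,
                           double halfDistance, int n) {
        this.xs = xs; this.ys = ys; this.myX = myX; this.myY = myY;
        this.halfDistance = halfDistance; this.n = n;
    }

    // Same shape as HitCollector.collect(int doc, float score).
    void collect(int doc, float score) {
        double d = Math.hypot(xs[doc] - myX, ys[doc] - myY);
        double combined = score / (1.0 + d / halfDistance); // score times distance factor
        if (top.size() < n) {
            top.offer(new Match(doc, combined));
        } else if (combined > top.peek().combined()) {
            top.poll();                   // evict the weakest so memory stays at n entries
            top.offer(new Match(doc, combined));
        }
    }

    // Kept matches, best first.
    List<Match> matches() {
        List<Match> out = new ArrayList<>(top);
        out.sort(Comparator.comparingDouble(Match::combined).reversed());
        return out;
    }
}
```

With `halfDistance` = 1 mile, a score-1.0 doc 10 miles away drops to about 0.09, below a score-0.5 doc next door; a larger `halfDistance` lets relevance dominate. The decay curve is a tuning choice that the thread does not settle.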

Re: Sort by relevance+distance

2005-09-19 Thread Paul Elschot
On Sep 18, 2005, at 3:39 PM, James Huang wrote: > So the question is, is there a way to override score calculation at runtime? In the lucene/search package, I see interfaces like Scorer, Weight and methods like Query.createWeight(). This looks promising. You indeed need to override the fol

Re: Sort by relevance+distance

2005-09-18 Thread Jeff Rodenburg
I like Erik's suggestion here as a starting point. I would guess you might find some direction in the Scorer class, but I haven't gone through this in detail. Conceptually a sliding weight based on proximity sounds correct... -- jeff On Sep 18, 2005, at 3:39 PM, James Huang wrote: > > So the

Re: Sort by relevance+distance

2005-09-18 Thread Erik Hatcher
On Sep 18, 2005, at 3:39 PM, James Huang wrote: So the question is, is there a way to override score calculation at runtime? In the lucene/search package, I see interfaces like Scorer, Weight and methods like Query.createWeight(). This looks promising. There are several ways to adjust scorin

Re: Sort by relevance+distance

2005-09-18 Thread James Huang
--- Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > trimming the post further: > On 9/18/05, James Huang <[EMAIL PROTECTED]> wrote: > > The problem is quite generic, I believe. What I'd like to do is similar to LIA-ch6, i.e. to find a "good Chinese Hunan-style restaurant near me." I

Re: Sort by relevance+distance

2005-09-18 Thread Jeff Rodenburg
trimming the post further: On 9/18/05, James Huang <[EMAIL PROTECTED]> wrote: > The problem is quite generic, I believe. What I'd like to do is similar to LIA-ch6, i.e. to find a "good Chinese Hunan-style restaurant near me." I prefer Hunan-style; however, if a good Hunan-style one is 12 m

Re: Sort by relevance+distance

2005-09-18 Thread James Huang
See comments below. --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > [trimming the post a bit] > On Sep 18, 2005, at 11:51 AM, James Huang wrote: > > The problem is quite generic, I believe. What I'd like to do is similar to LIA-ch6, i.e. to find a "good Chinese Hunan-style restaurant nea

Re: Sort by relevance+distance

2005-09-18 Thread Erik Hatcher
[trimming the post a bit] On Sep 18, 2005, at 11:51 AM, James Huang wrote: The problem is quite generic, I believe. What I'd like to do is similar to LIA-ch6, i.e. to find a "good Chinese Hunan-style restaurant near me." I prefer Hunan-style; however, if a good Hunan-style one is 12 miles, where t

Re: Sort by relevance+distance

2005-09-18 Thread Erik Hatcher
On Sep 18, 2005, at 11:10 AM, James Huang wrote: --- Erik Hatcher <[EMAIL PROTECTED]> wrote: On Sep 18, 2005, at 10:24 AM, James Huang wrote: --- Erik Hatcher <[EMAIL PROTECTED]> wrote: Get back to using your DistanceComparatorSource, and couple that with a SortField.FIELD_

Re: Sort by relevance+distance

2005-09-18 Thread James Huang
--- Erik Hatcher <[EMAIL PROTECTED]> wrote: > On Sep 18, 2005, at 10:24 AM, James Huang wrote: > > --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > Get back to using your DistanceComparatorSource, and couple that with a SortField.FIELD_SCORE, like this: > > > Sort

Re: Sort by relevance+distance

2005-09-18 Thread Erik Hatcher
On Sep 18, 2005, at 10:24 AM, James Huang wrote: --- Erik Hatcher <[EMAIL PROTECTED]> wrote: Get back to using your DistanceComparatorSource, and couple that with a SortField.FIELD_SCORE, like this: Sort sort = new Sort(new SortField[] {new SortField("location", new DistanceCompara
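What a two-field sort means can be illustrated with plain Java `Comparator`s over a hypothetical `Result` record standing in for Lucene hits: the first key decides the order, and the second only breaks exact ties on the first. Erik's snippet puts the distance field first, so score only matters between equally distant docs. Swapping the keys gives relevance first, but because scores are floats, exact score ties (and so any effect from distance) may be rare.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TwoKeySort {
    record Result(String name, float score, double miles) {}

    // Distance first, score (descending) only as a tie-breaker: the order in Erik's Sort.
    static final Comparator<Result> DISTANCE_THEN_SCORE =
        Comparator.comparingDouble(Result::miles)
                  .thenComparing(Comparator.comparingDouble(Result::score).reversed());

    // Score first (descending), distance only as a tie-breaker.
    static final Comparator<Result> SCORE_THEN_DISTANCE =
        Comparator.comparingDouble((Result r) -> -r.score())
                  .thenComparingDouble(Result::miles);

    // Returns the result names in sorted order.
    static List<String> order(List<Result> results, Comparator<Result> cmp) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(cmp);
        List<String> names = new ArrayList<>();
        for (Result r : sorted) names.add(r.name());
        return names;
    }
}
```

For A (0.9, 5 mi), B (0.9, 1 mi) and C (0.4, 0.5 mi), distance-first gives C, B, A, while score-first gives B, A, C: distance only reorders A and B because their scores are exactly equal.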

Re: Sort by relevance+distance

2005-09-18 Thread James Huang
--- Erik Hatcher <[EMAIL PROTECTED]> wrote: > Get back to using your DistanceComparatorSource, and couple that with a SortField.FIELD_SCORE, like this: > Sort sort = new Sort(new SortField[] {new SortField("location", new DistanceComparatorSource( you need>)), SortField.F

Re: Sort by relevance+distance

2005-09-18 Thread Erik Hatcher
On Sep 17, 2005, at 7:00 PM, James Huang wrote: I use a custom collector: [...] Then, use IndexSearcher.search(qry, collector); So what happens if you get 10M results from a search? This seems to work. What I wish for is that sorting is done by the search engine itself, hoping for a bet

Re: Sort by relevance+distance

2005-09-17 Thread James Huang
I use a custom collector: class ResultCollector extends HitCollector { SortedSet set = new TreeSet(); IndexSearcher searcher; Location me; ResultCollector(IndexSearcher searcher, Location me) { this.me = me; this.searcher = searcher; } public void collect(int id, float scor
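One thing to watch with a `TreeSet`-based collector: a `TreeSet` treats two elements as duplicates whenever the comparison returns 0, and silently keeps only one. The rest of this class is cut off, so it is unknown how its entries compare. The sketch below uses a hypothetical `Entry` record to show the effect and the usual fix of adding the doc id as a final tie-breaker.

```java
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetTies {
    record Entry(int doc, float score, double miles) {}

    // Compares on score and distance only: entries equal on both collapse into one.
    static final Comparator<Entry> LOSSY =
        Comparator.comparingDouble((Entry e) -> -e.score())
                  .thenComparingDouble(Entry::miles);

    // Same order, but the doc id makes every entry distinct.
    static final Comparator<Entry> SAFE = LOSSY.thenComparingInt(Entry::doc);

    // How many entries survive insertion into a TreeSet using `cmp`.
    static int countKept(Comparator<Entry> cmp, Entry... entries) {
        TreeSet<Entry> set = new TreeSet<>(cmp);
        for (Entry e : entries) set.add(e);
        return set.size();
    }
}
```

This fixes lost hits only; the set still grows with every match, which is the memory concern Erik raises about 10M results and which Mark's bounded PriorityQueue addresses.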

Re: Sort by relevance+distance

2005-09-17 Thread Erik Hatcher
On Sep 17, 2005, at 4:10 PM, James Huang wrote: Hi, I can sort the search results by distance now, but the relevance is lost. I'd like to have the results sorted by relevance + distance, i.e., relevance first and, for results of similar relevance, by distance. How can I do that? How are you c

Re: Sort by relevance+distance

2005-09-17 Thread James Huang
I guess I can use HitCollector and implement my own sorting, right? Is there a better approach? --- James Huang <[EMAIL PROTECTED]> wrote: > Hi, I can sort the search results by distance now, but the relevance is lost. > I'd like to have the results sorted by relevance + distance, i.e

Sort by relevance+distance

2005-09-17 Thread James Huang
Hi, I can sort the search results by distance now, but the relevance is lost. I'd like to have the results sorted by relevance + distance, i.e., relevance first and, for results of similar relevance, by distance. How can I do that? Thanks a lot in advance! -James --- James Huang <[EMAIL PROTECTE
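Because "similar relevance" rarely means "identical score", one way to express this request is to round scores into coarse bands, then sort by band and, within a band, by distance. This is an illustration, not something proposed in the thread. The `Result` record, the `band` helper, the band width, and the assumption that scores are normalized to [0, 1] are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BandedSort {
    record Result(String name, float score, double miles) {}

    // Hypothetical: scores assumed normalized to [0, 1]; bandWidth is a tuning knob.
    static int band(float score, double bandWidth) {
        return (int) Math.floor(score / bandWidth);
    }

    // Higher band first, then nearer first within a band; returns names in order.
    static List<String> order(List<Result> results, double bandWidth) {
        Comparator<Result> cmp =
            Comparator.comparingInt((Result r) -> -band(r.score(), bandWidth))
                      .thenComparingDouble(Result::miles);
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(cmp);
        List<String> names = new ArrayList<>();
        for (Result r : sorted) names.add(r.name());
        return names;
    }
}
```

With a width of 0.2, scores 0.92 and 0.88 share a band, so the nearer of the two comes first. Band edges are the weak spot: scores on either side of a boundary (say 0.79 and 0.81) land in different bands and are ordered by score alone, regardless of their distances.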