Re: SpanQuery and Spans optimizations

2009-08-12 Thread Paul Cowan
Michael McCandless wrote: I think eventually span queries should be absorbed into the normal lucene queries. EG, if TermQuery creates a scorer that's able to optionally enumerate matching spans, such that there's no performance loss if you don't actually request the spans, then we don't need Sp

Re: SpanQuery and Spans optimizations

2009-08-12 Thread Grant Ingersoll
On Aug 12, 2009, at 5:58 AM, Michael McCandless wrote: I think being able to ask the Scorer for matching spans for the current doc makes tons of sense. I think eventually span queries should be absorbed into the normal lucene queries. EG, if TermQuery creates a scorer that's able to optionall

Re: SpanQuery and Spans optimizations

2009-08-12 Thread Michael McCandless
I think being able to ask the Scorer for matching spans for the current doc makes tons of sense. I think eventually span queries should be absorbed into the normal lucene queries. EG, if TermQuery creates a scorer that's able to optionally enumerate matching spans, such that there's no performanc
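A toy sketch of the "scorer that can optionally enumerate spans" idea. `TermScorer`, `Span`, `spans()` and the `spansBuilt` counter are hypothetical stand-ins over an in-memory postings list, not Lucene's TermScorer API. Iterating and scoring use only the per-doc frequency, so no span objects are built unless a caller asks for them; a real implementation would presumably also avoid decoding positions until then.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins, NOT Lucene's API: a "term scorer" over an in-memory postings
// list that can optionally report matching spans for the current doc.
final class LazySpansScorerSketch {

    /** One match covering positions [start, end) in the current doc. */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Hypothetical scorer: next() and score() never build Span objects. */
    static final class TermScorer {
        private final int[] docs;        // matching doc IDs, ascending
        private final int[][] positions; // positions[i] = term positions in docs[i]
        private int idx = -1;
        int spansBuilt = 0;              // how many times spans() was called

        TermScorer(int[] docs, int[][] positions) {
            this.docs = docs;
            this.positions = positions;
        }

        /** Advances to the next matching doc; false once exhausted. */
        boolean next() { return ++idx < docs.length; }

        int doc() { return docs[idx]; }

        /** Toy tf-style score: only the number of positions (the freq) is used. */
        float score() { return (float) Math.sqrt(positions[idx].length); }

        /** Spans are materialized only for callers that ask for them. */
        List<Span> spans() {
            spansBuilt++;
            List<Span> out = new ArrayList<>();
            for (int p : positions[idx]) {
                out.add(new Span(p, p + 1)); // a single term covers one position
            }
            return out;
        }
    }
}
```

A plain search loop that only calls `next()` and `score()` leaves `spansBuilt` at zero; only a caller that wants highlighting-style detail pays for `spans()`.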

Re: SpanQuery and Spans optimizations

2009-08-08 Thread Shai Erera
That would work. Though your custom TopSpansCollector should be able to handle other Scorers as well. And you can store the payloads in yet another custom ScoreDoc - is that what you had in mind? Shai On Sat, Aug 8, 2009 at 3:06 AM, Grant Ingersoll wrote: > > On Aug 6, 2009, at 5:09 PM, Grant I
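A minimal sketch of the "payloads in yet another custom ScoreDoc" suggestion: a subclass that carries spans and payload bytes next to the doc and score. The `ScoreDoc` here is a local stand-in mirroring the public `doc`/`score` fields of Lucene's class so the example compiles on its own; `SpansScoreDoc`, `addSpan` and `reset` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class SpansScoreDocSketch {

    // Stand-in for Lucene's ScoreDoc (public doc/score fields) so the
    // sketch runs without Lucene on the classpath.
    static class ScoreDoc {
        public int doc;
        public float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Hypothetical subclass: a hit that also remembers its spans and payloads.
    static final class SpansScoreDoc extends ScoreDoc {
        final List<int[]> spans = new ArrayList<>();     // each entry is {start, end}
        final List<byte[]> payloads = new ArrayList<>(); // parallel to spans; null if no payload

        SpansScoreDoc(int doc, float score) { super(doc, score); }

        /** Clears span data so a collector can recycle this instance for another hit. */
        void reset(int doc, float score) {
            this.doc = doc;
            this.score = score;
            spans.clear();
            payloads.clear();
        }

        void addSpan(int start, int end, byte[] payload) {
            spans.add(new int[] {start, end});
            payloads.add(payload);
        }
    }
}
```

`reset` is there so a collector could reuse one instance per queue slot rather than allocating a new object for every hit.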

Re: SpanQuery and Spans optimizations

2009-08-07 Thread Grant Ingersoll
On Aug 6, 2009, at 5:09 PM, Grant Ingersoll wrote: On Aug 6, 2009, at 5:06 PM, Shai Erera wrote: Only w/ ScoreDocs we reuse the same instance. So I guess we'd like to do the same here. Seems like providing a TopSpansCollector is what you want, only unlike TopFieldCollector which populat

Re: SpanQuery and Spans optimizations

2009-08-06 Thread Grant Ingersoll
On Aug 6, 2009, at 5:06 PM, Shai Erera wrote: Only w/ ScoreDocs we reuse the same instance. So I guess we'd like to do the same here. Seems like providing a TopSpansCollector is what you want, only unlike TopFieldCollector which populates the fields post search, you'd like to do it durin

Re: SpanQuery and Spans optimizations

2009-08-06 Thread Shai Erera
Only w/ ScoreDocs we reuse the same instance. So I guess we'd like to do the same here. Seems like providing a TopSpansCollector is what you want, only unlike TopFieldCollector which populates the fields post search, you'd like to do it during search. I've been typing and deleting suggestions for
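One way a "collect spans during search" TopSpansCollector could look, sketched without Lucene: a min-heap of hits keyed by score, where spans are looked up only for competitive docs and the evicted entry is reused for the newcomer, in the spirit of reusing ScoreDoc instances. `SpanSource`, `Hit` and `TopSpansCollectorSketch` are hypothetical; a real version would presumably implement Lucene's Collector and pull spans from the current Scorer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical top-N collector that records spans while searching.
// Not Lucene's Collector API.
final class TopSpansCollectorSketch {

    /** Hypothetical source of spans ({start, end} pairs) for a given doc. */
    interface SpanSource { List<int[]> spansFor(int doc); }

    static final class Hit {
        int doc;
        float score;
        final List<int[]> spans = new ArrayList<>();
    }

    private final int n;
    private final SpanSource source;
    // Min-heap by score: the head is the weakest hit currently kept.
    private final PriorityQueue<Hit> pq =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    int spanLookups = 0; // how many times spans were fetched

    TopSpansCollectorSketch(int n, SpanSource source) {
        this.n = n;
        this.source = source;
    }

    void collect(int doc, float score) {
        Hit h;
        if (pq.size() < n) {
            h = new Hit();              // queue not full yet
        } else if (score > pq.peek().score) {
            h = pq.poll();              // reuse the evicted instance
        } else {
            return;                     // not competitive: spans never fetched
        }
        h.doc = doc;
        h.score = score;
        h.spans.clear();
        h.spans.addAll(source.spansFor(doc));
        spanLookups++;
        pq.add(h);
    }

    /** Kept hits in descending score order. */
    List<Hit> topHits() {
        List<Hit> out = new ArrayList<>(pq);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

Collecting docs 0 to 3 with scores 1.0, 3.0, 0.5, 2.0 and n = 2 keeps docs 1 and 3, fetches spans three times, and never touches doc 2's spans.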

Re: SpanQuery and Spans optimizations

2009-08-06 Thread Grant Ingersoll
On Aug 6, 2009, at 4:25 PM, Shai Erera wrote: But still you might collect spans for docs unnecessarily during processing. If a doc is added to the PQ and later removed, then the spans collection was just a waste of time (unless the collection comes in free during query processing). sure,

Re: SpanQuery and Spans optimizations

2009-08-06 Thread Shai Erera
But still you might collect spans for docs unnecessarily during processing. If a doc is added to the PQ and later removed, then the spans collection was just a waste of time (unless the collection comes in free during query processing). Also, if you build a paging search UI, then as soon as the us
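A toy count of the waste described here: even if spans are gathered only when a hit makes it into the top-N queue, every hit that is later evicted was gathered for nothing. The worst case is scores that rise in doc order, so every hit enters the queue. Plain floats only; `WastedSpansCount.simulate` is a hypothetical helper, not Lucene code.

```java
import java.util.PriorityQueue;

final class WastedSpansCount {

    /** Returns {spansCollected, spansWasted} for a top-n search over scores given in doc order. */
    static int[] simulate(float[] scoresInDocOrder, int n) {
        PriorityQueue<Float> pq = new PriorityQueue<>(); // min-heap of kept scores
        int collected = 0;
        for (float s : scoresInDocOrder) {
            if (pq.size() < n) {
                pq.add(s);
            } else if (s > pq.peek()) {
                pq.poll();          // evicts a hit whose spans were already gathered
                pq.add(s);
            } else {
                continue;           // rejected before any spans work
            }
            collected++;            // spans would be gathered here
        }
        // Every collected hit not still in the queue was evicted later.
        int wasted = collected - pq.size();
        return new int[] {collected, wasted};
    }
}
```

With scores 1 through 10 in doc order and n = 3, all 10 hits are collected and 7 are wasted; with the same scores in descending order only 3 are collected and none are wasted.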

Re: SpanQuery and Spans optimizations

2009-08-06 Thread Grant Ingersoll
On Aug 6, 2009, at 2:31 PM, Paul Elschot wrote: With a single search one might end up collecting lots of span info that will be thrown away because the document score is too low. Presumably, you would only collect it if the result was actually put onto the PriorityQueue, in other words, aft

Re: SpanQuery and Spans optimizations

2009-08-06 Thread Paul Elschot
With a single search one might end up collecting lots of span info that will be thrown away because the document score is too low. So I think the best way is to first collect the best hits in the usual way, and then get the spans of the query (effectively once more, but now without SpanScorer in b
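A sketch of the second pass in this two-pass approach, assuming the first pass (an ordinary top-N search) already produced the hit doc IDs. `ArraySpans` is a hypothetical forward-only stand-in loosely modeled on the next()/skipTo()/doc()/start()/end() methods of Spans, with skipTo always advancing at least once before checking the target. The hit IDs are sorted ascending because the spans can only move forward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoPassSpansSketch {

    /** Forward-only stand-in over rows of {doc, start, end}, sorted by doc then start. */
    static final class ArraySpans {
        private final int[][] rows;
        private int i = -1;
        ArraySpans(int[][] rows) { this.rows = rows; }
        boolean next() { return ++i < rows.length; }
        /** Advances at least once, then until doc() >= target. */
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (target > doc());
            return true;
        }
        int doc() { return rows[i][0]; }
        int start() { return rows[i][1]; }
        int end() { return rows[i][2]; }
    }

    /** Gathers {start, end} spans for each top hit, visiting hits in doc ID order. */
    static Map<Integer, List<int[]>> spansForTopDocs(int[] topDocIds, ArraySpans spans) {
        int[] sorted = topDocIds.clone();
        Arrays.sort(sorted);                      // spans only move forward
        Map<Integer, List<int[]>> out = new HashMap<>();
        boolean more = spans.next();
        for (int doc : sorted) {
            List<int[]> list = new ArrayList<>();
            while (more && spans.doc() < doc) {
                more = spans.skipTo(doc);         // jump over docs that are not hits
            }
            while (more && spans.doc() == doc) {
                list.add(new int[] {spans.start(), spans.end()});
                more = spans.next();
            }
            out.put(doc, list);
        }
        return out;
    }
}
```

Feeding it hits ranked 7, 1 over spans in docs 1, 4, 7 and 9 returns both spans of doc 1 and both spans of doc 7, and never records anything for docs 4 or 9.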

Re: SpanQuery and Spans optimizations

2009-08-06 Thread Mark Miller
>> besides the fact that Spans is an interface and it would break back compat, ugh! back compat is almost out the window for Spans and 2.9 - we already broke it with the payloads, so PayloadSpans has been merged into Spans. I don't know that we have time to squeeze anything in (2.9 is so close!

Re: SpanQuery and Spans optimizations

2009-08-06 Thread Grant Ingersoll
seek() seems somewhat doable, although inefficient because the underlying TermPositions supports seek, but that really would only allow us to go back to the beginning, I think (besides the fact that Spans is an interface and it would break back compat, ugh!). Collector route seems more pro
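A sketch of why a seek() layered on a forward-only source mostly amounts to "go back to the beginning": seeking backwards means re-opening the enumerator (here via a `Supplier`) and rescanning from the first doc. `ForwardSpans`, `SeekableSpans` and the `reopens` counter are hypothetical names, not Lucene API.

```java
import java.util.function.Supplier;

final class SeekableSpansSketch {

    /** Forward-only enumerator over {doc, start, end} rows, loosely modeled on Spans. */
    static final class ForwardSpans {
        private final int[][] rows;
        private int i = -1;
        ForwardSpans(int[][] rows) { this.rows = rows; }
        boolean next() { return ++i < rows.length; }
        int doc() { return rows[i][0]; }
    }

    /** Hypothetical seek(): going backwards means starting over from the first span. */
    static final class SeekableSpans {
        private final Supplier<ForwardSpans> reopen;
        private ForwardSpans cur;
        private int curDoc = -1;          // -1 means "before the first span"
        private boolean exhausted = false;
        int reopens = 0;                  // how often we paid for a full restart

        SeekableSpans(Supplier<ForwardSpans> reopen) {
            this.reopen = reopen;
            this.cur = reopen.get();
        }

        /** Positions on the first span whose doc is >= target; false if there is none. */
        boolean seek(int target) {
            if (exhausted || curDoc >= target) {
                cur = reopen.get();       // can't move backwards: re-open and rescan
                reopens++;
                curDoc = -1;
                exhausted = false;
            }
            while (cur.next()) {
                curDoc = cur.doc();
                if (curDoc >= target) return true;
            }
            exhausted = true;
            return false;
        }

        int doc() { return curDoc; }
    }
}
```

Forward seeks cost nothing extra, but each backward seek rescans everything before the target, which is the inefficiency that makes the Collector route look more attractive.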

SpanQuery and Spans optimizations

2009-08-06 Thread Grant Ingersoll
I think it is a fairly common use case (relative to the rather uncommon use case of using SpanQuery, that is) to want to do something like: ... SpanQuery sq = ... topDocs = searcher.search(sq, 10); Spans spans = sq.getSpans(searcher.getIndexReader()); for (int i = 0; i < topDocs.scoreDocs.length;
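The preview cuts off inside the loop, so this assumes it calls skipTo once per hit in rank (score) order. Because spans only move forward, visiting hits by score can skip past lower-ranked hits with smaller doc IDs. A self-contained toy; `ForwardSpans` and `hitsWithSpans` are hypothetical, and the stand-in's skipTo always advances at least once before comparing against the target.

```java
final class ScoreOrderPitfall {

    /** Forward-only stand-in: one span per doc, doc IDs ascending. */
    static final class ForwardSpans {
        private final int[] docs;
        private int i = -1;
        ForwardSpans(int... docs) { this.docs = docs; }
        boolean next() { return ++i < docs.length; }
        int doc() { return docs[i]; }
        /** Advances at least once, then until doc() >= target. */
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (target > doc());
            return true;
        }
    }

    /** Mirrors the per-hit loop: skipTo each hit in the order given, count exact matches. */
    static int hitsWithSpans(int[] hitDocsInOrder, ForwardSpans spans) {
        int found = 0;
        for (int doc : hitDocsInOrder) {
            if (spans.skipTo(doc) && spans.doc() == doc) {
                found++;
            }
        }
        return found;
    }
}
```

With spans in docs 1, 4, 7 and 9 and hits ranked 7, 1, 4, the rank-order loop finds spans only for doc 7; visiting the same hits as 1, 4, 7 finds all three.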