[jira] [Commented] (LUCENE-1252) Avoid using positions when not all required terms are present

2014-10-22 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180717#comment-14180717
 ] 

Paul Elschot commented on LUCENE-1252:
--

Thinking out loud:

How about adding a method toDocWithAllRequiredTerms(), and a method 
toPositionMatch() that may move to a later doc?

These two together should do what nextDoc() does now.


 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search
Reporter: Paul Elschot
Priority: Minor
  Labels: gsoc2014

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1252) Avoid using positions when not all required terms are present

2014-10-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178136#comment-14178136
 ] 

Michael McCandless commented on LUCENE-1252:


bq. I was fairly specific about how to tackle the problem... I dont mean to be 
rude, but did you actually read my proposal?

Yes, you were specific, and yes, I did in fact read the proposal,
several times, and as I said, I like it, but it's a complex problem
that I've also spent time thinking about and so I gave some simple
feedback about an alternative to one part of the proposal:

It just seems strange to me, when Lucene already has this notion of
boiling down complex queries into their low-level concrete equivalent
queries (i.e. rewrite) to then add a separate API (two-phase
execution) for doing what looks to me to be essentially the same
thing.  Two-phase execution is just rewriting to a two-clause
conjunction where the cost of each clause are wildly different,
opening up the freedom for BQ to execute more efficiently.  And,
yes, like rewrite, I think it should recurse.

I also think BQ should handle the costly nextDoc filters (i.e. do
them last) as first class citizens, and it seemed like the proposal
suggested otherwise (maybe I misunderstood this part of the proposal).


 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search
Reporter: Paul Elschot
Priority: Minor
  Labels: gsoc2014

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1252) Avoid using positions when not all required terms are present

2014-10-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178374#comment-14178374
 ] 

Robert Muir commented on LUCENE-1252:
-

Both of those are covered. Yes its proposed as a first-class citizen (on DISI), 
more first-class than your second-class rewrite() solution.

It might be good to already look at what BQ and co do with query execution 
today. the fact is, rewrite() is inappropriate.

the main reason rewrite() is second-class here is because scoring is 
*per-segment* in lucene. In trunk today this means:
* filters might get executed with a completely different plan for different 
segments (conjunction versus Bits-acceptDocs).
* scorers might get returned as null for some segments (e.g. term doesnt 
exist). BQ doesn't have special null handling in every subscorer that it uses, 
that would be horrible. Instead it eliminates such scorers immediately in 
BooleanWeight's constructor. 
* this like the above can impact the structure of what BQ does (e.g. A or B, if 
B is null, it just returns A instead of a DisjunctionScorer).
* scorers have a cost() api, which tells us additional critical information for 
execution. I've proposed expanding this to much more (additional flags/methods) 
so that BQ can be a quarterback and not the entire football team.

another reason rewrite() is wrong here, is that such two-phase execution is 
only useful for conjunction (or: conjunction-like queries such as 
MinShouldMatch). Otherwise, its useless. Rewriting to two queries when the 
query is not a conjunction will not help performance.

at the low level, I can't see a rewrite() solution being efficient. Today 
ExactPhraseScorer is already coded as:

{code}
while (iterate approximation...) {
  confirm(); // phraseFreq  0
}
{code}

so in the case, its already ready to be exposed for two-phase execution.

On the other hand with rewrite(), it would either be 2 separate DPenums, or a 
mess? 

Finally, by having this generic on DISI, this approximation becomes much more 
flexible. For example a postings format could implement it for its docsenums, 
if it is able to implement a cheap approximation... perhaps via skiplist-like 
data, or perhaps with an explicit approximate cache mechanism specifically 
for this purpose. 

Another example is a filter like a geographic distance filter, could return a 
bounding box here automatically rather than forcing the user to deal with this 
themselves with complex query/filter hacks. At the same time, such a filter 
could signal that it has a linear time nextDoc, and booleanquery can really do 
the best execution. A rewrite() solution really makes this hard, because the 
two things would then be completely separate queries. But if you look at all 
the cases of executing this logic, its very useful for BQ to know that A is a 
superset of B.


 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search
Reporter: Paul Elschot
Priority: Minor
  Labels: gsoc2014

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1252) Avoid using positions when not all required terms are present

2014-10-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176689#comment-14176689
 ] 

Michael McCandless commented on LUCENE-1252:


+1 to this plan.

Except if possible, I think high cost filters really should be first class in 
Lucene?  I think the two-phase execution is in fact the same problem?

I.e., a PhraseQuery should just rewrite itself into a conjunction of two 
clauses: the pure conjunction of the terms, which is fast to execute, AND'd 
with a high-cost filter (that loads positions and checks that the terms are 
actually a phrase).  If BQ knew how to juggle such costly clauses then this 
issue (AND'ing a PhraseQuery with other simple clauses) and other example like 
geo filtering (which could rewrite into a cheap bbox AND'd w/ slow true 
distance check) should work well.

But if somehow making high cost filters first class is too much, then I agree: 
let's punt, here.  It just seems to me that it's the same issue as two-phase 
execution...

 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search
Reporter: Paul Elschot
Priority: Minor
  Labels: gsoc2014

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1252) Avoid using positions when not all required terms are present

2014-10-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176790#comment-14176790
 ] 

Robert Muir commented on LUCENE-1252:
-

I was fairly specific about how to tackle the problem... I dont mean to be 
rude, but did you actually read my proposal?

 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search
Reporter: Paul Elschot
Priority: Minor
  Labels: gsoc2014

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1252) Avoid using positions when not all required terms are present

2014-10-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176451#comment-14176451
 ] 

Robert Muir commented on LUCENE-1252:
-

I think we should keep the issue open, I know I've been thinking about this one 
a lot lately. 

The thing I see is, for it to work really nicely, BooleanQuery really needs to 
own execution of both queries and filters.

some kind of blurry proposal/plan like this:

Execute filters by BooleanQuery instead of its mini-me (FilteredQuery), e.g. as 
an additional type of BooleanClause. 

Merge Filter and Weight, in some way that makes sense, e.g. maybe just make 
Weight.scorer(LeafReaderContext context, Bits acceptDocs) a covariant-return 
override of Filter.getDocIdSet(LeafReaderContext context, Bits acceptDocs). 
Make sure any wrappers like ConstantScore delegate any new APIs correctly.

Add bulk methods like and/or/not to Filter such that optimized impls like 
FixedBitSet.and() can be used. Since java 7u40 these ones get autovectorized by 
hotspot and are a valid strategy. I think maybe some of these could be 
optimized by sparse bitset impls as well. 

Create an enhanced cost metric/execution API for filters. BooleanQuery needs 
this additional context to give the most efficient execution. At the least, it 
should have the information to know to do the bulk optos above, and even apply 
deletes this way if its appropriate (in lucene 5 deleted docs are a 
FixedBitSet). I would also want a way to indicate that a Filter has a 
linear-time nextDoc(). these cases (e.g. filtering by exact geographic 
distance) are horrible to support, but handling them correctly (e.g. in a final 
phase) is a lesser evil than having the API be crazy so that systems like 
solr/es can do them with hacks.

Remove stuff like FilteredQuery, BooleanFilter, etc. 

Fix LUCENE-3331 (or impl in some other way), such that scores are not needed 
is passed down the query execution stack. The tricky part is BQ's execution 
plan is currently in two places really, rewrite() and Weight.scorer(). And I 
really think it needs the freedom to be able to completely restructure queries 
for performance (across nested BQ as well). Another option is to setup internal 
infra so BooleanWeight.scorer() can do this, as it have cost() knowledge too, 
but it feels so wrong.

Finally, we should add some support for two-phase execution via 
DISI.getSuperSet() or some other approximation. ConjunctionScorer could both 
use (when at least one sub supports) and implement this method (when e.g. coord 
scoring prevents optimal restructing and its nested) for faster AND/filtering 
of phrase/sloppy/spans/whatever, or for any other custom query/filter that 
supports a fast approximation.




 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search
Reporter: Paul Elschot
Priority: Minor
  Labels: gsoc2014

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1252) Avoid using positions when not all required terms are present

2010-05-13 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867094#action_12867094
 ] 

Paul Elschot commented on LUCENE-1252:
--

LUCENE-2410 has solved this partially for PhraseQuery/PhraseScorer by computing 
only the first matching phrase to determine a possible match, and by delaying 
the computation of the remaining matches until score() is called.

 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Reporter: Paul Elschot
Priority: Minor

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1252) Avoid using positions when not all required terms are present

2009-05-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12713918#action_12713918
 ] 

Shai Erera commented on LUCENE-1252:


I must admit that I read the issue briefly, and so my proposal below may be 
irrelevant.

But it sounds to me like for the above example we'd want to use 
ConjunctionScorer to first iterate on both Scorers (PhraseScorer?) until they 
agree, and only then call Collector.collect(), which will call Scorer.score() 
in return (if a scoring collector is used).

Alternatively, we can implement a FilterCollector (if there isn't one already) 
which impls its collect(int doc) by first determining if a document is a match, 
and then call a given Collector, which will proceed with collecting and scoring.



 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Reporter: Paul Elschot
Priority: Minor

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1252) Avoid using positions when not all required terms are present

2009-05-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12713928#action_12713928
 ] 

Michael McCandless commented on LUCENE-1252:


I agree, we would want to do a ConjunctionScorer first, w/ all 4 terms, and 
then 2nd a PhraseScorer for each of the two phrases, but somehow they should be 
bound together such that a single TermPositions enumerator is shared in the two 
places for each term.

I think doing the additional filtering in collect is a little late -- there 
could be a number of such more expensive constraints to apply, depending on 
the query.

But, eg since we're talking about how to fix the up-front sort logic in 
ConjunctionScorer... you could imagine asking the PhraseQuery for its scorer, 
and getting back 2 AND'd scorers (cheap  expensive) that under-the-hood are 
sharing a single TermPositions enum, and then ConjunctionScorer would order all 
such scorers it got so that all cheap ones are checked first and only once they 
agree on a doc are the expensive scorers check.

 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Reporter: Paul Elschot
Priority: Minor

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1252) Avoid using positions when not all required terms are present

2009-04-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704711#action_12704711
 ] 

Michael McCandless commented on LUCENE-1252:


Here's a simple example that might drive this issue forward:

   +h1n1 flu +united states

Ideally, to score this query, you'd want to first AND all 4 terms
together, and only for docs matching that, consult the positions of
each pair of terms.

But we fail to do this today.

It's like somehow Weight.scorer() needs to be able to return a
cheap and an expensive scorer (which must be AND'd).  I think
PhraseQuery would somehow return cheap/expensive scorers that under
the hood share the same SegmentTermDocs/Positions iterators, such that
after cheap.next() has run, cheap.expensive only needs to check the
current doc.  So in fact maybe the expensive scorer should not be a
Scorer but some other simple passes or doesn't API.

Or maybe it returns say a TwoStageScorer, which adds a
reallyPasses() (needs better name) method to otherwise normal the
Scorer (DISI) API.

Or something else

 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Reporter: Paul Elschot
Priority: Minor

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1252) Avoid using positions when not all required terms are present

2009-04-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702520#action_12702520
 ] 

Michael McCandless commented on LUCENE-1252:


Right, I think this is more about determining whether a doc is a hit or not, 
than about how to compute its score.

I think somehow the scorer needs to return 2 scorers that share the underlying 
iterators.  The first scorer simply checks AND-ness with all other required 
terms, and only if the doc passes those are the 
positional/payloads/anything-else-expensive consulted.

 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Reporter: Paul Elschot
Priority: Minor

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1252) Avoid using positions when not all required terms are present

2009-04-23 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701957#action_12701957
 ] 

Paul Elschot commented on LUCENE-1252:
--

There is no patch for now.

HitCollectors should not be affected by this, as they would only be involved 
when a real match is found, and that, when position info is needed, necessarily 
involves the positions.

Extending this with a cheap score brings another issue: should a cheap score be 
given for a document that might match, but in the end does not really match 
when positions are used? At the moment, I don't think so: score values are 
normally cheap to compute, but accessing positions is not cheap.





 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Reporter: Paul Elschot
Priority: Minor

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1252) Avoid using positions when not all required terms are present

2009-04-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701726#action_12701726
 ] 

Jason Rutherglen commented on LUCENE-1252:
--

When flexible indexing goes in, users will be able to put data
into the index that allow scorers to calculate a cheap score,
collect, then go through and calculate a presumably more
expensive score. 

Would it be good to implement this patch with this sort of more
general framework in mind? 

It seems like this could affect the HitCollector API as we'd
want a more generic way of representing scores than the
primitive float we assume now. Aren't we rewriting the
HitCollector APIs right now? Can we implement this change now?

 Avoid using positions when not all required terms are present
 -

 Key: LUCENE-1252
 URL: https://issues.apache.org/jira/browse/LUCENE-1252
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Reporter: Paul Elschot
Priority: Minor

 In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
 currently next() and skipTo() will use position information even when other 
 parts of the query cannot match because some required terms are not present.
 This could be avoided by adding some methods to Scorer that relax the 
 postcondition of next() and skipTo() to something like all required terms 
 are present, but no position info was checked yet, and implementing these 
 methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
 SpanScorer/NearSpans.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org