span query matches too many docs when two query terms are the same unless inOrder=true
--------------------------------------------------------------------------------------

                 Key: LUCENE-3120
                 URL: https://issues.apache.org/jira/browse/LUCENE-3120
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/search
            Reporter: Doron Cohen
            Priority: Minor
             Fix For: 3.2, 4.0


spinoff of user list discussion - [SpanNearQuery - inOrder parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].

With 3 documents:
* "a b x c d"
* "a b b d"
* "a b x b y d"

Here are a few queries (the number in parenthesis indicates expected #hits):

These ones work *as expected*:
* (1) in-order, slop=0, "b", "x", "b"
* (1) in-order, slop=0, "b", "b"
* (2) in-order, slop=1, "b", "b"

These ones match *too many* hits:
* (1) any-order, slop=0, "b", "x", "b"
* (1) any-order, slop=1, "b", "x", "b"
* (1) any-order, slop=2, "b", "x", "b"
* (1) any-order, slop=3, "b", "x", "b"

These ones match *too many* hits as well:
* (1) any-order, slop=0, "b", "b"
* (2) any-order, slop=1, "b", "b"

Each of the above passes when using a phrase query (applying the slop, no in-order indication in phrase query).

This seems related to a known overlapping spans issue - [non-overlapping Span queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, so we might decide to close this bug after all, but I would like to at least have the junit that exposes the behavior in JIRA.
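
To make the expected hit counts above easier to check by hand, here is a minimal brute-force sketch in plain Java. It is not Lucene code and does not reproduce Lucene's span implementation: the class `SpanNearModel`, the method `countHits`, and the `allowOverlap` flag are hypothetical names introduced for illustration. It assumes whitespace tokenization and that slop is the width of the matched span minus the number of query terms. With `allowOverlap=false` each query term must take its own token position, which yields the expected counts listed in the report; with `allowOverlap=true` two identical terms may share one position, which is one way (an assumption, not a trace of the actual bug) that an any-order query could match too many documents.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of span-near matching over whitespace-tokenized text; not Lucene code. */
final class SpanNearModel {

    /** The three documents from the report. */
    static final List<String> DOCS = Arrays.asList("a b x c d", "a b b d", "a b x b y d");

    /** Counts the documents in which all terms match within the given slop. */
    static int countHits(List<String> docs, String[] terms, int slop,
                         boolean inOrder, boolean allowOverlap) {
        int hits = 0;
        for (String doc : docs) {
            String[] tokens = doc.split(" ");
            if (matches(tokens, terms, new int[terms.length], 0, slop, inOrder, allowOverlap)) {
                hits++;
            }
        }
        return hits;
    }

    // Brute force: tries every assignment of a token position to each query term.
    private static boolean matches(String[] tokens, String[] terms, int[] chosen, int k,
                                   int slop, boolean inOrder, boolean allowOverlap) {
        if (k == terms.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int p : chosen) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            // Assumed slop definition: span width minus the number of single-token terms.
            return (max - min + 1) - terms.length <= slop;
        }
        for (int pos = 0; pos < tokens.length; pos++) {
            if (!tokens[pos].equals(terms[k])) {
                continue;
            }
            // In-order matching requires strictly increasing positions.
            if (inOrder && k > 0 && pos <= chosen[k - 1]) {
                continue;
            }
            // Any-order matching without overlap requires a position not already taken.
            if (!inOrder && !allowOverlap && used(chosen, k, pos)) {
                continue;
            }
            chosen[k] = pos;
            if (matches(tokens, terms, chosen, k + 1, slop, inOrder, allowOverlap)) {
                return true;
            }
        }
        return false;
    }

    // True if pos was already assigned to one of the first k terms.
    private static boolean used(int[] chosen, int k, int pos) {
        for (int i = 0; i < k; i++) {
            if (chosen[i] == pos) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `SpanNearModel.countHits(SpanNearModel.DOCS, new String[] {"b", "b"}, 0, false, false)` models the any-order, slop=0, "b", "b" case. In this model the in-order path never lets two terms share a position, so `allowOverlap` only changes any-order results, which is consistent with the issue title.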