[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602117#comment-16602117 ] Alan Woodward commented on LUCENE-8196: --- [~Martin Hermann] seeing as this ticket is closed, I opened LUCENE-8477 to continue discussion. > Add IntervalQuery and IntervalsSource to expose minimum interval semantics > across term fields > - > > Key: LUCENE-8196 > URL: https://issues.apache.org/jira/browse/LUCENE-8196 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 7.4 > > Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, > LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket proposes an alternative implementation of the SpanQuery family > that uses minimum-interval semantics from > [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf] > to implement positional queries across term-based fields. Rather than using > TermQueries to construct the interval operators, as in LUCENE-2878 or the > current Spans implementation, we instead use a new IntervalsSource object, > which will produce IntervalIterators over a particular segment and field. > These are constructed using various static helper methods, and can then be > passed to a new IntervalQuery which will return documents that contain one or > more intervals so defined. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597355#comment-16597355 ] Martin Hermann commented on LUCENE-8196:

[~romseygeek]

1) I agree that this might be a solution, but as it differs from the setting of the paper, it should be done very carefully.

2) Internal slop seems like a great idea! You're right, my example wasn't very good and {{Intervals.phrase()}} already does that. But still, if you think of a bigger query with, e.g., one slop (say, {{"a ("big bad" OR evil) wolf", one additional token allowed somewhere}}), the problem remains. I don't really see how 'internal slop' would differ from 'normal slop' (doesn't it measure the exact same thing?), but it seems rather easy to implement and like something that would be desirable and solve this issue.

3) I'm not quite sure I understand that correctly. Do you mean using a gap in the query and rewriting it to something like
{noformat}
"bad wolf" (slop 1) contained by "big GAP wolf" (slop 2)
{noformat}
or adding the gap automatically somewhere down the way? I think in the first case it'd still be possible to construct some (maybe slightly more complicated) examples that can't be solved like that and where the minimal-interval behaviour does not match intuition.

Again, while a lot of these queries may seem quite exotic, I think that intervals will get used a lot in various programmatically generated queries (as spans do now), and there pretty much anything can happen.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596348#comment-16596348 ] Alan Woodward commented on LUCENE-8196:

Hi [~Martin Hermann]

Thanks for the detailed feedback - this is very helpful!

1) As with Spans, one way to fix the issue with OR intervals is to change the precedence rules so that longer intervals sort before their prefixes. I need to go and re-read the paper's proof concerning the OR operator; it would be interesting to see if this ends up causing problems elsewhere. Another option would be to add a separate IntervalsSource with this behaviour, maybe triggered by a parameter on {{Intervals.or()}}.

2) Intervals don't really have the notion of 'slop' that Spans do, but we could add the idea of an 'internal slop' to ordered and unordered spans. This would be measured as the space within an interval not taken up by the component intervals. Your {{("big bad" OR evil) wolf}} query I think can already be done using {{Intervals.phrase()}}?

3) Spans have the notion of a 'gap' Span, which could be usefully added here. This could help with avoiding minimization in your CONTAINS query.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595002#comment-16595002 ] Martin Hermann commented on LUCENE-8196:

First of all, I really like this implementation and the ideas that went into it. But as I have spent quite some time with the old span queries and their problems, I'd like to comment on some things and maybe offer some fresh viewpoints on old problems:

Obviously, maxwidth is not completely identical to specifying slop: Let's say we want to do some sort of synonym expansion and query for "("big bad" OR evil) wolf" (this is of course related to the prefix problem we already know about ("genome editing"), but I think still slightly different). With span queries, this would have been possible, as we just have to set slop to 0 in all queries, but now we have to do something like
{code:java}
Intervals.maxwidth(3,
    Intervals.ordered(
        Intervals.or(
            Intervals.maxwidth(2, Intervals.ordered(Intervals.term("big"), Intervals.term("bad"))),
            Intervals.term("evil")),
        Intervals.term("wolf")));
{code}
which also matches "evil eyes wolf", which should not be a match. It would be possible to rewrite the query so that the disjunction is at the top level, something like
{code:java}
Intervals.or(
    Intervals.maxwidth(2, Intervals.ordered(Intervals.term("evil"), Intervals.term("wolf"))),
    Intervals.maxwidth(3, Intervals.ordered(Intervals.term("big"), Intervals.term("bad"), Intervals.term("wolf"))));
{code}
which would work as expected, but I think we can agree that this is not really a nice solution (but I will come back to it later).

Now, we already know that "(big OR "big bad") wolf" would not match "big bad wolf" (this is exactly the genome editing thing), but I think it is worth pointing out exactly why: it actually should not match, according to the definition of "minimum interval". Any match for "big bad" is also a match for "big", so the first IntervalsSource only passes matches for "big", and then we get no match for "big wolf". This is a feature of the query semantics of the paper (and maybe the reason for the efficiency and simplicity of the algorithms): the problems that SpanQueries had are gone, because we define the unexpected behaviour to be correct*. As much as I like the IntervalQueries, I do not really think this is satisfactory.

There are actually other, similar cases with containing/containedBy: Let's say our document is "big bad big wolf" and we want "bad wolf" (slop 1) to be contained by "big wolf" (slop 2). We would get no match in this document, as the minimal match for the big interval is just "big wolf" (as the other match, "big bad big wolf", contains this one). At least to me this is counterintuitive and I would expect the document to match. It really gets strange if we mix in some "OR":
{noformat}
"big wolf" (slop 1) contained in ("big wolf" (slop 1) OR "bad wolf")
{noformat}
does not match "big bad wolf", in contrast to
{noformat}
"big wolf" (slop 1) contained in ("big wolf" (slop 1))
{noformat}
which does. So we actually lose a match by adding an OR clause, and I think we can agree that this is not really good.

Of course these are not queries a human would write, but I think one major use case of span queries is some sort of automatic query generation, and that's where I think it is really important to meet at least some basic expectations (such as not losing matches by adding disjunctions). I don't see a way to fix this that still follows minimal interval semantics, as all this is actually how it SHOULD work there, but this would mean we'd lose the correctness proofs.
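
To make the "(big OR "big bad") wolf" case above concrete, here is a rough plain-Java sketch that does not use Lucene at all. `Interval`, `minimize` and `followedBy` are hypothetical helpers invented for this illustration; the quadratic `minimize` only mimics the effect of minimal-interval filtering and is not how the lazy iterators in the patch work, and `followedBy` assumes a phrase-like combination with no gap allowed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MinimalIntervalsSketch {

    /** Closed token-position interval [start, end]; a stand-in, not a Lucene class. */
    static final class Interval {
        final int start, end;
        Interval(int start, int end) { this.start = start; this.end = end; }
        boolean contains(Interval o) { return start <= o.start && o.end <= end; }
        @Override public boolean equals(Object o) {
            return o instanceof Interval && ((Interval) o).start == start && ((Interval) o).end == end;
        }
        @Override public int hashCode() { return 31 * start + end; }
        @Override public String toString() { return "[" + start + "," + end + "]"; }
    }

    /** Keeps only the candidates that do not contain a different candidate. */
    static List<Interval> minimize(List<Interval> candidates) {
        List<Interval> out = new ArrayList<>();
        for (Interval a : candidates) {
            boolean minimal = true;
            for (Interval b : candidates) {
                if (!a.equals(b) && a.contains(b)) { minimal = false; break; }
            }
            if (minimal && !out.contains(a)) out.add(a);
        }
        return out;
    }

    /** Ordered combination with no gap: an interval from bs starts right after one from as. */
    static List<Interval> followedBy(List<Interval> as, List<Interval> bs) {
        List<Interval> out = new ArrayList<>();
        for (Interval a : as) {
            for (Interval b : bs) {
                if (b.start == a.end + 1) out.add(new Interval(a.start, b.end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Document "big bad wolf": big@0, bad@1, wolf@2
        List<Interval> union = new ArrayList<>();
        union.add(new Interval(0, 0));            // match for "big"
        union.add(new Interval(0, 1));            // match for "big bad"
        List<Interval> wolf = Arrays.asList(new Interval(2, 2));

        List<Interval> or = minimize(union);      // only [0,0] survives the OR
        System.out.println(followedBy(or, wolf));     // prints []
        System.out.println(followedBy(union, wolf));  // prints [[0,2]]
    }
}
```

With minimization applied inside the OR, the longer "big bad" interval is discarded before "wolf" is considered, so nothing is adjacent to "wolf"; without it, the expected interval is found.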
The only thing I can think of is some sort of query rewriting, pushing the disjunction as far towards the top as necessary, but this may be rather performance-heavy and also does not solve the "bad wolf" (slop 1) contained by "big wolf" (slop 2) problem. Any thoughts?

*A short theoretical aside: I think that most of the span query problems came from the fact that we want to have a "next match" function, i.e. some sort of ordering of matches, together with the nature of span query matches, which are essentially a pair of numbers (start and end of match). This means we have to specify an order on pairs of numbers (which is possible, of course; the solution with span queries was a lexical order, i.e. the start always increases, and if it stays the same, the end increases). But I think it is not really possible to implement completely lazy behaviour with this ordering: Think of some ordered "((a OR b) followed by (c OR d)) with enough slop" and the document "a b c d", which should find "a b c d" before "b c" (as the start increases), but has to cache the match for "c", which (in the sub-query "(c OR d)") occurs before the one for "b". So the combination of
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468537#comment-16468537 ] Alan Woodward commented on LUCENE-8196: --- I opened LUCENE-8300 to deal with unordered overlaps.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453817#comment-16453817 ] ASF subversion and git services commented on LUCENE-8196: - Commit 1a18acd783745f8fa11042f854f96a3e5ed5aa72 in lucene-solr's branch refs/heads/master from [~romseygeek] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a18acd ] LUCENE-8196: Fix unordered case with non-matching subinterval
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453816#comment-16453816 ] ASF subversion and git services commented on LUCENE-8196: - Commit 345fdff47cc6f09d774afefd46b2a653d0da7fa3 in lucene-solr's branch refs/heads/branch_7x from [~romseygeek] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=345fdff ] LUCENE-8196: Fix unordered case with non-matching subinterval
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453767#comment-16453767 ] Alan Woodward commented on LUCENE-8196: --- I think minwidth() would run into problems with documents that have two instances of 'b', because unordered will always find the minimal intervals, so it would always end up with intervals of width 0, which would then be rejected by the filter, and you'd end up with missing matches. What we really need here, I think, is a new source, something like 'unordered-non-overlapping', which checks that all of the internal intervals are separated. With a better name, of course :) And we should rename 'unordered' to 'and' to make the semantics a bit clearer.
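
The minwidth() objection above can be checked with a small plain-Java sketch that does not depend on Lucene. Everything in it is an assumption made for illustration: `pairs`, `minimize` and `matchesMinWidth` are hypothetical helpers, width is taken as `end - start` to follow the numbers used in this thread, and the `distinct` flag stands in for the proposed 'unordered-non-overlapping' source.

```java
import java.util.ArrayList;
import java.util.List;

final class UnorderedOverlapSketch {

    /**
     * Candidate intervals {min(x, y), max(x, y)} for every pair of positions.
     * With distinct == true, pairs that reuse the same position are skipped.
     */
    static List<int[]> pairs(int[] xs, int[] ys, boolean distinct) {
        List<int[]> out = new ArrayList<>();
        for (int x : xs) {
            for (int y : ys) {
                if (!distinct || x != y) {
                    out.add(new int[] {Math.min(x, y), Math.max(x, y)});
                }
            }
        }
        return out;
    }

    /** Drops every candidate that contains a different candidate. */
    static List<int[]> minimize(List<int[]> in) {
        List<int[]> out = new ArrayList<>();
        for (int[] a : in) {
            boolean minimal = true;
            for (int[] b : in) {
                boolean same = a[0] == b[0] && a[1] == b[1];
                if (!same && a[0] <= b[0] && b[1] <= a[1]) { minimal = false; break; }
            }
            if (minimal) out.add(a);
        }
        return out;
    }

    /** True if some interval has width (end - start) of at least minWidth. */
    static boolean matchesMinWidth(List<int[]> intervals, int minWidth) {
        for (int[] i : intervals) {
            if (i[1] - i[0] >= minWidth) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[] b = {0, 2};   // document "b a b": b@0, b@2

        // unordered(b, b) is minimized to [0,0] and [2,2], so a minwidth(1) filter rejects the document
        System.out.println(matchesMinWidth(minimize(pairs(b, b, false)), 1));  // prints false

        // requiring distinct positions keeps [0,2], so the document matches
        System.out.println(!minimize(pairs(b, b, true)).isEmpty());            // prints true
    }
}
```

For a document with a single 'b', `pairs(b, b, true)` is empty, which is the behaviour asked for with {{NEAR(a, a)}}.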
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450408#comment-16450408 ] Matt Weber commented on LUCENE-8196:

[~jim.ferenczi] [~romseygeek] So given a single document with the value {{a b}}. The following queries would both match this document:
{code:java}
Intervals.unordered(Intervals.term("b"), Intervals.term("a"))
{code}
{code:java}
Intervals.unordered(Intervals.term("b"), Intervals.term("b"))
{code}
The first I think would have an interval width of {{1}} and the 2nd should have a width of {{0}}. So if we have a {{minwidth}} operator we could use that to set the minimum width to {{1}} preventing the 2nd from matching? If both of these queries result in an interval with the same width then that feels wrong to me.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450158#comment-16450158 ] Matt Weber commented on LUCENE-8196: I use these queries to build query parsers and I am specifically thinking of an unordered near and how I can prevent it from matching the same term. I can't think of any situation where a user would think {{NEAR(a, a)}} would match documents with a single {{a}} and if we can't get that by default I would like a way to explicitly prevent it myself. Spans have the same issue as well, see LUCENE-3120.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450039#comment-16450039 ] Jim Ferenczi commented on LUCENE-8196: -- I don't think an operator can prevent anything here; a query for *Intervals.ordered(Intervals.term("w3"), Intervals.term("w3"))* should always return all intervals of the term "w3" (it will not interleave successive intervals of "w3"). [~mattweber] why do you think that this "scenario" should be prevented? When I do "foo AND foo" I don't expect it to match only documents that have foo twice?
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450014#comment-16450014 ] Matt Weber commented on LUCENE-8196: [~jim.ferenczi] [~romseygeek] I think renaming to {{and}} makes sense; however, I would still like a way to explicitly prevent the scenario I described. Maybe a {{minwidth}} operator? The width at the same position/interval should be {{0}}, right?
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449461#comment-16449461 ] Alan Woodward commented on LUCENE-8196: --- Good catch [~jim.ferenczi], I'll commit that change. I like the idea of changing *unordered* to *and* as well - I think that makes sense [~mattweber]?
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449409#comment-16449409 ]
Jim Ferenczi commented on LUCENE-8196:

I don't think we should prevent anything ;). *unordered* is a conjunction operator, so it should match if all terms match (which is the case in your example), so these results are expected IMO. Maybe we should rename *unordered* to *and* in order to avoid confusion? If you want to match the same term within a max width, the ordered query works fine:

{code:java}
Query q = new IntervalQuery(field, Intervals.maxwidth(2, Intervals.ordered(Intervals.term("w3"), Intervals.term("w3"))));
{code}

[~romseygeek] while I was playing with *unordered* I realized that we don't protect against sources that match but don't have intervals. For instance:

{code:java}
Query q = new IntervalQuery(query, Intervals.unordered(Intervals.term("w2"), Intervals.ordered(Intervals.term("w3"), Intervals.term("w3"))));
{code}

does not work because the *unordered* query doesn't check whether the sub-source has intervals when it adds it to the queue. I attached a patch that fixes this issue and adds some tests that fail without the fix. Can you take a look?
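The self-match behaviour discussed in this comment can be illustrated with a small sketch that does not use Lucene at all. It assumes that a term's intervals are just its positions, that *unordered* looks for the narrowest window holding one position from each list (the same position may serve both), that *ordered* requires the first position to be strictly before the second, and that width is counted as end - start + 1. The class and method names are hypothetical.

```java
/** Illustrative only: not Lucene code; all names here are hypothetical. */
final class UnorderedSelfMatchSketch {

  /**
   * Narrowest window (end - start + 1) containing one position from each
   * sorted list. The same position may be picked from both lists, which is
   * what lets unordered(w3, w3) match a single occurrence of w3.
   * Returns -1 if either list is empty.
   */
  static int minUnorderedWidth(int[] a, int[] b) {
    if (a.length == 0 || b.length == 0) {
      return -1;
    }
    int i = 0, j = 0, best = Integer.MAX_VALUE;
    while (i < a.length && j < b.length) {
      int start = Math.min(a[i], b[j]);
      int end = Math.max(a[i], b[j]);
      best = Math.min(best, end - start + 1);
      // advance whichever list is currently behind
      if (a[i] <= b[j]) {
        i++;
      } else {
        j++;
      }
    }
    return best;
  }

  /**
   * Narrowest window where a position from a comes strictly before a
   * position from b. Returns -1 if no such pair exists.
   */
  static int minOrderedWidth(int[] a, int[] b) {
    int best = -1;
    for (int x : a) {
      for (int y : b) {
        if (x < y && (best == -1 || y - x + 1 < best)) {
          best = y - x + 1;
        }
      }
    }
    return best;
  }
}
```

Under these assumptions, a document with a single w3 gives an unordered width of 1, so a max width of 2 accepts it, while the ordered variant finds no pair at all and only matches once w3 occurs twice close together.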
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448387#comment-16448387 ]
Alan Woodward commented on LUCENE-8196:

bq. How would we prevent matching at the same interval?

The original paper doesn't look like it addresses this. I'll try to work out the best way of dealing with things; I guess we'll need to keep track of the positions of internal intervals in the priority queue and, when we advance, make sure that they don't collide.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446939#comment-16446939 ]
Matt Weber commented on LUCENE-8196:

[~romseygeek] This is great! How would we prevent matching at the same interval? In {{TestIntervalQuery}}, I would expect this to pass, but it matches every doc with {{w3}}.

{code:java}
public void testUnorderedQueryNoSelfMatch() throws IOException {
  Query q = new IntervalQuery(field, Intervals.maxwidth(2, Intervals.unordered(Intervals.term("w3"), Intervals.term("w3"))));
  checkHits(q, new int[]{1});
}
{code}
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425211#comment-16425211 ]
ASF subversion and git services commented on LUCENE-8196:

Commit 7117b68db6835acfeda17f04ab2c20a8c1ec2c17 in lucene-solr's branch refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7117b68 ]

LUCENE-8196: Check for a null input in LowpassIntervalsSource
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425210#comment-16425210 ]
ASF subversion and git services commented on LUCENE-8196:

Commit 7fcaac8550e340512c09a8d8f4bd4773096f63f3 in lucene-solr's branch refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7fcaac8 ]

LUCENE-8196: Check for a null input in LowpassIntervalsSource
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424063#comment-16424063 ]
ASF subversion and git services commented on LUCENE-8196:

Commit 1d6502cecb94330cd5a793ea82bbfe910c844d7f in lucene-solr's branch refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1d6502c ]

LUCENE-8196: Check that the TermIntervalsSource is positioned on the correct term
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424062#comment-16424062 ]
ASF subversion and git services commented on LUCENE-8196:

Commit b772b585a095b32593e1b99ea7ad110921f3c721 in lucene-solr's branch refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b772b58 ]

LUCENE-8196: Check that the TermIntervalsSource is positioned on the correct term
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423660#comment-16423660 ]
ASF subversion and git services commented on LUCENE-8196:

Commit 00eab54f9d6232c68a93f10ff20e3a724ffeca14 in lucene-solr's branch refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=00eab54 ]

LUCENE-8196: Add IntervalQuery and IntervalsSource to the sandbox
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423659#comment-16423659 ]
ASF subversion and git services commented on LUCENE-8196:

Commit 974c03a6ca8eed3941e1414dd2ecb75132228d4f in lucene-solr's branch refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=974c03a ]

LUCENE-8196: Add IntervalQuery and IntervalsSource to the sandbox
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418695#comment-16418695 ]
Jim Ferenczi commented on LUCENE-8196:

+1
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417092#comment-16417092 ]
Alan Woodward commented on LUCENE-8196:

I talked over disjunction ordering with [~jim.ferenczi], and we agreed to revert back to the ordering specified in the original paper. In future it might be worth contacting the authors and seeing if they've covered the case of prefix disjunctions elsewhere.

I think this is ready?
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404574#comment-16404574 ]
Alan Woodward commented on LUCENE-8196:

I moved the approximation and reset() from IntervalIterator, and it turned out that ConjunctionIntervalIterator was a convenient place to put them, so I kept that in. I also cleaned up the utility classes (no need to check for TwoPhaseIterator or BitSetIterator when we know that the leaves are always PostingsEnum) and moved to org.apache.lucene.search.intervals.

Payload matches can be added trivially at a later point. Payload scoring will be a more interesting one, but I'm not entirely happy with the way scoring works at the moment anyway; I need to read around a bit more on good ways of scoring proximity queries in general. Offsets may not be trivial to add, as we have the same problem we originally had with Spans here, in that some of the algorithms advance leaf intervals before returning. Something to consider later on, definitely.

The PQ specialization is just lifted directly from DisjunctionDISIApproximation in the existing core search package, so it wasn't my conclusion :)

re extractTerms() - I don't like this method being here in the first place really, see my comment about scoring above. I think let's keep it simple for now, you can always filter afterwards.

[~jim.ferenczi] - where can I remove null checks? I think I'm constrained by having to return [-1..-1] when the iterator has moved to a new doc but hasn't been advanced over intervals yet.

The disjunction order is to address LUCENE-7398. If a disjunction appears within a block, then sorting for minimum intervals can miss valid matches. The Vigna paper doesn't seem to discuss this case anywhere. One possible solution that would work in all cases would be to add a boolean to IntervalsSource.getIntervals() that indicates whether or not the source is in a final position - if it is, then it should return minimal intervals, otherwise it should return wider ones. This would only be applicable to disjunctions.
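The concern about disjunctions inside a block can be made concrete with a toy model. It is not Lucene code, and the document and query are hypothetical: take the text "a b c d" (positions 0 to 3) and a block query of a, then (b OR "b c"), then d. The disjunction offers two candidate intervals, [1,1] for b and [1,2] for "b c". Strict minimization discards [1,2] because it contains [1,1], after which nothing fits exactly between a and d.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a toy model of intervals as {start, end} pairs. */
final class DisjunctionMinimizationSketch {

  /**
   * Keeps only the intervals that do not contain another candidate.
   * Assumes the list holds no duplicate intervals.
   */
  static List<int[]> minimal(List<int[]> candidates) {
    List<int[]> out = new ArrayList<>();
    for (int[] a : candidates) {
      boolean containsAnother = false;
      for (int[] b : candidates) {
        if (a != b && a[0] <= b[0] && b[1] <= a[1]) {
          containsAnother = true;
          break;
        }
      }
      if (!containsAnother) {
        out.add(a);
      }
    }
    return out;
  }

  /**
   * A block match needs a middle interval that starts immediately after
   * position `before` and ends immediately before position `after`.
   */
  static boolean blockMatches(int before, List<int[]> middle, int after) {
    for (int[] m : middle) {
      if (m[0] == before + 1 && m[1] + 1 == after) {
        return true;
      }
    }
    return false;
  }
}
```

In this model the block matches when all disjunction candidates are offered but not when only the minimal ones are, which is the kind of missed match the comment describes.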
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404532#comment-16404532 ]
Jim Ferenczi commented on LUCENE-8196:

+1 too. There are some places where you could initialize the current interval with [−∞..−∞] in order to avoid the nullity check. Most of the operator algorithms seem good, though I don't understand why you changed the order of the disjunction. If you don't start with the smallest right interval from the queue, you could miss a lot of minimum intervals that could be needed if the disjunction is used inside another operator?
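The idea of a sentinel interval instead of a null check can be sketched as follows. This is a hypothetical, self-contained class rather than the iterator from the patch: it walks a fixed array of positions, reports [-1..-1] before the first call to nextInterval(), and reports a NO_MORE_INTERVALS constant (assumed here to be Integer.MAX_VALUE) once exhausted, so callers can always compare start() and end() as plain ints.

```java
/** Illustrative only: a position iterator that uses sentinels, never null. */
final class SentinelIntervalSketch {

  /** Assumed sentinel for "exhausted"; the value is a choice made for this sketch. */
  static final int NO_MORE_INTERVALS = Integer.MAX_VALUE;

  private final int[] positions;
  private int idx = -1;
  // [-1..-1] means "on a document, but not yet advanced over intervals"
  private int start = -1;
  private int end = -1;

  SentinelIntervalSketch(int[] positions) {
    this.positions = positions;
  }

  int start() {
    return start;
  }

  int end() {
    return end;
  }

  /** Moves to the next single-position interval and returns its start. */
  int nextInterval() {
    if (idx + 1 < positions.length) {
      idx++;
      start = end = positions[idx];
    } else {
      start = end = NO_MORE_INTERVALS;
    }
    return start;
  }
}
```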
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402538#comment-16402538 ]
David Smiley commented on LUCENE-8196:

* Nice package javadocs!
* Maybe you will some day add a means of extracting offsets (e.g. for highlighting) or payloads?
* Just curious, how did you arrive at the conclusion that you needed to specialize the PriorityQueue?
* What if extractTerms took a Consumer instead of a Set? It's easy to invoke with a myset::add for the common case when you have a Set, and I've seen cases where you might want to provide a filter before storing it wherever.
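The Consumer suggestion in the last bullet can be sketched like this. The class is hypothetical and uses plain strings in place of Lucene's Term type so that it stands alone; it only shows the shape of the proposed signature, not the one in the patch.

```java
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: a source that pushes its terms into a Consumer. */
final class ExtractTermsSketch {

  private final List<String> terms;

  ExtractTermsSketch(List<String> terms) {
    this.terms = terms;
  }

  /** Hands every term to the caller-supplied consumer instead of filling a Set. */
  void extractTerms(Consumer<String> consumer) {
    for (String term : terms) {
      consumer.accept(term);
    }
  }
}
```

A caller holding a Set can pass `mySet::add`, and a caller that wants to filter can pass a lambda such as `t -> { if (t.startsWith("b")) mySet.add(t); }` without the source needing to know about either case.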
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402069#comment-16402069 ]
Adrien Grand commented on LUCENE-8196:

Some comments:
- ConjunctionIntervalIterator does so little, I suspect things would be easier to read if we removed it.
- I'd like it better if we kept the API definition to a minimum on IntervalIterator and removed the constructor that takes an approximation and the reset() method. To me these should be implementation details?
- Let's make the utility classes that you copied pkg-private?
- Maybe let's put this in a sub package of search, i.e. org.apache.lucene.search.intervals?

Otherwise +1
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400317#comment-16400317 ]
Alan Woodward commented on LUCENE-8196:

This patch moves everything into the sandbox for now, and adds a package-info explaining how to use things. I had to duplicate DisiWrapper, DisiPriorityQueue and DisjunctionDISIApproximation, but I don't think that's too much of a problem. Having things in sandbox should reduce confusion with Span queries, and give us time to try and switch things over.

I think this is ready to be committed.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394980#comment-16394980 ]
Alan Woodward commented on LUCENE-8196:

This patch makes IntervalIterator extend DocIdSetIterator, and makes the per-document reset() function protected and called automatically on nextDoc() and advance(target).
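The arrangement described in this comment is a template-method pattern, which can be sketched without Lucene. The classes below are hypothetical stand-ins: the base class iterates a sorted array of doc ids, and its final nextDoc() and advance(target) call a protected reset() hook whenever they land on a real document. Whether the patch also calls reset() on exhaustion is not stated, so skipping it there is an assumption of this sketch.

```java
/** Illustrative only: a doc iterator that resets per-document state itself. */
abstract class ResettingIteratorSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs; // sorted doc ids
  private int idx = -1;

  ResettingIteratorSketch(int[] docs) {
    this.docs = docs;
  }

  final int docID() {
    if (idx < 0) {
      return -1;
    }
    return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
  }

  final int nextDoc() {
    if (idx < docs.length) {
      idx++;
    }
    return positioned();
  }

  final int advance(int target) {
    do {
      if (idx < docs.length) {
        idx++;
      }
    } while (idx < docs.length && docs[idx] < target);
    return positioned();
  }

  // Calls the hook only when we are on a real document (assumption of this sketch).
  private int positioned() {
    int doc = docID();
    if (doc != NO_MORE_DOCS) {
      reset();
    }
    return doc;
  }

  /** Per-document hook; subclasses clear their interval state here. */
  protected abstract void reset();
}

/** Minimal subclass that just counts how often reset() fires. */
final class CountingIteratorSketch extends ResettingIteratorSketch {
  int resets = 0;

  CountingIteratorSketch(int[] docs) {
    super(docs);
  }

  @Override
  protected void reset() {
    resets++;
  }
}
```

Because reset() is protected and invoked from the final methods, callers cannot forget to reset state when moving between documents, and the hook stays out of the public API.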
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393188#comment-16393188 ] Jim Ferenczi commented on LUCENE-8196: -- {quote} I'd rather keep the API as it is, with the field being passed to IntervalQuery and then recursing down the IntervalsSource tree. Otherwise you end up having to declare the field on all the created sources, which seems redundant. I've removed the cross-field hack entirely for the moment. {quote} +1 to removing the cross-field hack, thanks. Regarding the API, it's OK: IntervalQuery limits all sources to one field, so I am fine with that (I had misunderstood how IntervalQuery can be used).
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393182#comment-16393182 ] Alan Woodward commented on LUCENE-8196: --- I discussed scoring with [~jim.ferenczi] and [~jpountz] offline, and we decided to just use the inverse length of intervals as a sloppy frequency for now, as described in the Vigna paper linked above. This means that we can't compare scores directly with existing phrase queries, but the query mechanism is quite different (particularly for SloppyPhraseScorer) so it makes sense that scores won't be the same either.
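As a rough illustration of "inverse length as a sloppy frequency", the sketch below sums 1/width over the intervals found in one document. Summing across intervals, the inclusive `[start, end]` position convention, and the class name `InverseLengthScoring` are all assumptions made for this example; the patch itself may accumulate the value differently.

```java
// Hypothetical helper, not part of the patch.
final class InverseLengthScoring {

    /** Width in positions of an interval, assuming both ends are inclusive. */
    static int width(int start, int end) {
        return end - start + 1;
    }

    /**
     * Sloppy frequency for one document: each interval contributes the
     * inverse of its width, so tighter matches contribute more.
     * Each row of intervals is {start, end}.
     */
    static float sloppyFreq(int[][] intervals) {
        float freq = 0f;
        for (int[] iv : intervals) {
            freq += 1.0f / width(iv[0], iv[1]);
        }
        return freq;
    }
}
```

For example, a document with intervals of width 1, 2 and 4 would get a sloppy frequency of 1 + 0.5 + 0.25 under these assumptions. A similarity would then consume this value in place of a raw term frequency, which is why the resulting scores are not directly comparable with those of PhraseQuery.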
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392834#comment-16392834 ] Alan Woodward commented on LUCENE-8196: --- I opened a pull request at [https://github.com/apache/lucene-solr/pull/334] to make this easier to review. [~jpountz] I think I've addressed most of your feedback? [~jim.ferenczi] I'd rather keep the API as it is, with the field being passed to IntervalQuery and then recursing down the IntervalsSource tree. Otherwise you end up having to declare the field on all the created sources, which seems redundant. I've removed the cross-field hack entirely for the moment. I'll see if I can improve the scoring next.
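The design choice argued for here (declare the field once on the query and let it flow down the tree of sources) can be modelled in a few lines. This is a toy model, not the Lucene API: `SketchSource`, `SketchSources` and `SketchIntervalQuery` are hypothetical names, and "collecting field:term strings" stands in for opening postings on a segment.

```java
import java.util.ArrayList;
import java.util.List;

// Toy source: given the query's field, records which field:term pairs it would read.
interface SketchSource {
    void collect(String field, List<String> out);
}

// Static helper methods, loosely mirroring the factory style described in the ticket.
final class SketchSources {
    static SketchSource term(String term) {
        return (field, out) -> out.add(field + ":" + term);
    }

    static SketchSource ordered(SketchSource... subs) {
        // A composite source just passes the same field to every child.
        return (field, out) -> {
            for (SketchSource s : subs) {
                s.collect(field, out);
            }
        };
    }
}

// The field is declared exactly once, at the query level.
final class SketchIntervalQuery {
    private final String field;
    private final SketchSource source;

    SketchIntervalQuery(String field, SketchSource source) {
        this.field = field;
        this.source = source;
    }

    List<String> resolvedTerms() {
        List<String> out = new ArrayList<>();
        source.collect(field, out);
        return out;
    }
}
```

Because no source stores a field of its own, every leaf in the tree necessarily resolves against the query's field, which is the single-field restriction acknowledged in the reply above.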
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391723#comment-16391723 ] Jim Ferenczi commented on LUCENE-8196: --
{quote} I was a bit annoyed to see the field masking hack but actually those interval sources do not need term statistics, which makes the hack less horrible. Could you still document it to make sure users are aware it is a hack and explain in which circumstances it might be ok? {quote}
I think that the proposed API should be more restrictive regarding the targeted field. Could we restrict the IntervalsSource to work on a single field? Something like:
{code:java}
public abstract class IntervalsSource {

  protected final String field;

  public IntervalsSource(String field) {
    this.field = field;
  }

  public abstract IntervalIterator intervals(LeafReaderContext ctx) throws IOException;
  ...
{code}
... and then we can check in each implementation that the sources are all targeting the same field. I understand that it might be powerful to mix multiple fields in an interval query, but with the current API that seems to be the norm rather than an exception. We can add the field masking hack afterward, but for the first iteration I think it's better to focus on the main use case for this new query, which is to provide a way to find the minimum intervals in a single field.
Regarding the score of the intervals, it seems that the patch uses the inverse length of the interval rather than the slop within the interval, as the sloppy phrase scorer does. Could we compute the total slop of the current interval (as the sum of the slop of each interval source that composes this interval) and use its inverse to score each one? This would make different interval queries more comparable in terms of score, since an interval with few terms and a slop>0 would score less than one with more terms but no slop.
I'll look deeper at the implementation of the different queries but I like the simplicity of the patch and the fact that there is a paper with a proof for each of them.
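To make the scoring comparison in this comment concrete, the sketch below contrasts the two options on a single interval. It rests on simplifying assumptions that are not in the thread: slop is taken as the interval width minus the number of matched terms (rather than a sum over nested sources), and the slop-based score uses 1/(1 + slop) so that a zero-slop match does not divide by zero. The class name `SlopVersusLength` is hypothetical.

```java
// Hypothetical comparison helper, not part of the patch.
final class SlopVersusLength {

    /** Option used by the patch, as described: inverse of the interval width. */
    static float inverseLength(int start, int end) {
        return 1.0f / (end - start + 1);
    }

    /** Assumed slop: positions inside the interval not covered by a matched term. */
    static int slop(int start, int end, int matchedTerms) {
        return (end - start + 1) - matchedTerms;
    }

    /** Suggested alternative: inverse of (1 + slop), so exact matches score 1. */
    static float inverseSlop(int start, int end, int matchedTerms) {
        return 1.0f / (1 + slop(start, end, matchedTerms));
    }
}
```

Take a two-term match spanning positions 0..2 (one gap) and a four-term match spanning 0..3 (no gap). Inverse length gives roughly 0.33 versus 0.25, ranking the shorter but sloppier match higher; the slop-based variant gives 0.5 versus 1.0, ranking the exact four-term match higher, which is the behaviour the comment asks for.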
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391580#comment-16391580 ] Alan Woodward commented on LUCENE-8196: ---
{quote}Do we need {{IntervalIterator.score()}}?{quote}
Yes, terms and phrases return 1 rather than taking the overall width into account. This is so that they score the same as TermQuery and PhraseQuery.
{quote}Do we need {{advanceTo}}?{quote}
Unfortunately yes, or at least we need a way of resetting the iterator on each new document. It might be possible to avoid passing the doc down and having a return value though; I'll see what I can do.
{quote}I would like some form of AssertingIntervalsSource{quote}
This is a bit trickier, as it's not obvious where the wrapping would happen.
+1 to everything else, I'll work on a follow-up.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391478#comment-16391478 ] Adrien Grand commented on LUCENE-8196: -- Thanks Alan. I agree that growing a separate hierarchy of objects might help land this feature. We might even want to put first iterations of this work in sandbox to give time for the API to stabilize before we move it to core or misc. I have some questions/comments:
 - Do we need {{IntervalIterator.score()}}? It seems to be the same value on all implementations.
 - Do we need {{advanceTo}}? It seems to me that things would be simpler and as efficient if you documented that nextPosition() may only be called when the approximation is positioned and then {{advanceTo}} would be equivalent to checking the return value of {{nextInterval}}?
 - Let's make the {{IntervalFunction}} API an implementation detail?
 - The documentation of {{cost()}} says it is the cost of finding the next interval, but given how you use it in the query it looks like it is actually more about the average cost of iterating over _all_ intervals.
 - In terms of testing I would like some form of AssertingIntervalsSource to make sure that intervals are always consumed in legal ways and behave correctly.
 - More docs would help read the code. For instance IntervalsSource.intervals has no docs. By the way, we might want to mention there that the same instance might be reused across calls.
 - TermIntervalsSource should check whether positions were indexed.
 - I was a bit annoyed to see the field masking hack but actually those interval sources do not need term statistics, which makes the hack less horrible. Could you still document it to make sure users are aware it is a hack and explain in which circumstances it might be ok?