[jira] [Commented] (LUCENE-10140) Minimizing intervals can give inaccurate positions for duplicate terms
    [ https://issues.apache.org/jira/browse/LUCENE-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427230#comment-17427230 ]

Nikolay Khitrin commented on LUCENE-10140:
------------------------------------------

Just amazing :)

> Minimizing intervals can give inaccurate positions for duplicate terms
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-10140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10140
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/queries
>            Reporter: Nikolay Khitrin
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Minimizing intervals (maybe just ORDERED and AT_LEAST, but I'm not sure)
> can move sub-iterators to a non-sub-match position *inside* the match
> window, but the CachingMatchesIterator logic relies on the heuristic that
> any position inside the matching interval is a sub-match.
>
> For example: ORDERED("a", "b", "a") over "a b a" highlights (reports
> sub-matches for) only "a b a", and ORDERED("a", "b", "a", "b", "a")
> highlights only "a b a b a".
>
> It looks like there is no way to determine the right moment to cache from
> the caching iterator's perspective, so I propose adding an interface that
> lets minimizing IntervalIterators notify sub-sources when they are
> positioned at sub-match positions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10140) Minimizing intervals can give inaccurate positions for duplicate terms
    [ https://issues.apache.org/jira/browse/LUCENE-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427156#comment-17427156 ]

Nikolay Khitrin commented on LUCENE-10140:
------------------------------------------

[~romseygeek], yes, the core fix is the same, and it looks like it covers all discovered cases.

A very minor point: it is not obvious that the MatchCallback argument in combine() is ignored for non-minimizing sources. Maybe we should add a small javadoc comment?
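
The javadoc point above can be illustrated with a minimal sketch. It assumes a callback shaped roughly like the MatchCallback mentioned in the comment; the interface, the Subs holder, and both step methods are hypothetical stand-ins, not Lucene's actual signatures. The minimizing path has to notify before it moves a cursor, while the non-minimizing path never moves one and can therefore ignore the callback.

```java
import java.util.Arrays;

// Hypothetical sketch: names echo the discussion but are not Lucene's API.
class MatchCallbackSketch {

    /** Invoked while every sub-cursor is still positioned on its sub-match. */
    interface MatchCallback {
        void onMatch();
    }

    /** Current position of each sub-source; stands in for the sub-iterators. */
    static final class Subs {
        final int[] current;
        Subs(int... current) { this.current = current; }
    }

    /** Caches sub-match positions only when told that the moment is right. */
    static final class CachingListener implements MatchCallback {
        final Subs subs;
        int[] cached;
        CachingListener(Subs subs) { this.subs = subs; }
        @Override public void onMatch() { cached = subs.current.clone(); }
    }

    /** Minimizing combination: notify first, then probe for a narrower match. */
    static void minimizingStep(Subs subs, int probePosition, MatchCallback onMatch) {
        onMatch.onMatch();                // cursors are on sub-match positions here
        subs.current[0] = probePosition;  // the probe moves the first cursor afterwards
    }

    /**
     * Non-minimizing combination: cursors stay put after a match, so the
     * callback is deliberately ignored (the behaviour a javadoc note would document).
     */
    static void plainStep(Subs subs, MatchCallback ignored) {
        // Intentionally empty: positions can still be read lazily later on.
    }

    public static void main(String[] args) {
        Subs subs = new Subs(0, 1, 2);    // ORDERED("a", "b", "a") over "a b a"
        CachingListener cache = new CachingListener(subs);
        minimizingStep(subs, 2, cache);
        System.out.println("cached:  " + Arrays.toString(cache.cached));
        System.out.println("current: " + Arrays.toString(subs.current));
    }
}
```

In this toy run the cached snapshot keeps position 0 for the first "a", whereas the live cursor has already moved to position 2.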
[jira] [Commented] (LUCENE-10075) NPE on wildcard-based overlapping intervals highlighting
    [ https://issues.apache.org/jira/browse/LUCENE-10075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423860#comment-17423860 ]

Nikolay Khitrin commented on LUCENE-10075:
------------------------------------------

[~romseygeek], thanks for keeping an eye on these issues.

But I'm still unsure: can LUCENE-10140 (also based on caching) be fixed by amending CachingMatchesIterator itself, or do we have to completely rebuild caching for proper highlighting?

> NPE on wildcard-based overlapping intervals highlighting
> --------------------------------------------------------
>
>                 Key: LUCENE-10075
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10075
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: main (9.0), 8.7
>            Reporter: Nikolay Khitrin
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> UnifiedHighlighter with the WEIGHT_MATCHES flag throws a
> NullPointerException on overlapping intervals with a wildcard term.
>
> Minimal reproducible example
>
> Doc: "Compare Computer Science"
> Query: Intervals.maxgaps(1, Intervals.ordered(Intervals.wildcard(new BytesRef("comp*")), Intervals.term("science")));
>
> Stacktrace:
>
> {code:java}
> java.lang.NullPointerException: Cannot invoke "org.apache.lucene.search.MatchesIterator.endPosition()" because the return value of "org.apache.lucene.util.PriorityQueue.top()" is null
>     at org.apache.lucene.search.DisjunctionMatchesIterator.endPosition(DisjunctionMatchesIterator.java:233)
>     at org.apache.lucene.queries.intervals.MultiTermIntervalsSource$1.endPosition(MultiTermIntervalsSource.java:132)
>     at org.apache.lucene.search.FilterMatchesIterator.endPosition(FilterMatchesIterator.java:49)
>     at org.apache.lucene.queries.intervals.CachingMatchesIterator.getSubMatches(CachingMatchesIterator.java:88)
>     at org.apache.lucene.queries.intervals.MinimizingConjunctionMatchesIterator.getSubMatches(MinimizingConjunctionMatchesIterator.java:96)
>     at org.apache.lucene.queries.intervals.IntervalMatches$1.getSubMatches(IntervalMatches.java:82)
>     at org.apache.lucene.search.FilterMatchesIterator.getSubMatches(FilterMatchesIterator.java:64)
>     at org.apache.lucene.search.uhighlight.OffsetsEnum$OfMatchesIteratorWithSubs.nextWhenMatchesIterator(OffsetsEnum.java:209)
>     at org.apache.lucene.search.uhighlight.OffsetsEnum$OfMatchesIteratorWithSubs.nextPosition(OffsetsEnum.java:201)
>     at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightOffsetsEnums(FieldHighlighter.java:134)
>     at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:83)
>     at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:635)
>     at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:505)
>     at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:483)
>     at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlight(UnifiedHighlighter.java:416)
> {code}
>
> Searching with the same query completes without any exception;
> ordered/unordered and larger gaps have no effect.
[jira] [Commented] (LUCENE-10140) Minimizing intervals can give inaccurate positions for duplicate terms
    [ https://issues.apache.org/jira/browse/LUCENE-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422853#comment-17422853 ]

Nikolay Khitrin commented on LUCENE-10140:
------------------------------------------

I'm afraid it will not cover complex sub-interval cases (multiterms such as "ab* b a*" and sub-phrases, for example).

REPEATING cannot produce all the possible spans between sub-intervals (that would lead to quadratic complexity), only consecutive pairs of interval instances. This means that CONTAINS(REPEATING("a", 2), ORDERED("b", GAP, "c")) cannot match over "a b a c a", but ORDERED("a", "b", GAP, "c", "a") can.

The current caching also has an issue with calling endPosition() after nextInterval() == false (LUCENE-10075).
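
The REPEATING argument above can be checked by hand with a small toy model. This is only a sketch: it assumes REPEATING("a", 2) means spans over two consecutive occurrences of "a", and that GAP stands for a single skipped position, so ORDERED("b", GAP, "c") covers positions 1 to 3 in "a b a c a". The class and method names are hypothetical and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the argument; intervals are {start, end} position pairs.
class RepeatingSpansSketch {

    /** Spans over `count` consecutive occurrences: linear in the number of occurrences. */
    static List<int[]> consecutiveSpans(int[] positions, int count) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i + count - 1 < positions.length; i++) {
            spans.add(new int[] {positions[i], positions[i + count - 1]});
        }
        return spans;
    }

    /** Every span between two occurrences: quadratic in the number of occurrences. */
    static List<int[]> allSpans(int[] positions) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < positions.length; i++) {
            for (int j = i + 1; j < positions.length; j++) {
                spans.add(new int[] {positions[i], positions[j]});
            }
        }
        return spans;
    }

    /** True if any span fully contains the interval [start, end]. */
    static boolean anyContains(List<int[]> spans, int start, int end) {
        for (int[] span : spans) {
            if (span[0] <= start && span[1] >= end) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] a = {0, 2, 4};              // "a" in "a b a c a"
        int innerStart = 1, innerEnd = 3; // "b" at 1, gap at 2, "c" at 3
        System.out.println("consecutive pairs contain it: "
                + anyContains(consecutiveSpans(a, 2), innerStart, innerEnd));
        System.out.println("all spans contain it:         "
                + anyContains(allSpans(a), innerStart, innerEnd));
    }
}
```

The consecutive pairs are [0,2] and [2,4], neither of which contains [1,3]; only the non-consecutive span [0,4] does, which is the span the single ORDERED query from the comment can produce.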
[jira] [Created] (LUCENE-10140) Minimizing intervals can give inaccurate positions for duplicate terms
Nikolay Khitrin created LUCENE-10140:
----------------------------------------

             Summary: Minimizing intervals can give inaccurate positions for duplicate terms
                 Key: LUCENE-10140
                 URL: https://issues.apache.org/jira/browse/LUCENE-10140
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/queries
            Reporter: Nikolay Khitrin


Minimizing intervals (maybe just ORDERED and AT_LEAST, but I'm not sure) can move sub-iterators to a non-sub-match position *inside* the match window, but the CachingMatchesIterator logic relies on the heuristic that any position inside the matching interval is a sub-match.

For example: ORDERED("a", "b", "a") over "a b a" highlights (reports sub-matches for) only "a b a", and ORDERED("a", "b", "a", "b", "a") highlights only "a b a b a".

It looks like there is no way to determine the right moment to cache from the caching iterator's perspective, so I propose adding an interface that lets minimizing IntervalIterators notify sub-sources when they are positioned at sub-match positions.
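
The failure mode described in this report can be sketched with a toy model. None of the names below are Lucene classes, and the "minimizing probe" is reduced to a single advance of the first cursor; it is an illustration under those assumptions, not the real iterator logic. It compares the positions held at match time with what a "anything inside the window is a sub-match" heuristic reads afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names are Lucene classes.
class MinimizingWindowSketch {

    /** Cursor over the sorted positions of one term in a single document. */
    static final class Cursor {
        final int[] positions;
        int idx = 0;
        Cursor(int... positions) { this.positions = positions; }
        boolean exhausted() { return idx >= positions.length; }
        int pos() { return positions[idx]; }
        void advance() { idx++; }
    }

    /** Aligns the cursors in order and returns their positions at match time, or null. */
    static int[] alignOrdered(List<Cursor> cursors) {
        int[] snapshot = new int[cursors.size()];
        int previous = -1;
        for (int i = 0; i < cursors.size(); i++) {
            Cursor c = cursors.get(i);
            // Each term must occur strictly after the previous one.
            while (!c.exhausted() && c.pos() <= previous) {
                c.advance();
            }
            if (c.exhausted()) {
                return null;
            }
            previous = c.pos();
            snapshot[i] = previous;
        }
        return snapshot;
    }

    /** What the "any cursor inside the window is a sub-match" heuristic would report. */
    static List<Integer> readCursorsInsideWindow(List<Cursor> cursors, int start, int end) {
        List<Integer> seen = new ArrayList<>();
        for (Cursor c : cursors) {
            if (!c.exhausted() && c.pos() >= start && c.pos() <= end) {
                seen.add(c.pos());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Document "a b a": "a" at positions 0 and 2, "b" at position 1.
        List<Cursor> cursors = Arrays.asList(new Cursor(0, 2), new Cursor(1), new Cursor(0, 2));
        int[] atMatch = alignOrdered(cursors);   // true sub-match positions
        int start = atMatch[0];
        int end = atMatch[atMatch.length - 1];
        cursors.get(0).advance();                // probe for a later, narrower start
        System.out.println("at match time:   " + Arrays.toString(atMatch));
        System.out.println("read afterwards: " + readCursorsInsideWindow(cursors, start, end));
    }
}
```

In this model the match-time snapshot is [0, 1, 2], but after the probe the first cursor sits at position 2, still inside the window [0, 2], so reading the cursors late loses position 0.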
[jira] [Created] (LUCENE-10075) NPE on wildcard-based overlapping intervals highlighting
Nikolay Khitrin created LUCENE-10075:
----------------------------------------

             Summary: NPE on wildcard-based overlapping intervals highlighting
                 Key: LUCENE-10075
                 URL: https://issues.apache.org/jira/browse/LUCENE-10075
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 8.7, main (9.0)
            Reporter: Nikolay Khitrin


UnifiedHighlighter with the WEIGHT_MATCHES flag throws a NullPointerException on overlapping intervals with a wildcard term.

Minimal reproducible example

Doc: "Compare Computer Science"
Query: Intervals.maxgaps(1, Intervals.ordered(Intervals.wildcard(new BytesRef("comp*")), Intervals.term("science")));

Stacktrace:

{code:java}
java.lang.NullPointerException: Cannot invoke "org.apache.lucene.search.MatchesIterator.endPosition()" because the return value of "org.apache.lucene.util.PriorityQueue.top()" is null
    at org.apache.lucene.search.DisjunctionMatchesIterator.endPosition(DisjunctionMatchesIterator.java:233)
    at org.apache.lucene.queries.intervals.MultiTermIntervalsSource$1.endPosition(MultiTermIntervalsSource.java:132)
    at org.apache.lucene.search.FilterMatchesIterator.endPosition(FilterMatchesIterator.java:49)
    at org.apache.lucene.queries.intervals.CachingMatchesIterator.getSubMatches(CachingMatchesIterator.java:88)
    at org.apache.lucene.queries.intervals.MinimizingConjunctionMatchesIterator.getSubMatches(MinimizingConjunctionMatchesIterator.java:96)
    at org.apache.lucene.queries.intervals.IntervalMatches$1.getSubMatches(IntervalMatches.java:82)
    at org.apache.lucene.search.FilterMatchesIterator.getSubMatches(FilterMatchesIterator.java:64)
    at org.apache.lucene.search.uhighlight.OffsetsEnum$OfMatchesIteratorWithSubs.nextWhenMatchesIterator(OffsetsEnum.java:209)
    at org.apache.lucene.search.uhighlight.OffsetsEnum$OfMatchesIteratorWithSubs.nextPosition(OffsetsEnum.java:201)
    at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightOffsetsEnums(FieldHighlighter.java:134)
    at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:83)
    at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:635)
    at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:505)
    at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:483)
    at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlight(UnifiedHighlighter.java:416)
{code}

Searching with the same query completes without any exception; ordered/unordered and larger gaps have no effect.
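
The top frame of the stack trace points at a priority queue whose head is null. A toy model of that call-order problem is sketched below, using java.util.PriorityQueue in place of Lucene's own queue. The Disjunction class and its methods are hypothetical stand-ins and say nothing about how the real issue should be fixed; they only show why reading the end position after the iterator is exhausted fails.

```java
import java.util.PriorityQueue;

// Toy model of the call-order problem; these are not Lucene's classes.
class ExhaustedDisjunctionSketch {

    /** Disjunction over {start, end} intervals, e.g. the terms a wildcard expands to. */
    static final class Disjunction {
        private final PriorityQueue<int[]> queue =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));

        Disjunction(int[]... intervals) {
            for (int[] interval : intervals) {
                queue.add(interval);
            }
        }

        /** Drops the current interval; returns false once nothing is left. */
        boolean next() {
            queue.poll();
            return !queue.isEmpty();
        }

        /** Unguarded: dereferences the head of the queue, which is null after exhaustion. */
        int endPosition() {
            return queue.peek()[1];
        }
    }

    /** Reads end positions only while the iterator is positioned; returns the last one. */
    static int lastEndWhilePositioned(Disjunction d) {
        int lastEnd = d.endPosition();
        while (d.next()) {
            lastEnd = d.endPosition();
        }
        return lastEnd;
    }

    /** True if reading the end position now fails with a NullPointerException. */
    static boolean throwsOnRead(Disjunction d) {
        try {
            d.endPosition();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "comp*" over "Compare Computer Science": matches at positions 0 and 1.
        Disjunction d = new Disjunction(new int[] {0, 0}, new int[] {1, 1});
        System.out.println("last end while positioned: " + lastEndWhilePositioned(d));
        System.out.println("NPE when read after exhaustion: " + throwsOnRead(d));
    }
}
```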
[jira] [Created] (LUCENE-9377) Unknown query type SynonymQuery in ComplexPhraseQueryParser for boolean clauses
Nikolay Khitrin created LUCENE-9377:
---------------------------------------

             Summary: Unknown query type SynonymQuery in ComplexPhraseQueryParser for boolean clauses
                 Key: LUCENE-9377
                 URL: https://issues.apache.org/jira/browse/LUCENE-9377
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 8.4
            Reporter: Nikolay Khitrin
         Attachments: LUCENE-9377.patch


Follow-up to LUCENE-7695.

ComplexPhraseQueryParser fails with
{code:java}
Unknown query type:org.apache.lucene.search.SynonymQuery{code}
exception on queries like name: "(dog cat) something" if dog is expanded by SynonymFilter.

For now, the parser converts only top-level SynonymQueries to BooleanQuery, but not the nested ones.

It looks like this can be fixed by a simple conversion in the BQ clause-handling loop, similar to LUCENE-7695.
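
The difference between top-level and nested conversion can be sketched on a toy query tree. The Term, Synonym, and Bool classes below are hypothetical stand-ins for Lucene's query types, "hound" is an invented synonym for "dog", and the code is not taken from the attached patch; it only shows the shape of a conversion applied while walking boolean clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy query tree; stand-ins for Lucene's query classes, not the real API.
class NestedSynonymSketch {

    interface Query {}

    static final class Term implements Query {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Synonym implements Query {
        final List<String> terms;
        Synonym(String... terms) { this.terms = Arrays.asList(terms); }
    }

    static final class Bool implements Query {
        final List<Query> clauses;
        Bool(List<Query> clauses) { this.clauses = clauses; }
    }

    /** Turns a synonym group into a boolean query over its terms. */
    static Bool toBool(Synonym synonym) {
        List<Query> clauses = new ArrayList<>();
        for (String term : synonym.terms) {
            clauses.add(new Term(term));
        }
        return new Bool(clauses);
    }

    /** Converts only a synonym at the root, as the report describes for the current parser. */
    static Query convertTopLevelOnly(Query query) {
        return query instanceof Synonym ? toBool((Synonym) query) : query;
    }

    /** Also converts synonyms met while walking boolean clauses. */
    static Query convertNested(Query query) {
        if (query instanceof Synonym) {
            return toBool((Synonym) query);
        }
        if (query instanceof Bool) {
            List<Query> converted = new ArrayList<>();
            for (Query clause : ((Bool) query).clauses) {
                converted.add(convertNested(clause));
            }
            return new Bool(converted);
        }
        return query;
    }

    /** True if a synonym node survives anywhere in the tree. */
    static boolean containsSynonym(Query query) {
        if (query instanceof Synonym) {
            return true;
        }
        if (query instanceof Bool) {
            for (Query clause : ((Bool) query).clauses) {
                if (containsSynonym(clause)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Rough shape of "(dog cat) something" with "dog" expanded to a synonym group. */
    static Query sample() {
        Query group = new Bool(Arrays.<Query>asList(new Synonym("dog", "hound"), new Term("cat")));
        return new Bool(Arrays.<Query>asList(group, new Term("something")));
    }

    public static void main(String[] args) {
        System.out.println("top-level only leaves a synonym: "
                + containsSynonym(convertTopLevelOnly(sample())));
        System.out.println("nested conversion leaves a synonym: "
                + containsSynonym(convertNested(sample())));
    }
}
```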
[jira] [Updated] (LUCENE-9377) Unknown query type SynonymQuery in ComplexPhraseQueryParser for boolean clauses
    [ https://issues.apache.org/jira/browse/LUCENE-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikolay Khitrin updated LUCENE-9377:
------------------------------------
       Attachment: LUCENE-9377.patch
    Lucene Fields: New,Patch Available  (was: New)
           Status: Open  (was: Open)

> Unknown query type SynonymQuery in ComplexPhraseQueryParser for boolean clauses
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-9377
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9377
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.4
>            Reporter: Nikolay Khitrin
>            Priority: Major
>         Attachments: LUCENE-9377.patch
>
>
> Follow up for LUCENE-7695.
> ComplexPhraseQueryParser fails with
> {code:java}
> Unknown query type:org.apache.lucene.search.SynonymQuery{code}
> exception on queries like name: "(dog cat) something" if dog expands by
> SynonymFilter.
> For now parser converts to BooleanQuery only top-level SynonymQueries, but
> not the nested ones.
> Looks like it can be fixed by simple conversion in BQ clauses handling loop
> similar to LUCENE-7695.