[jira] [Commented] (LUCENE-10140) Minimizing intervals can give inaccurate positions for duplicate terms

2021-10-11 Thread Nikolay Khitrin (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427230#comment-17427230
 ] 

Nikolay Khitrin commented on LUCENE-10140:
--

Just amazing :)

> Minimizing intervals can give inaccurate positions for duplicate terms
> --
>
> Key: LUCENE-10140
> URL: https://issues.apache.org/jira/browse/LUCENE-10140
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/queries
>Reporter: Nikolay Khitrin
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Minimizing intervals (maybe just ORDERED and AT_LEAST, but not sure) can move
> sub-iterators to a non-sub-match position *inside* the match window, but the
> CachingMatchesIterator logic relies on the heuristic that any position inside
> a matching interval is a sub-match.
> For example: ORDERED("a", "b", "a") over "a b a" highlights (reports
> sub-matches for) only "a b a", and ORDERED("a", "b", "a", "b", "a")
> highlights only "a b a b a".
> It looks like there is no way to determine the right moment to cache from
> the caching iterator's perspective, so I propose adding an interface that
> allows minimizing IntervalIterators to notify sub-sources positioned at
> sub-match positions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10140) Minimizing intervals can give inaccurate positions for duplicate terms

2021-10-11 Thread Nikolay Khitrin (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427156#comment-17427156
 ] 

Nikolay Khitrin commented on LUCENE-10140:
--

[~romseygeek], yes, the core fix is the same, and it looks like it covers all
discovered cases.

A very minor point: it is not obvious that the MatchCallback argument in
combine() is ignored for non-minimizing sources. Maybe we should add a small
javadoc comment?







[jira] [Commented] (LUCENE-10075) NPE on wildcard-based overlapping intervals highlighting

2021-10-04 Thread Nikolay Khitrin (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423860#comment-17423860
 ] 

Nikolay Khitrin commented on LUCENE-10075:
--

[~romseygeek], thanks for keeping an eye on these issues. However, I'm still
unsure: can LUCENE-10140 (which is also caching-related) be fixed by amending
CachingMatchesIterator itself, or do we have to completely rebuild caching for
proper highlighting?

> NPE on wildcard-based overlapping intervals highlighting
> 
>
> Key: LUCENE-10075
> URL: https://issues.apache.org/jira/browse/LUCENE-10075
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: main (9.0), 8.7
>Reporter: Nikolay Khitrin
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> UnifiedHighlighter with the WEIGHT_MATCHES flag throws a NullPointerException
> on overlapping intervals with a wildcard term.
> Minimal reproducible example:
> Doc: "Compare Computer Science"
> Query: Intervals.maxgaps(1, Intervals.ordered(Intervals.wildcard(new BytesRef("comp*")), Intervals.term("science")));
> Stacktrace:
>  
> {code:java}
> java.lang.NullPointerException: Cannot invoke "org.apache.lucene.search.MatchesIterator.endPosition()" because the return value of "org.apache.lucene.util.PriorityQueue.top()" is null
>  at org.apache.lucene.search.DisjunctionMatchesIterator.endPosition(DisjunctionMatchesIterator.java:233)
>  at org.apache.lucene.queries.intervals.MultiTermIntervalsSource$1.endPosition(MultiTermIntervalsSource.java:132)
>  at org.apache.lucene.search.FilterMatchesIterator.endPosition(FilterMatchesIterator.java:49)
>  at org.apache.lucene.queries.intervals.CachingMatchesIterator.getSubMatches(CachingMatchesIterator.java:88)
>  at org.apache.lucene.queries.intervals.MinimizingConjunctionMatchesIterator.getSubMatches(MinimizingConjunctionMatchesIterator.java:96)
>  at org.apache.lucene.queries.intervals.IntervalMatches$1.getSubMatches(IntervalMatches.java:82)
>  at org.apache.lucene.search.FilterMatchesIterator.getSubMatches(FilterMatchesIterator.java:64)
>  at org.apache.lucene.search.uhighlight.OffsetsEnum$OfMatchesIteratorWithSubs.nextWhenMatchesIterator(OffsetsEnum.java:209)
>  at org.apache.lucene.search.uhighlight.OffsetsEnum$OfMatchesIteratorWithSubs.nextPosition(OffsetsEnum.java:201)
>  at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightOffsetsEnums(FieldHighlighter.java:134)
>  at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:83)
>  at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:635)
>  at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:505)
>  at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:483)
>  at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlight(UnifiedHighlighter.java:416)
> {code}
>  
> Searching with the same query completes without any exception; switching
> between ordered/unordered or using larger gaps has no effect.
>  






[jira] [Commented] (LUCENE-10140) Minimizing intervals can give inaccurate positions for duplicate terms

2021-09-30 Thread Nikolay Khitrin (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422853#comment-17422853
 ] 

Nikolay Khitrin commented on LUCENE-10140:
--

I'm afraid it will not cover complex sub-interval cases (multiterms such as
"ab* b a*" and sub-phrases, for example).

REPEATING cannot produce all possible spans between sub-intervals (that would
lead to quadratic complexity), only consecutive pairs of interval instances.
This means that CONTAINS(REPEATING("a", 2), ORDERED("b", GAP, "c")) cannot
match over "a b a c a", but ORDERED("a", "b", GAP, "c", "a") can.
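
To make the "a b a c a" case concrete, here is a minimal sketch in plain Java. It is a toy model and does not use Lucene's API: REPEATING and CONTAINS are modelled by hand, GAP is assumed to be a single position, and the class and method names (RepeatingSketch, consecutive, allPairs, anyContains) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: REPEATING and CONTAINS are modelled by hand, not via Lucene. */
class RepeatingSketch {

    /** Spans covering n consecutive occurrences: linear in the number of positions. */
    static List<int[]> consecutive(int[] positions, int n) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i + n - 1 < positions.length; i++) {
            spans.add(new int[] {positions[i], positions[i + n - 1]});
        }
        return spans;
    }

    /** Spans between every pair of occurrences: quadratic in the number of positions. */
    static List<int[]> allPairs(int[] positions) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < positions.length; i++) {
            for (int j = i + 1; j < positions.length; j++) {
                spans.add(new int[] {positions[i], positions[j]});
            }
        }
        return spans;
    }

    /** True if any span in 'big' fully contains the interval 'small'. */
    static boolean anyContains(List<int[]> big, int[] small) {
        for (int[] span : big) {
            if (span[0] <= small[0] && small[1] <= span[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] a = {0, 2, 4};  // positions of "a" in "a b a c a"
        int[] bGapC = {1, 3}; // the single ORDERED("b", GAP, "c") interval
        System.out.println(anyContains(consecutive(a, 2), bGapC)); // false
        System.out.println(anyContains(allPairs(a), bGapC));       // true
    }
}
```

Only the span [0, 4] contains [1, 3], and it is not a consecutive pair, so the linear enumeration misses it.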

The current caching also has an issue with calling endPosition() after
nextInterval() == false (LUCENE-10075).
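
The endPosition() problem can be illustrated with a small stand-alone sketch. It is a toy, not Lucene's DisjunctionMatchesIterator: it uses java.util.PriorityQueue (whose peek() returns null when the queue is empty), and the class and method names are hypothetical. The guard shown is just one possible approach; the actual fix in Lucene may differ.

```java
import java.util.PriorityQueue;

/** Toy model only: mimics a disjunction over term positions, not Lucene code. */
class ExhaustedDisjunctionSketch {

    // Pending positions of all terms matched by the wildcard, smallest first.
    private final PriorityQueue<Integer> queue = new PriorityQueue<>();
    private boolean started = false;

    ExhaustedDisjunctionSketch(int... positions) {
        for (int p : positions) {
            queue.add(p);
        }
    }

    /** Advances to the next position; returns false once nothing is left. */
    boolean next() {
        if (started) {
            queue.poll(); // drop the position we were on
        } else {
            started = true;
        }
        return !queue.isEmpty();
    }

    /** Unguarded: unboxes queue.peek(), which is null after exhaustion. */
    int endPositionUnguarded() {
        return queue.peek();
    }

    /** Guarded: reports a sentinel instead of dereferencing null. */
    int endPositionGuarded() {
        Integer top = queue.peek();
        return top == null ? Integer.MAX_VALUE : top;
    }

    public static void main(String[] args) {
        // "compare" at position 0 and "computer" at position 1 both match comp*.
        ExhaustedDisjunctionSketch it = new ExhaustedDisjunctionSketch(0, 1);
        while (it.next()) {
            System.out.println("at " + it.endPositionGuarded());
        }
        try {
            it.endPositionUnguarded();
        } catch (NullPointerException e) {
            System.out.println("NPE after exhaustion");
        }
    }
}
```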







[jira] [Created] (LUCENE-10140) Minimizing intervals can give inaccurate positions for duplicate terms

2021-09-30 Thread Nikolay Khitrin (Jira)
Nikolay Khitrin created LUCENE-10140:


 Summary: Minimizing intervals can give inaccurate positions for 
duplicate terms
 Key: LUCENE-10140
 URL: https://issues.apache.org/jira/browse/LUCENE-10140
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queries
Reporter: Nikolay Khitrin


Minimizing intervals (maybe just ORDERED and AT_LEAST, but not sure) can move
sub-iterators to a non-sub-match position *inside* the match window, but the
CachingMatchesIterator logic relies on the heuristic that any position inside
a matching interval is a sub-match.

For example: ORDERED("a", "b", "a") over "a b a" highlights (reports
sub-matches for) only "a b a", and ORDERED("a", "b", "a", "b", "a")
highlights only "a b a b a".

It looks like there is no way to determine the right moment to cache from the
caching iterator's perspective, so I propose adding an interface that allows
minimizing IntervalIterators to notify sub-sources positioned at sub-match
positions.
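
A rough stand-alone sketch of the effect, in plain Java. This is a toy model and not Lucene's implementation: Cursor, firstOrderedMatch, heuristic and the notified array are hypothetical names, the ordered search is heavily simplified, and the "heuristic" only approximates the idea of trusting any sub-iterator position that lies inside the match window.

```java
import java.util.Arrays;

/** Toy model only: none of these types are Lucene classes. */
class MinimizingSketch {

    static final int EXHAUSTED = Integer.MAX_VALUE;

    /** Cursor over the sorted positions of one term; remembers the position it just left. */
    static final class Cursor {
        private final int[] positions;
        private int idx = -1;
        private int previous = -1;

        Cursor(int... positions) {
            this.positions = positions;
        }

        int pos() {
            if (idx < 0) {
                return -1;
            }
            return idx < positions.length ? positions[idx] : EXHAUSTED;
        }

        int previous() {
            return previous;
        }

        void next() {
            previous = pos();
            if (idx < positions.length) {
                idx++;
            }
        }
    }

    /** Finds the first minimal ordered interval {start, end}, or null if there is none. */
    static int[] firstOrderedMatch(Cursor[] subs, int[] notified) {
        int[] best = null;
        subs[0].next();
        while (subs[0].pos() != EXHAUSTED) {
            for (int i = 1; i < subs.length; i++) {
                while (subs[i].pos() <= subs[i - 1].pos()) {
                    subs[i].next(); // move sub i strictly past sub i-1
                }
                if (subs[i].pos() == EXHAUSTED) {
                    return best;
                }
            }
            int end = subs[subs.length - 1].pos();
            if (best != null && end > best[1]) {
                return best; // the window could not shrink any further
            }
            best = new int[] {subs[0].pos(), end};
            for (int i = 0; i < subs.length; i++) {
                notified[i] = subs[i].pos(); // the proposed notification
            }
            subs[0].next(); // minimizing step: try a later start
        }
        return best;
    }

    /** Heuristic: trust the current position if it is inside the window, else the previous one. */
    static int[] heuristic(Cursor[] subs, int[] window) {
        int[] out = new int[subs.length];
        for (int i = 0; i < subs.length; i++) {
            out[i] = subs[i].pos() <= window[1] ? subs[i].pos() : subs[i].previous();
        }
        return out;
    }

    public static void main(String[] args) {
        // "a b a": "a" occurs at positions 0 and 2, "b" at position 1.
        Cursor[] subs = {new Cursor(0, 2), new Cursor(1), new Cursor(0, 2)};
        int[] notified = new int[subs.length];
        int[] window = firstOrderedMatch(subs, notified);
        System.out.println("window    = " + Arrays.toString(window));                  // [0, 2]
        System.out.println("heuristic = " + Arrays.toString(heuristic(subs, window))); // [2, 1, 2]
        System.out.println("notified  = " + Arrays.toString(notified));                // [0, 1, 2]
    }
}
```

In this model the first "a" cursor is advanced to position 2 while trying to shrink the window. That position is still inside [0, 2], so the heuristic reports it and position 0 is lost, whereas the positions recorded when the candidate was formed are the expected ones.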






[jira] [Created] (LUCENE-10075) NPE on wildcard-based overlapping intervals highlighting

2021-08-27 Thread Nikolay Khitrin (Jira)
Nikolay Khitrin created LUCENE-10075:


 Summary: NPE on wildcard-based overlapping intervals highlighting
 Key: LUCENE-10075
 URL: https://issues.apache.org/jira/browse/LUCENE-10075
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 8.7, main (9.0)
Reporter: Nikolay Khitrin


UnifiedHighlighter with the WEIGHT_MATCHES flag throws a NullPointerException
on overlapping intervals with a wildcard term.

Minimal reproducible example:

Doc: "Compare Computer Science"

Query: Intervals.maxgaps(1, Intervals.ordered(Intervals.wildcard(new BytesRef("comp*")), Intervals.term("science")));

Stacktrace:

 
{code:java}
java.lang.NullPointerException: Cannot invoke "org.apache.lucene.search.MatchesIterator.endPosition()" because the return value of "org.apache.lucene.util.PriorityQueue.top()" is null
 at org.apache.lucene.search.DisjunctionMatchesIterator.endPosition(DisjunctionMatchesIterator.java:233)
 at org.apache.lucene.queries.intervals.MultiTermIntervalsSource$1.endPosition(MultiTermIntervalsSource.java:132)
 at org.apache.lucene.search.FilterMatchesIterator.endPosition(FilterMatchesIterator.java:49)
 at org.apache.lucene.queries.intervals.CachingMatchesIterator.getSubMatches(CachingMatchesIterator.java:88)
 at org.apache.lucene.queries.intervals.MinimizingConjunctionMatchesIterator.getSubMatches(MinimizingConjunctionMatchesIterator.java:96)
 at org.apache.lucene.queries.intervals.IntervalMatches$1.getSubMatches(IntervalMatches.java:82)
 at org.apache.lucene.search.FilterMatchesIterator.getSubMatches(FilterMatchesIterator.java:64)
 at org.apache.lucene.search.uhighlight.OffsetsEnum$OfMatchesIteratorWithSubs.nextWhenMatchesIterator(OffsetsEnum.java:209)
 at org.apache.lucene.search.uhighlight.OffsetsEnum$OfMatchesIteratorWithSubs.nextPosition(OffsetsEnum.java:201)
 at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightOffsetsEnums(FieldHighlighter.java:134)
 at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:83)
 at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:635)
 at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:505)
 at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:483)
 at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlight(UnifiedHighlighter.java:416)
{code}
 

Searching with the same query completes without any exception; switching
between ordered/unordered or using larger gaps has no effect.

 






[jira] [Created] (LUCENE-9377) Unknown query type SynonymQuery in ComplexPhraseQueryParser for boolean clauses

2020-05-21 Thread Nikolay Khitrin (Jira)
Nikolay Khitrin created LUCENE-9377:
---

 Summary: Unknown query type SynonymQuery in 
ComplexPhraseQueryParser for boolean clauses
 Key: LUCENE-9377
 URL: https://issues.apache.org/jira/browse/LUCENE-9377
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 8.4
Reporter: Nikolay Khitrin
 Attachments: LUCENE-9377.patch

Follow up for LUCENE-7695.

ComplexPhraseQueryParser fails with
{code:java}
Unknown query type:org.apache.lucene.search.SynonymQuery{code}
exception on queries like name: "(dog cat) something" if "dog" is expanded by
SynonymFilter.

For now, the parser converts only top-level SynonymQueries to BooleanQuery,
not the nested ones.

It looks like this can be fixed by a simple conversion in the BQ clause-handling
loop, similar to LUCENE-7695.
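
To illustrate the difference between converting only at the top level and converting inside the clause loop, here is a toy sketch in plain Java. The Term, Synonym and Bool types are hypothetical stand-ins rather than Lucene's Query classes, the dog/hound expansion is made up for the example, and this is not the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: these query types are stand-ins, not Lucene's Query classes. */
class SynonymRewriteSketch {

    interface Q {}

    static final class Term implements Q {
        final String text;
        Term(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class Synonym implements Q {
        final List<String> terms;
        Synonym(List<String> terms) { this.terms = terms; }
        @Override public String toString() { return "Synonym" + terms; }
    }

    static final class Bool implements Q {
        final List<Q> clauses;
        Bool(List<Q> clauses) { this.clauses = clauses; }
        @Override public String toString() { return "Bool" + clauses; }
    }

    /** Top-level-only conversion: nested Synonym clauses are left untouched. */
    static Q convertTopLevel(Q q) {
        return q instanceof Synonym ? toBool((Synonym) q) : q;
    }

    /** Conversion applied inside the clause loop too, so nested Synonyms are handled. */
    static Q convertNested(Q q) {
        if (q instanceof Synonym) {
            return toBool((Synonym) q);
        }
        if (q instanceof Bool) {
            List<Q> out = new ArrayList<>();
            for (Q clause : ((Bool) q).clauses) {
                out.add(convertNested(clause));
            }
            return new Bool(out);
        }
        return q;
    }

    private static Bool toBool(Synonym s) {
        List<Q> clauses = new ArrayList<>();
        for (String t : s.terms) {
            clauses.add(new Term(t));
        }
        return new Bool(clauses);
    }

    public static void main(String[] args) {
        // "(dog cat) something", with "dog" expanded to dog/hound (hypothetical expansion).
        Q inner = new Bool(Arrays.<Q>asList(new Synonym(Arrays.asList("dog", "hound")), new Term("cat")));
        Q phrase = new Bool(Arrays.<Q>asList(inner, new Term("something")));
        System.out.println(convertTopLevel(phrase)); // Bool[Bool[Synonym[dog, hound], cat], something]
        System.out.println(convertNested(phrase));   // Bool[Bool[Bool[dog, hound], cat], something]
    }
}
```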






[jira] [Updated] (LUCENE-9377) Unknown query type SynonymQuery in ComplexPhraseQueryParser for boolean clauses

2020-05-21 Thread Nikolay Khitrin (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Khitrin updated LUCENE-9377:

   Attachment: LUCENE-9377.patch
Lucene Fields: New,Patch Available  (was: New)
   Status: Open  (was: Open)

> Unknown query type SynonymQuery in ComplexPhraseQueryParser for boolean 
> clauses
> ---
>
> Key: LUCENE-9377
> URL: https://issues.apache.org/jira/browse/LUCENE-9377
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.4
>Reporter: Nikolay Khitrin
>Priority: Major
> Attachments: LUCENE-9377.patch
>
>
> Follow up for LUCENE-7695.
> ComplexPhraseQueryParser fails with
> {code:java}
> Unknown query type:org.apache.lucene.search.SynonymQuery{code}
> exception on queries like name: "(dog cat) something" if "dog" is expanded by
> SynonymFilter.
> For now, the parser converts only top-level SynonymQueries to BooleanQuery,
> not the nested ones.
> It looks like this can be fixed by a simple conversion in the BQ
> clause-handling loop, similar to LUCENE-7695.


