[jira] [Comment Edited] (IGNITE-12401) Some Text Queries return repeated results

Yuriy Shuliha (Jira) Fri, 29 Nov 2019 16:18:18 -0800


    [https://issues.apache.org/jira/browse/IGNITE-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985221#comment-16985221]


Yuriy Shuliha edited comment on IGNITE-12401 at 11/30/19 12:17 AM:
--------------------------------------------------------------------

After investigation, I found two causes of the test failures:

*1. Related to range tests only*
{{testTextQueryWithRange()}} uses the range {{"[10 TO 20}"}}. In this case, the Lucene response "unexpectedly" also contains the key "2", but this key is not among the expected keys, because those are selected by the predicate {{x -> String.valueOf(x).startsWith("1")}}.
The cause is that Lucene compares range boundaries as strings, so "10" is less than "2", "2" is less than "20" (it is a prefix of "20"), and "20" is less than "3".
 
[https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#compareTo(java.lang.String)]
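A minimal plain-JDK sketch (no Lucene involved) of why "2" slips through a string range. The key set 0..99 is a hypothetical stand-in for the test data, and {{inStringRange}} / {{falsePositives}} are illustrative helpers, not code from the test or from Lucene:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class StringRangeDemo {
    // Lexicographic check for "[lower TO upper}": lower bound inclusive, upper bound exclusive.
    static boolean inStringRange(String v, String lower, String upper) {
        return lower.compareTo(v) <= 0 && upper.compareTo(v) > 0;
    }

    // Hypothetical keys 0..maxKey that fall into the string range "[10 TO 20}"
    // but are rejected by the predicate the test uses to build its expected keys.
    static List<Integer> falsePositives(int maxKey) {
        IntPredicate expected = x -> String.valueOf(x).startsWith("1");
        List<Integer> extra = new ArrayList<>();
        for (int k = 0; k <= maxKey; k++) {
            if (inStringRange(String.valueOf(k), "10", "20") && !expected.test(k))
                extra.add(k);
        }
        return extra;
    }

    public static void main(String[] args) {
        System.out.println("10".compareTo("2") < 0); // true: '1' < '2'
        System.out.println("2".compareTo("20") < 0); // true: "2" is a prefix of "20"
        System.out.println(falsePositives(99));      // [2]
    }
}
```

Numerically 2 is outside 10..20, but as a string it sorts between "10" and "20", which matches the extra key observed in the Lucene response.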

More detailed explanation:
 ++++

For {{v IN range[A TO B]}}, Lucene appears to use the following string-comparison logic (black-box tested by me):
{{A.compareTo(v) <= (lowerInclusive ? 0 : -1) && B.compareTo(v) >= (upperInclusive ? 0 : 1)}}

{{=== A = 10, inclusive ===       === B = 20, exclusive ===}}
{{"10".compareTo("2") == -1    "20".compareTo("2") == 1  // passed, false positive}}
{{"10".compareTo("15") == -5   "20".compareTo("15") == 1 // passed}}
{{"10".compareTo("25") == -1   "20".compareTo("25") == -5 // not passed}}
 {{"10".compareTo("3") == -2    "20".compareTo("3") == -1 // not passed}}

++++
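The inferred rule can be written out as a small Java method and checked against the unambiguous table rows. This is only a sketch of the black-box behaviour described above, not Lucene's implementation; {{InferredRangeRule}} and {{inRange}} are hypothetical names:

```java
public class InferredRangeRule {
    // Sketch of the rule inferred by black-box testing (not Lucene source).
    // String.compareTo returns a character or length difference, so
    // "<= -1" is the same as "< 0" and ">= 1" is the same as "> 0".
    static boolean inRange(String lower, boolean lowerInclusive,
                           String upper, boolean upperInclusive, String v) {
        boolean lowerOk = lower.compareTo(v) <= (lowerInclusive ? 0 : -1);
        boolean upperOk = upper.compareTo(v) >= (upperInclusive ? 0 : 1);
        return lowerOk && upperOk;
    }

    public static void main(String[] args) {
        // Range "[10 TO 20}": lower bound inclusive, upper bound exclusive.
        for (String v : new String[] {"2", "15", "3"}) {
            System.out.printf("%s -> %b%n", v, inRange("10", true, "20", false, v));
        }
    }
}
```

Running it prints {{true}} for "2" and "15" and {{false}} for "3", i.e. "2" passes only because the bounds are compared as strings.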

*2. A distributed query produces responses from two nodes that partially contain the same keys*

When checking the text query response, I noticed that GridLuceneIndex.query() was executed on two grids/nodes per request. This is expected for a distributed query. On the other hand, since the cache mode is PARTITIONED, their results should not contain overlapping keys.
However, here the Lucene indexes on different nodes definitely contain data from the same partition.

During the final join with a limit, duplicate keys appear in the response, which fails the assertion that expects unique values.
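A simplified model of how overlapping per-node results turn into duplicates once they are joined under a limit. The node responses are hypothetical, and {{mergeWithLimit}} only concatenates results; it is not Ignite's actual reduce logic:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeWithLimitDemo {
    // Hypothetical reducer: concatenates per-node results and keeps the first `limit` keys.
    static List<Integer> mergeWithLimit(List<List<Integer>> perNode, int limit) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> nodeResult : perNode) {
            for (Integer key : nodeResult) {
                if (merged.size() == limit)
                    return merged;
                merged.add(key);
            }
        }
        return merged;
    }

    // Keys that occur more than once, i.e. what a uniqueness assertion would reject.
    static Set<Integer> duplicates(List<Integer> keys) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> dups = new HashSet<>();
        for (Integer k : keys) {
            if (!seen.add(k))
                dups.add(k);
        }
        return dups;
    }

    public static void main(String[] args) {
        // Hypothetical responses: both nodes indexed key 14, although with
        // PARTITIONED mode and 0 backups only its primary node should hold it.
        List<Integer> nodeA = List.of(10, 14, 15);
        List<Integer> nodeB = List.of(14, 16);
        List<Integer> result = mergeWithLimit(List.of(nodeA, nodeB), 4);
        System.out.println(result);             // [10, 14, 15, 14]
        System.out.println(duplicates(result)); // [14]
    }
}
```

If each key were indexed only on its primary node, the per-node lists would be disjoint and the joined result would stay unique regardless of the limit.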

Backups are set to 0 for the current cache. Does this mean that entries are incorrectly distributed across partitions/nodes?

This part needs further investigation.
So far, I have found no direct link between this issue and the {{limits}} introduced earlier.

CC [~amashenkov], [~Pavlukhin]


> Some Text Queries return repeated results
> -----------------------------------------
>
>                 Key: IGNITE-12401
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12401
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.8
>            Reporter: Ilya Kasnacheev
>            Assignee: Yuriy Shuliha 
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It came to my attention while checking for Range Queries support that we 
> don't actually check that found query results are the correct ones.
> We were checking that we got some results, but not whether they were expected.
> And voila, it turns out that Range Queries examples, as well as some other 
> test cases, will readily fail when run with such checks! A query will return 
> the same value repeatedly, e.g. a range query will return the "1" record twice, and 
> a limited text query will return the "14" record twice.
> It didn't really occur on non-range queries before the introduction of limits.
> I think we should not ship broken limit queries. Maybe also fix range 
> queries; if that's hard, let's @Ignore them for now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
