[ https://issues.apache.org/jira/browse/IGNITE-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985221#comment-16985221 ]
Yuriy Shuliha edited comment on IGNITE-12401 at 11/30/19 12:17 AM:
--------------------------------------------------------------------

After investigation, I found two causes of the test failures:

*1. Related to Range tests only*

{{testTextQueryWithRange()}} uses the range {{"[10 TO 20}" }}. In this case the Lucene response "unexpectedly" also contains the key "2", which is not among the expected keys, because those are selected by the predicate {{x -> String.valueOf(x).startsWith("1")}}.

The clue is that Lucene compares range boundaries as Strings, where "10" is less than "2" and "20" is less than "3".
[https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#compareTo(java.lang.String)]

More detailed explanation:
++++
For {{v IN range[A TO B]}} Lucene appears to use the following logic with string comparison (black-box tested by me):
{{A.compareTo(v) <= (inclusive ? 0 : -1) && B.compareTo(v) >= (inclusive ? 0 : 1)}}

{{=== A range 10 includes ===      === B range 20 excludes ===}}
{{"10".compareTo("2") == -1        "20".compareTo("2") == 1     // passed, false positive}}
{{"10".compareTo("15") == -5       "20".compareTo("15") == 1    // passed}}
{{"10".compareTo("25") == -1       "20".compareTo("20") == 0    // passed}}
{{"10".compareTo("3") == -2        "20".compareTo("3") == -1    // not passed}}
++++

*2. Distributed Query produces responses from two nodes that partially contain the same keys*

When checking the text query response, I noticed that {{GridLuceneIndex.query()}} was executed on 2 grids/nodes per single request. This is expected for a distributed query. On the other hand, if the cache mode is PARTITIONED, the responses should not contain overlapping keys. Here, however, the Lucene indexes on different nodes definitely contain data from the same partition.
During the final join with limit, duplicate keys appear in the response, which fails the assertion where unique values are expected.

Backups are set to 0 for the current cache.
Does it mean that entities are incorrectly distributed over partitions/nodes?
This part needs further investigation. So far I have found no direct link between this issue and the {{limits}} introduced earlier.

CC [~amashenkov], [~Pavlukhin]

> Some Text Queries return repeated results
> -----------------------------------------
>
>                 Key: IGNITE-12401
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12401
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.8
>            Reporter: Ilya Kasnacheev
>            Assignee: Yuriy Shuliha
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It came to my attention while checking for Range Queries support that we
> don't actually check that found query results are the correct ones.
> We were checking that we got some results, but not whether they were expected.
> And voila, it turns out that Range Queries examples, as well as some other
> test cases, will readily fail when run with such checks! A query will return
> the same value repeatedly, e.g. a range query will return the "1" record twice,
> and a limited text query will return the "14" record twice.
> It didn't really occur on non-range queries before the introduction of limits.
> I think we should not ship broken limit queries. Maybe also fix range
> queries; if that's hard, let's @Ignore them for now.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
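The string-ordering effect described under cause 1 of the comment can be reproduced in plain Java, without Lucene. This is only a minimal sketch: the class name, the method names and the key set 0..29 are hypothetical, and the comparison models the black-box-observed behaviour for a lower-inclusive, upper-exclusive range rather than Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Ignite or Lucene.
class LexicographicRangeDemo {
    /** Range check with string comparison: lower bound inclusive, upper bound exclusive. */
    static boolean inRangeAsString(String lower, String upper, String value) {
        return lower.compareTo(value) <= 0 && upper.compareTo(value) > 0;
    }

    /** The same range check with numeric comparison, which is what the test predicate expects. */
    static boolean inRangeAsInt(int lower, int upper, int value) {
        return lower <= value && value < upper;
    }

    /** Collects the keys 0..29 (an assumed key set) that fall into [10 TO 20} under either comparison. */
    static List<Integer> matches(boolean compareAsString) {
        List<Integer> result = new ArrayList<>();
        for (int key = 0; key < 30; key++) {
            boolean hit = compareAsString
                ? inRangeAsString("10", "20", String.valueOf(key))
                : inRangeAsInt(10, 20, key);
            if (hit) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // String comparison lets "2" through, because "10" < "2" < "20" lexicographically.
        System.out.println(matches(true));  // [2, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
        // Numeric comparison returns only the keys the test expects.
        System.out.println(matches(false)); // [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    }
}
```

Under these assumptions the key "2" is the only false positive in 0..29, which matches the extra key seen in {{testTextQueryWithRange()}}.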