[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data
[ https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mirza Aliev updated IGNITE-16406:
---------------------------------
    Description:
For some reason, a select operation may not return the expected number of rows. We noticed that this happens when the Raft leader is changing. To increase reproducibility, we can slow message handling down a bit, for example by adding this code to {{MessageServiceImpl#onMessage(java.lang.String, org.apache.ignite.network.NetworkMessage)}}:
{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}
Possible direction of research:
we could check that we do not lose the cursor.next command as a Raft response during a leader change.

UPD:
We decided to add a consistency check between the received scan command and the handled scan command in the partition listener, so a user now gets a state machine error and can retry the command. But we found another inconsistency: RocksDB could return hasNext == false after an unexpected step-down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).

We then decided to change the replica factor to 1 in {{ItMixedQueriesTest}}, so that there is only one node in a partition Raft group, but we could not enable {{ItMixedQueriesTest}} because of a new error: https://issues.apache.org/jira/browse/IGNITE-16502

  was:
For some reason, a select operation may not return the expected number of rows. We noticed that this happens when the Raft leader is changing. To increase reproducibility, we can slow message handling down a bit, for example by adding this code to {{MessageServiceImpl#onMessage(java.lang.String, org.apache.ignite.network.NetworkMessage)}}:
{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}
Possible direction of research:
we could check that we do not lose the cursor.next command as a Raft response during a leader change.

UPD: We decided to add a consistency check between the received scan command and the handled scan command in the partition listener, so a user now gets a state machine error and can retry the command. But we found another inconsistency: RocksDB could return hasNext == false after an unexpected step-down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).

We then decided to change the replica factor to 1 in {{ItMixedQueriesTest}}, so that there is only one node in a partition Raft group, but we could not enable {{ItMixedQueriesTest}} because of a new error: https://issues.apache.org/jira/browse/IGNITE-16502

> SQL select operation could return incomplete data
> -------------------------------------------------
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Mirza Aliev
> Priority: Blocker
> Labels: ignite-3
>
> For some reason, a select operation may not return the expected number of rows.
> We noticed that this happens when the Raft leader is changing. To increase
> reproducibility, we can slow message handling down a bit, for example by
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String,
> org.apache.ignite.network.NetworkMessage)}}:
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
>     try {
>         Thread.sleep(300);
>     } catch (Exception ex) {
>         ex.printStackTrace();
>     }
> }
> {code}
> Possible direction of research:
> we could check that we do not lose the cursor.next command as a Raft response
> during a leader change.
>
> UPD:
> We decided to add a consistency check between the received scan command and
> the handled scan command in the partition listener, so a user now gets a state
> machine error and can retry the command. But we found another inconsistency:
> RocksDB could return hasNext == false after an unexpected step-down of
> the leader (https://issues.apache.org/jira/browse/IGNITE-16478).
>
> We then decided to change the replica factor to 1 in
> {{ItMixedQueriesTest}}, so that there is only one node in a partition Raft
> group, but we could not enable {{ItMixedQueriesTest}} because of a new error:
> https://issues.apache.org/jira/browse/IGNITE-16502

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
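The consistency check mentioned in the UPD note can be illustrated with a toy model. The sketch below is not Ignite code: the class, the request numbering, and the batch-based cursor are all hypothetical, and it only assumes that a lost cursor.next response followed by a re-sent request is one way rows could go missing. It shows how an unchecked cursor silently skips a batch, while a cursor that compares the received request number with the number of requests it has already handled fails loudly so the caller can retry the whole scan.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy model; none of these names come from the Ignite code base. */
final class ScanConsistencySketch {
    /** Partition-side cursor that serves rows in batches and counts handled requests. */
    static final class PartitionCursor {
        private final Iterator<Integer> rows;
        private final boolean checkConsistency;
        private long handledRequests;

        PartitionCursor(List<Integer> data, boolean checkConsistency) {
            this.rows = data.iterator();
            this.checkConsistency = checkConsistency;
        }

        /** Serves the next batch; with the check enabled, rejects a request number it did not expect. */
        List<Integer> next(long requestNo, int batchSize) {
            if (checkConsistency && requestNo != handledRequests) {
                throw new IllegalStateException(
                    "Received scan request " + requestNo + ", but " + handledRequests + " were already handled");
            }
            List<Integer> batch = new ArrayList<>();
            while (batch.size() < batchSize && rows.hasNext()) {
                batch.add(rows.next());
            }
            handledRequests++;
            return batch;
        }
    }

    /**
     * Client-side scan loop. The response to request number lostResponseNo is dropped once
     * (simulating a loss during a leader change) and the client re-sends the same request.
     * Pass -1 to simulate a run without any loss.
     */
    static List<Integer> scan(List<Integer> data, int batchSize, boolean checkConsistency, long lostResponseNo) {
        PartitionCursor cursor = new PartitionCursor(data, checkConsistency);
        List<Integer> received = new ArrayList<>();
        boolean responseLost = false;
        long requestNo = 0;
        while (true) {
            List<Integer> batch = cursor.next(requestNo, batchSize);
            if (!responseLost && requestNo == lostResponseNo) {
                responseLost = true; // the batch was consumed on the server but never reached the client
                continue;            // re-send the same request number
            }
            if (batch.isEmpty()) {
                return received;
            }
            received.addAll(batch);
            requestNo++;
        }
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(0, 1, 2, 3, 4, 5);
        // Without the check, the first batch is silently lost.
        System.out.println("unchecked: " + scan(data, 2, false, 0));
        try {
            scan(data, 2, true, 0);
        } catch (IllegalStateException e) {
            // With the check, the caller sees an error and can restart the scan.
            System.out.println("checked: " + e.getMessage());
        }
    }
}
```

In this model the error surfaces on the re-sent request, which matches the behavior described in the UPD note only at a high level: the user gets an error instead of incomplete data and decides whether to retry.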
[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data
[ https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mirza Aliev updated IGNITE-16406:
---------------------------------
    Description:
For some reason, a select operation may not return the expected number of rows. We noticed that this happens when the Raft leader is changing. To increase reproducibility, we can slow message handling down a bit, for example by adding this code to {{MessageServiceImpl#onMessage(java.lang.String, org.apache.ignite.network.NetworkMessage)}}:
{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}
Possible direction of research:
we could check that we do not lose the cursor.next command as a Raft response during a leader change.

UPD: We decided to add a consistency check between the received scan command and the handled scan command in the partition listener, so a user now gets a state machine error and can retry the command. But we found another inconsistency: RocksDB could return hasNext == false after an unexpected step-down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).

We then decided to change the replica factor to 1 in {{ItMixedQueriesTest}}, so that there is only one node in a partition Raft group, but we could not enable {{ItMixedQueriesTest}} because of a new error: https://issues.apache.org/jira/browse/IGNITE-16502

  was:
For some reason, a select operation may not return the expected number of rows. We noticed that this happens when the Raft leader is changing. To increase reproducibility, we can slow message handling down a bit, for example by adding this code to {{MessageServiceImpl#onMessage(java.lang.String, org.apache.ignite.network.NetworkMessage)}}:
{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}
Possible direction of research:
we could check that we do not lose the cursor.next command as a Raft response during a leader change

> SQL select operation could return incomplete data
> -------------------------------------------------
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Mirza Aliev
> Priority: Blocker
> Labels: ignite-3
>
> For some reason, a select operation may not return the expected number of rows.
> We noticed that this happens when the Raft leader is changing. To increase
> reproducibility, we can slow message handling down a bit, for example by
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String,
> org.apache.ignite.network.NetworkMessage)}}:
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
>     try {
>         Thread.sleep(300);
>     } catch (Exception ex) {
>         ex.printStackTrace();
>     }
> }
> {code}
> Possible direction of research:
> we could check that we do not lose the cursor.next command as a Raft response
> during a leader change.
>
> UPD: We decided to add a consistency check between the received scan command
> and the handled scan command in the partition listener, so a user now gets a
> state machine error and can retry the command. But we found another
> inconsistency: RocksDB could return hasNext == false after an unexpected
> step-down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).
>
> We then decided to change the replica factor to 1 in
> {{ItMixedQueriesTest}}, so that there is only one node in a partition Raft
> group, but we could not enable {{ItMixedQueriesTest}} because of a new error:
> https://issues.apache.org/jira/browse/IGNITE-16502
[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data
[ https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-16406:
-----------------------------------------
    Priority: Blocker  (was: Major)

> SQL select operation could return incomplete data
> -------------------------------------------------
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Mirza Aliev
> Priority: Blocker
> Labels: ignite-3
>
> For some reason, a select operation may not return the expected number of rows.
> We noticed that this happens when the Raft leader is changing. To increase
> reproducibility, we can slow message handling down a bit, for example by
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String,
> org.apache.ignite.network.NetworkMessage)}}:
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
>     try {
>         Thread.sleep(300);
>     } catch (Exception ex) {
>         ex.printStackTrace();
>     }
> }
> {code}
> Possible direction of research:
> we could check that we do not lose the cursor.next command as a Raft response
> during a leader change
[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data
[ https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-16406:
-----------------------------------------
    Ignite Flags:   (was: Docs Required,Release Notes Required)

> SQL select operation could return incomplete data
> -------------------------------------------------
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Mirza Aliev
> Priority: Major
> Labels: ignite-3
>
> For some reason, a select operation may not return the expected number of rows.
> We noticed that this happens when the Raft leader is changing. To increase
> reproducibility, we can slow message handling down a bit, for example by
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String,
> org.apache.ignite.network.NetworkMessage)}}:
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
>     try {
>         Thread.sleep(300);
>     } catch (Exception ex) {
>         ex.printStackTrace();
>     }
> }
> {code}
> Possible direction of research:
> we could check that we do not lose the cursor.next command as a Raft response
> during a leader change
[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data
[ https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mirza Aliev updated IGNITE-16406:
---------------------------------
    Labels: ignite-3  (was: )

> SQL select operation could return incomplete data
> -------------------------------------------------
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Mirza Aliev
> Priority: Major
> Labels: ignite-3
>
> For some reason, a select operation may not return the expected number of rows.
> We noticed that this happens when the Raft leader is changing. To increase
> reproducibility, we can slow message handling down a bit, for example by
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String,
> org.apache.ignite.network.NetworkMessage)}}:
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
>     try {
>         Thread.sleep(300);
>     } catch (Exception ex) {
>         ex.printStackTrace();
>     }
> }
> {code}
> Possible direction of research:
> we could check that we do not lose the cursor.next command as a Raft response
> during a leader change
[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data
[ https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mirza Aliev updated IGNITE-16406:
---------------------------------
    Description:
For some reason, a select operation may not return the expected number of rows. We noticed that this happens when the Raft leader is changing. To increase reproducibility, we can slow message handling down a bit, for example by adding this code to {{MessageServiceImpl#onMessage(java.lang.String, org.apache.ignite.network.NetworkMessage)}}:
{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}
Possible direction of research:
we could check that we do not lose the cursor.next command as a Raft response during a leader change

  was:
For some reason, a select operation may not return the expected number of rows. We noticed that this happens when the Raft leader is changing. To increase reproducibility, we can a bit slow down message handling, for example add this code to {{MessageServiceImpl#onMessage(java.lang.String, org.apache.ignite.network.NetworkMessage)}}:
{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}
Possible direction of research:
we could check that we do not lose the cursor.next command as a Raft response during a leader change

> SQL select operation could return incomplete data
> -------------------------------------------------
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Mirza Aliev
> Priority: Major
>
> For some reason, a select operation may not return the expected number of rows.
> We noticed that this happens when the Raft leader is changing. To increase
> reproducibility, we can slow message handling down a bit, for example by
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String,
> org.apache.ignite.network.NetworkMessage)}}:
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
>     try {
>         Thread.sleep(300);
>     } catch (Exception ex) {
>         ex.printStackTrace();
>     }
> }
> {code}
> Possible direction of research:
> we could check that we do not lose the cursor.next command as a Raft response
> during a leader change
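For repeated experiments, the delay snippet from the description could be wrapped in a small helper. This is only a sketch: the class and method names are hypothetical and not part of Ignite. It keeps the same condition as the original snippet (nextInt(3) yields 0, 1 or 2, so the delay fires for 0 and 2, roughly two calls out of three) but catches InterruptedException specifically and restores the interrupt flag instead of swallowing every exception.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical test helper; not part of the Ignite code base. */
final class RandomDelay {
    /** Same condition as in the issue description: true for rolls 0 and 2, false for 1. */
    static boolean shouldDelay(int roll) {
        return roll % 2 == 0;
    }

    /**
     * Sleeps for the given number of milliseconds on roughly two calls out of three.
     * Returns true if this call decided to delay.
     */
    static boolean maybeSleep(long millis) {
        if (!shouldDelay(ThreadLocalRandom.current().nextInt(3))) {
            return false;
        }
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            // Preserve the interrupt so the calling thread can still shut down cleanly.
            Thread.currentThread().interrupt();
        }
        return true;
    }
}
```

A call such as `RandomDelay.maybeSleep(300)` at the start of the message handler would then reproduce the slowdown described in the issue.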