[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data

2022-02-08 Thread Mirza Aliev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-16406:
-
Description: 
For some reason, a select operation may not return the expected number of rows. 
We noticed that this happens when the Raft leader is changing. To increase 
reproducibility, we can slow down message handling slightly, for example by 
adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
org.apache.ignite.network.NetworkMessage)}}:

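{code:java}
// Requires: import java.util.concurrent.ThreadLocalRandom;
// Delays about two out of three messages: nextInt(3) yields 0, 1 or 2, and 0 and 2 are even.
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}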


Possible direction of research: 
we could check that the cursor.next command is not lost as a Raft response 
while the leader is changing.

UPD: 
We decided to add a consistency check between the received scan command and 
the handled scan command in the partition listener, so a user now gets a state 
machine error and can retry the command. However, we found another 
inconsistency: RocksDB could return hasNext == false after an unexpected step 
down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).

We then decided to change the replica factor to 1 in {{ItMixedQueriesTest}}, 
so that there is only one node in a partition Raft group, but we could not 
enable {{ItMixedQueriesTest}} because of a new error: 
https://issues.apache.org/jira/browse/IGNITE-16502
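
For illustration only, the retry that a caller could perform after receiving a state machine error might look roughly like the sketch below. All names here ({{ScanRetrySketch}}, {{StateMachineException}}, {{scanWithRetry}}) are hypothetical and are not taken from the Ignite code base. The sketch assumes the whole scan is restarted from the beginning on each error, so rows from a failed attempt are discarded.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: restart a scan when the state machine reports an error. */
class ScanRetrySketch {
    /** Hypothetical stand-in for the state machine error reported to the user. */
    static class StateMachineException extends RuntimeException {
        StateMachineException(String message) {
            super(message);
        }
    }

    /**
     * Runs the scan and restarts it from scratch if it fails with a
     * StateMachineException, giving up after maxAttempts attempts.
     */
    static <T> List<T> scanWithRetry(Supplier<List<T>> scan, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }

        StateMachineException last = null;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // A successful attempt returns the complete result set.
                return scan.get();
            } catch (StateMachineException e) {
                // Partial results of the failed attempt are dropped; try again.
                last = e;
            }
        }

        // Every attempt failed: surface the last error to the caller.
        throw last;
    }
}
```

A caller would wrap the scan in a supplier, for example {{ScanRetrySketch.scanWithRetry(() -> runScan(), 3)}}, where {{runScan}} is again a hypothetical method that executes the query and collects the rows.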


  was:
For some reasons select operation couldn't return expected number of rows. We 
noticed that this happens when raft leader is changing. To increase 
reproducibility, we can slow down a bit message handling, for example by adding 
this code to {{MessageServiceImpl#onMessage(java.lang.String, 
org.apache.ignite.network.NetworkMessage)}}

{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
try {
Thread.sleep(300);
} catch (Exception ex) {
ex.printStackTrace();
}
}
{code}


Possible direction of research: 
we could check that we do not lose cursor.next command as a raft response 
during the process of leader changing.

UPD: We decided to add checking for consistency between received scan command 
and handled scan command in partition listener, so now a user will get state 
machine error and could retry his command. But we found another inconsistency 
when RocksDB could return hasNext == false after an unexpected step down of the 
leader (https://issues.apache.org/jira/browse/IGNITE-16478).

So, we decided then to change the replica factor to 1 in 
{{ItMixedQueriesTest}}, so there will be only one node in a partition Raft 
group, but we couldn't enable {{ItMixedQueriesTest}} because of new error 
https://issues.apache.org/jira/browse/IGNITE-16502



> SQL select operation could return incomplete data
> -
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Blocker
>  Labels: ignite-3
>
> For some reason, a select operation may not return the expected number of 
> rows. We noticed that this happens when the Raft leader is changing. To 
> increase reproducibility, we can slow down message handling slightly, for 
> example by adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
> org.apache.ignite.network.NetworkMessage)}}:
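> {code:java}
> // Requires: import java.util.concurrent.ThreadLocalRandom;
> // Delays about two out of three messages: nextInt(3) yields 0, 1 or 2, and 0 and 2 are even.
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
>     try {
>         Thread.sleep(300);
>     } catch (Exception ex) {
>         ex.printStackTrace();
>     }
> }
> {code}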
> Possible direction of research: 
> we could check that the cursor.next command is not lost as a Raft response 
> while the leader is changing.
> UPD: 
> We decided to add a consistency check between the received scan command and 
> the handled scan command in the partition listener, so a user now gets a state 
> machine error and can retry the command. However, we found another 
> inconsistency: RocksDB could return hasNext == false after an unexpected step 
> down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).
> We then decided to change the replica factor to 1 in {{ItMixedQueriesTest}}, 
> so that there is only one node in a partition Raft group, but we could not 
> enable {{ItMixedQueriesTest}} because of a new error: 
> https://issues.apache.org/jira/browse/IGNITE-16502



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data

2022-02-08 Thread Mirza Aliev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-16406:
-
Description: 
For some reason, a select operation may not return the expected number of rows. 
We noticed that this happens when the Raft leader is changing. To increase 
reproducibility, we can slow down message handling slightly, for example by 
adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
org.apache.ignite.network.NetworkMessage)}}:

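{code:java}
// Requires: import java.util.concurrent.ThreadLocalRandom;
// Delays about two out of three messages: nextInt(3) yields 0, 1 or 2, and 0 and 2 are even.
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}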


Possible direction of research: 
we could check that the cursor.next command is not lost as a Raft response 
while the leader is changing.

UPD: We decided to add a consistency check between the received scan command 
and the handled scan command in the partition listener, so a user now gets a 
state machine error and can retry the command. However, we found another 
inconsistency: RocksDB could return hasNext == false after an unexpected step 
down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).

We then decided to change the replica factor to 1 in {{ItMixedQueriesTest}}, 
so that there is only one node in a partition Raft group, but we could not 
enable {{ItMixedQueriesTest}} because of a new error: 
https://issues.apache.org/jira/browse/IGNITE-16502


  was:
For some reasons select operation couldn't return expected number of rows. We 
noticed that this happens when raft leader is changing. To increase 
reproducibility, we can slow down a bit message handling, for example by adding 
this code to {{MessageServiceImpl#onMessage(java.lang.String, 
org.apache.ignite.network.NetworkMessage)}}

{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
try {
Thread.sleep(300);
} catch (Exception ex) {
ex.printStackTrace();
}
}
{code}


Possible direction of research: 
we could check that we do not lose cursor.next command as a raft response 
during the process of leader changing



> SQL select operation could return incomplete data
> -
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Blocker
>  Labels: ignite-3
>
> For some reasons select operation couldn't return expected number of rows. We 
> noticed that this happens when raft leader is changing. To increase 
> reproducibility, we can slow down a bit message handling, for example by 
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
> org.apache.ignite.network.NetworkMessage)}}
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
> try {
> Thread.sleep(300);
> } catch (Exception ex) {
> ex.printStackTrace();
> }
> }
> {code}
> Possible direction of research: 
> we could check that we do not lose cursor.next command as a raft response 
> during the process of leader changing.
> UPD: We decided to add checking for consistency between received scan command 
> and handled scan command in partition listener, so now a user will get state 
> machine error and could retry his command. But we found another inconsistency 
> when RocksDB could return hasNext == false after an unexpected step down of 
> the leader (https://issues.apache.org/jira/browse/IGNITE-16478).
> So, we decided then to change the replica factor to 1 in 
> {{ItMixedQueriesTest}}, so there will be only one node in a partition Raft 
> group, but we couldn't enable {{ItMixedQueriesTest}} because of new error 
> https://issues.apache.org/jira/browse/IGNITE-16502





[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data

2022-01-26 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-16406:
-
Priority: Blocker  (was: Major)

> SQL select operation could return incomplete data
> -
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Blocker
>  Labels: ignite-3
>
> For some reasons select operation couldn't return expected number of rows. We 
> noticed that this happens when raft leader is changing. To increase 
> reproducibility, we can slow down a bit message handling, for example by 
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
> org.apache.ignite.network.NetworkMessage)}}
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
> try {
> Thread.sleep(300);
> } catch (Exception ex) {
> ex.printStackTrace();
> }
> }
> {code}
> Possible direction of research: 
> we could check that we do not lose cursor.next command as a raft response 
> during the process of leader changing





[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data

2022-01-26 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-16406:
-
Ignite Flags:   (was: Docs Required,Release Notes Required)

> SQL select operation could return incomplete data
> -
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>
> For some reasons select operation couldn't return expected number of rows. We 
> noticed that this happens when raft leader is changing. To increase 
> reproducibility, we can slow down a bit message handling, for example by 
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
> org.apache.ignite.network.NetworkMessage)}}
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
> try {
> Thread.sleep(300);
> } catch (Exception ex) {
> ex.printStackTrace();
> }
> }
> {code}
> Possible direction of research: 
> we could check that we do not lose cursor.next command as a raft response 
> during the process of leader changing





[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data

2022-01-26 Thread Mirza Aliev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-16406:
-
Labels: ignite-3  (was: )

> SQL select operation could return incomplete data
> -
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>
> For some reasons select operation couldn't return expected number of rows. We 
> noticed that this happens when raft leader is changing. To increase 
> reproducibility, we can slow down a bit message handling, for example by 
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
> org.apache.ignite.network.NetworkMessage)}}
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
> try {
> Thread.sleep(300);
> } catch (Exception ex) {
> ex.printStackTrace();
> }
> }
> {code}
> Possible direction of research: 
> we could check that we do not lose cursor.next command as a raft response 
> during the process of leader changing





[jira] [Updated] (IGNITE-16406) SQL select operation could return incomplete data

2022-01-26 Thread Mirza Aliev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-16406:
-
Description: 
For some reason, a select operation may not return the expected number of rows. 
We noticed that this happens when the Raft leader is changing. To increase 
reproducibility, we can slow down message handling slightly, for example by 
adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
org.apache.ignite.network.NetworkMessage)}}:

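{code:java}
// Requires: import java.util.concurrent.ThreadLocalRandom;
// Delays about two out of three messages: nextInt(3) yields 0, 1 or 2, and 0 and 2 are even.
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
{code}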


Possible direction of research: 
we could check that the cursor.next command is not lost as a Raft response 
while the leader is changing.


  was:
For some reasons select operation couldn't return expected number of rows. We 
noticed that this happens when raft leader is changing. To increase 
reproducibility, we can a bit slow down message handling, for example add this 
code to {{MessageServiceImpl#onMessage(java.lang.String, 
org.apache.ignite.network.NetworkMessage)}}


{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
try {
Thread.sleep(300);
} catch (Exception ex) {
ex.printStackTrace();
}
}
{code}


Possible direction of research: 
we could check that we do not lose cursor.next command as a raft response 
during the process of leader changing



> SQL select operation could return incomplete data
> -
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>
> For some reasons select operation couldn't return expected number of rows. We 
> noticed that this happens when raft leader is changing. To increase 
> reproducibility, we can slow down a bit message handling, for example by 
> adding this code to {{MessageServiceImpl#onMessage(java.lang.String, 
> org.apache.ignite.network.NetworkMessage)}}
> {code:java}
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
> try {
> Thread.sleep(300);
> } catch (Exception ex) {
> ex.printStackTrace();
> }
> }
> {code}
> Possible direction of research: 
> we could check that we do not lose cursor.next command as a raft response 
> during the process of leader changing


