[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After LeaderState::checkLeadership() was merged, some test cases became hard to 
pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

These test cases restart nodes or change group membership, which leaves the 
leader vulnerable to losing its leadership, especially when resources are limited.

 

[https://github.com/apache/incubator-ratis/runs/927310008?check_suite_focus=true]

[https://github.com/apache/incubator-ratis/runs/926606136?check_suite_focus=true]

 

!image-2020-07-31-12-33-35-755.png!

 

!image-2020-07-31-12-34-08-384.png!

 

!image-2020-07-31-12-40-11-183.png!

The current workaround is to enlarge the election timeout a little, e.g., from 
[150ms, 300ms] to [300ms, 600ms], because a larger election timeout makes the 
leader more stable.
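
A minimal sketch of why the wider range helps, assuming the election timeout is drawn at random from the configured range, as is usual in Raft. The class and method names below are hypothetical and are not part of the Ratis API, and the 200 ms stall is a made-up figure for a slow CI machine.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: a hypothetical helper, not part of the Ratis API. */
final class ElectionTimeoutSketch {
  private final long minMs;
  private final long maxMs;

  ElectionTimeoutSketch(long minMs, long maxMs) {
    if (minMs <= 0 || maxMs < minMs) {
      throw new IllegalArgumentException("invalid range [" + minMs + ", " + maxMs + "]");
    }
    this.minMs = minMs;
    this.maxMs = maxMs;
  }

  /** Draws a timeout uniformly from [minMs, maxMs], both ends inclusive. */
  long randomTimeoutMs() {
    // nextLong's upper bound is exclusive, hence the + 1.
    return ThreadLocalRandom.current().nextLong(minMs, maxMs + 1);
  }

  /** A stall shorter than the minimum timeout can never exceed any drawn timeout. */
  boolean alwaysTolerates(long stallMs) {
    return stallMs < minMs;
  }
}
```

With the old range, a 200 ms stall may exceed the drawn timeout; with the new range it never does, at the cost of slower failure detection.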

 

TODO:

1) Is there a better way than changing the election timeout?

2) Are other test cases affected by LeaderState::checkLeadership()?

 

  was:
 

After LeaderState::checkLeadership() was merged, some test cases became hard to 
pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

These test cases restart nodes or change group membership, which leaves the 
leader vulnerable to losing its leadership, especially when resources are limited.

 

!image-2020-07-31-12-33-35-755.png!

 

!image-2020-07-31-12-34-08-384.png!

 

!image-2020-07-31-12-40-11-183.png!

The current workaround is to enlarge the election timeout a little, e.g., from 
[150ms, 300ms] to [300ms, 600ms], because a larger election timeout makes the 
leader more stable.

 

TODO:

1) Is there a better way than changing the election timeout?

2) Are other test cases affected by LeaderState::checkLeadership()?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Attachments: image-2020-07-31-12-33-35-755.png, 
> image-2020-07-31-12-34-08-384.png, image-2020-07-31-12-40-11-183.png
>
>
>  
> After LeaderState::checkLeadership() was merged, some test cases became hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> These test cases restart nodes or change group membership, which leaves the 
> leader vulnerable to losing its leadership, especially when resources are limited.
>  
> [https://github.com/apache/incubator-ratis/runs/927310008?check_suite_focus=true]
> [https://github.com/apache/incubator-ratis/runs/926606136?check_suite_focus=true]
>  
> !image-2020-07-31-12-33-35-755.png!
>  
> !image-2020-07-31-12-34-08-384.png!
>  
> !image-2020-07-31-12-40-11-183.png!
> The current workaround is to enlarge the election timeout a little, e.g., from 
> [150ms, 300ms] to [300ms, 600ms], because a larger election timeout makes the 
> leader more stable.
>  
> TODO:
> 1) Is there a better way than changing the election timeout?
> 2) Are other test cases affected by LeaderState::checkLeadership()?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Attachment: image-2020-07-31-12-40-11-183.png

> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Attachments: image-2020-07-31-12-33-35-755.png, 
> image-2020-07-31-12-34-08-384.png, image-2020-07-31-12-40-11-183.png
>
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable, especially resources is limited.
>  
> !image-2020-07-31-12-33-35-755.png!
>  
> !image-2020-07-31-12-34-08-384.png!
>  
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides changing election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

 

!image-2020-07-31-12-33-35-755.png!

 

!image-2020-07-31-12-34-08-384.png!

 

!image-2020-07-31-12-40-11-183.png!

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides changing election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 

  was:
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

 

!image-2020-07-31-12-33-35-755.png!

 

!image-2020-07-31-12-34-08-384.png!

 

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides changing election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Attachments: image-2020-07-31-12-33-35-755.png, 
> image-2020-07-31-12-34-08-384.png, image-2020-07-31-12-40-11-183.png
>
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable, especially resources is limited.
>  
> !image-2020-07-31-12-33-35-755.png!
>  
> !image-2020-07-31-12-34-08-384.png!
>  
> !image-2020-07-31-12-40-11-183.png!
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides changing election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Attachment: image-2020-07-31-12-34-08-384.png

> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Attachments: image-2020-07-31-12-33-35-755.png, 
> image-2020-07-31-12-34-08-384.png
>
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable, especially resources is limited.
>  
> !image-2020-07-31-12-33-35-755.png!
>  
> !image-2020-07-31-12-34-08-384.png!
>  
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides changing election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

 

!image-2020-07-31-12-33-35-755.png!

 

!image-2020-07-31-12-34-08-384.png!

 

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides changing election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 

  was:
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

 

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides changing election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Attachments: image-2020-07-31-12-33-35-755.png, 
> image-2020-07-31-12-34-08-384.png
>
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable, especially resources is limited.
>  
> !image-2020-07-31-12-33-35-755.png!
>  
> !image-2020-07-31-12-34-08-384.png!
>  
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides changing election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Attachment: image-2020-07-31-12-33-35-755.png

> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Attachments: image-2020-07-31-12-33-35-755.png
>
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable, especially resources is limited.
>  
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides changing election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

 

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides changing election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 

  was:
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

 

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable, especially resources is limited.
>  
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides changing election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

 

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 

  was:
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable, especially resources is limited.
>  
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides change election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable, especially resources is limited.

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 

  was:
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable.

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable, especially resources is limited.
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides change election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable.

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cased affected by LeaderState::checkLeadership() ?

 

  was:
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable.

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], since larger election timeout will make leader 
become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cased affected by LeaderState::checkLeadership() ?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable.
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides change election timeout ?
> 2) are there other test cased affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable.

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cases affected by LeaderState::checkLeadership() ?

 

  was:
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable.

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
leader become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cased affected by LeaderState::checkLeadership() ?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable.
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], because larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides change election timeout ?
> 2) are there other test cases affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
 

After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable.

Current walk around is to enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], since larger election timeout will make leader 
become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cased affected by LeaderState::checkLeadership() ?

 

  was:
After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable.

The walk around is enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], since larger election timeout will make leader 
become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cased affected by LeaderState::checkLeadership() ?

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
>  
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable.
> Current walk around is to enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], since larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides change election timeout ?
> 2) are there other test cased affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

Such case need do node restart operation or membership change operation, which 
will make leader vulnerable.

The walk around is enlarge election timeout a little bit, e.g., from 
[150ms,300ms] to [300ms, 600ms], since larger election timeout will make leader 
become more stable.

 

TODO:

1) do we have better way besides change election timeout ?

2) are there other test cased affected by LeaderState::checkLeadership() ?

 

  was:
After merge  LeaderState::checkLeadership(), some test case become hard to pass 
under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable.
> The walk around is enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], since larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides change election timeout ?
> 2) are there other test cased affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Affects Version/s: 1.1.0

> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
> After merge  LeaderState::checkLeadership(), some test case become hard to 
> pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
> Such case need do node restart operation or membership change operation, 
> which will make leader vulnerable.
> The walk around is enlarge election timeout a little bit, e.g., from 
> [150ms,300ms] to [300ms, 600ms], since larger election timeout will make 
> leader become more stable.
>  
> TODO:
> 1) do we have better way besides change election timeout ?
> 2) are there other test cased affected by LeaderState::checkLeadership() ?
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
After LeaderState::checkLeadership() was merged, some test cases became hard to 
pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.

 

  was:
After merge 

 


> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
> After LeaderState::checkLeadership() was merged, some test cases became hard 
> to pass under GitHub CI, such as GroupManagementBaseTest and TestMultiRaftGroup.
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Description: 
After merge 

 

  was:
We should make sure that a stale leader steps down to the candidate state 
before the next leader election.

Proposal:
In the heartbeat thread on the leader node, we should check whether the time 
since each follower’s last response is less than the leader election timeout. 
If that holds for a majority of the followers, the current leader is still the 
active leader: a majority of the followers are still responding to the current 
leader, so there can’t be a new leader.

If, for a majority of the followers, the time since the last response is 
greater than the leader election timeout, the current leader should step down 
and become a candidate.

With this check, we can be sure that, in case of a network partition, the 
current leader steps down and becomes a candidate before a new leader election 
starts.



> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
> After merge 
>  





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Labels:   (was: pull-request-available)

> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Improvement
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
> We should make sure that a stale leader steps down to the candidate state 
> before the next leader election.
> Proposal:
> In the heartbeat thread on the leader node, we should check whether the time 
> since each follower’s last response is less than the leader election timeout. 
> If that holds for a majority of the followers, the current leader is still 
> the active leader: a majority of the followers are still responding to the 
> current leader, so there can’t be a new leader.
> If, for a majority of the followers, the time since the last response is 
> greater than the leader election timeout, the current leader should step down 
> and become a candidate.
> With this check, we can be sure that, in case of a network partition, the 
> current leader steps down and becomes a candidate before a new leader 
> election starts.
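
The check proposed above can be sketched roughly as follows. This is only an illustration, not the actual Ratis LeaderState code: the class name, the method hasMajorityContact, and the millisecond timestamps are hypothetical, and counting the leader itself toward the majority of the whole group is an assumption of this sketch (the proposal speaks only of a majority of followers).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the leadership check described above; not Ratis code. */
final class LeadershipCheckSketch {
  private LeadershipCheckSketch() {}

  /**
   * Returns true if the leader has heard from enough followers recently.
   *
   * @param followerLastResponseMs time (ms) of the last response from each follower
   * @param nowMs                  current time (ms)
   * @param electionTimeoutMs      leader election timeout (ms)
   */
  static boolean hasMajorityContact(
      List<Long> followerLastResponseMs, long nowMs, long electionTimeoutMs) {
    // Assumption: the group is the followers plus the leader itself.
    final int groupSize = followerLastResponseMs.size() + 1;
    int active = 1; // the leader counts itself as active
    for (long lastResponse : followerLastResponseMs) {
      // A follower is "active" if it responded within the election timeout.
      if (nowMs - lastResponse < electionTimeoutMs) {
        active++;
      }
    }
    // Strict majority of the group; otherwise the leader should step down.
    return active > groupSize / 2;
  }

  public static void main(String[] args) {
    // Both followers last responded 350-400 ms ago, e.g. after a stall on a slow CI host.
    final List<Long> lastResponses = Arrays.asList(600L, 650L);
    final long now = 1000L;
    System.out.println("timeout 300ms keeps leadership: "
        + hasMajorityContact(lastResponses, now, 300L));
    System.out.println("timeout 600ms keeps leadership: "
        + hasMajorityContact(lastResponses, now, 600L));
  }
}
```

In this sketch the same stall makes the leader step down with a 300 ms timeout but not with a 600 ms one, which matches the reported effect of enlarging the election timeout in the tests.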





[jira] [Updated] (RATIS-1014) checkLeadership() may make some test case become flaky under GitHub CI.

2020-07-30 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1014:
-
Issue Type: Test  (was: Improvement)

> checkLeadership() may make some test case become flaky under GitHub CI.
> ---
>
> Key: RATIS-1014
> URL: https://issues.apache.org/jira/browse/RATIS-1014
> Project: Ratis
> Issue Type: Test
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
> We should make sure that a stale leader steps down to the candidate state 
> before the next leader election.
> Proposal:
> In the heartbeat thread on the leader node, we should check whether the time 
> since each follower’s last response is less than the leader election timeout. 
> If that holds for a majority of the followers, the current leader is still 
> the active leader: a majority of the followers are still responding to the 
> current leader, so there can’t be a new leader.
> If, for a majority of the followers, the time since the last response is 
> greater than the leader election timeout, the current leader should step down 
> and become a candidate.
> With this check, we can be sure that, in case of a network partition, the 
> current leader steps down and becomes a candidate before a new leader 
> election starts.


