[jira] [Updated] (RATIS-1001) CLONE - shouldWithholdVotes() should be triggered for handling higher term
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Issue Type: Improvement  (was: Bug)

> CLONE - shouldWithholdVotes() should be triggered for handling higher term
> --------------------------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 0.5.0
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>              Labels: pull-request-available
>
> I assume that {{shouldWithholdVotes()}} is used to handle request-vote
> requests with a higher candidate term from a disruptive server, but currently
> it just ignores such requests, since it only takes effect when
> {{(state.getCurrentTerm() >= candidateTerm)}}.
> shouldWithholdVotes() should be triggered for handling a higher term. If
> currentTerm is larger than or equal to candidateTerm, just reject the vote
> request; no further handling is needed.
>
> Current code is
> {code:java}
> private boolean shouldWithholdVotes(long candidateTerm) {
>   if (state.getCurrentTerm() < candidateTerm) {
>     return false;
>   } else if (isLeader()) {
>     return true;
>   } else {
>     // following a leader and not yet timeout
>     return isFollower() && state.hasLeader()
>         && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
>   }
> }
> {code}
> Modify to
> {code:java}
> private boolean shouldWithholdVotes(long candidateTerm) {
>   if (state.getCurrentTerm() >= candidateTerm) {
>     return false;
>   } else if (isLeader()) {
>     return true;
>   } else {
>     // following a leader and not yet timeout
>     return isFollower() && state.hasLeader()
>         && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
>   }
> }
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-1001) CLONE - shouldWithholdVotes() should be triggered for handling higher term
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Labels:   (was: pull-request-available)

> CLONE - shouldWithholdVotes() should be triggered for handling higher term
> --------------------------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>
> I assume that {{shouldWithholdVotes()}} is used to handle request-vote
> requests with a higher candidate term from a disruptive server, but currently
> it just ignores such requests, since it only takes effect when
> {{(state.getCurrentTerm() >= candidateTerm)}}.
> shouldWithholdVotes() should be triggered for handling a higher term. If
> currentTerm is larger than or equal to candidateTerm, just reject the vote
> request; no further handling is needed.
>
> Current code is
> {code:java}
> private boolean shouldWithholdVotes(long candidateTerm) {
>   if (state.getCurrentTerm() < candidateTerm) {
>     return false;
>   } else if (isLeader()) {
>     return true;
>   } else {
>     // following a leader and not yet timeout
>     return isFollower() && state.hasLeader()
>         && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
>   }
> }
> {code}
> Modify to
> {code:java}
> private boolean shouldWithholdVotes(long candidateTerm) {
>   if (state.getCurrentTerm() >= candidateTerm) {
>     return false;
>   } else if (isLeader()) {
>     return true;
>   } else {
>     // following a leader and not yet timeout
>     return isFollower() && state.hasLeader()
>         && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
>   }
> }
> {code}
[jira] [Updated] (RATIS-1001) CLONE - shouldWithholdVotes() should be triggered for handling higher term
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Affects Version/s:   (was: 0.5.0)

> CLONE - shouldWithholdVotes() should be triggered for handling higher term
> --------------------------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>              Labels: pull-request-available
>
> I assume that {{shouldWithholdVotes()}} is used to handle request-vote
> requests with a higher candidate term from a disruptive server, but currently
> it just ignores such requests, since it only takes effect when
> {{(state.getCurrentTerm() >= candidateTerm)}}.
> shouldWithholdVotes() should be triggered for handling a higher term. If
> currentTerm is larger than or equal to candidateTerm, just reject the vote
> request; no further handling is needed.
>
> Current code is
> {code:java}
> private boolean shouldWithholdVotes(long candidateTerm) {
>   if (state.getCurrentTerm() < candidateTerm) {
>     return false;
>   } else if (isLeader()) {
>     return true;
>   } else {
>     // following a leader and not yet timeout
>     return isFollower() && state.hasLeader()
>         && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
>   }
> }
> {code}
> Modify to
> {code:java}
> private boolean shouldWithholdVotes(long candidateTerm) {
>   if (state.getCurrentTerm() >= candidateTerm) {
>     return false;
>   } else if (isLeader()) {
>     return true;
>   } else {
>     // following a leader and not yet timeout
>     return isFollower() && state.hasLeader()
>         && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
>   }
> }
> {code}
[jira] [Created] (RATIS-1001) CLONE - shouldWithholdVotes() should be triggered for handling higher term
Glen Geng created RATIS-1001:
--------------------------------

             Summary: CLONE - shouldWithholdVotes() should be triggered for handling higher term
                 Key: RATIS-1001
                 URL: https://issues.apache.org/jira/browse/RATIS-1001
             Project: Ratis
          Issue Type: Bug
          Components: server
    Affects Versions: 0.5.0
            Reporter: Glen Geng
            Assignee: Glen Geng


I assume that {{shouldWithholdVotes()}} is used to handle request-vote requests with a higher candidate term from a disruptive server, but currently it just ignores such requests, since it only takes effect when {{(state.getCurrentTerm() >= candidateTerm)}}.

shouldWithholdVotes() should be triggered for handling a higher term. If currentTerm is larger than or equal to candidateTerm, just reject the vote request; no further handling is needed.

Current code is
{code:java}
private boolean shouldWithholdVotes(long candidateTerm) {
  if (state.getCurrentTerm() < candidateTerm) {
    return false;
  } else if (isLeader()) {
    return true;
  } else {
    // following a leader and not yet timeout
    return isFollower() && state.hasLeader()
        && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
  }
}
{code}
Modify to
{code:java}
private boolean shouldWithholdVotes(long candidateTerm) {
  if (state.getCurrentTerm() >= candidateTerm) {
    return false;
  } else if (isLeader()) {
    return true;
  } else {
    // following a leader and not yet timeout
    return isFollower() && state.hasLeader()
        && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
  }
}
{code}
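The proposed check can be tried out in isolation. The following is a minimal sketch, not the actual Ratis code: the class name WithholdVotesSketch, the Role enum, and the boolean parameters are hypothetical stand-ins for the state, role, and FollowerState::shouldWithholdVotes lookups that the real method performs inside the server.

```java
// Hypothetical stand-alone model of the "Modify to" version; not the Ratis implementation.
final class WithholdVotesSketch {
  enum Role { LEADER, FOLLOWER, CANDIDATE }

  /**
   * @param hasLeader         stand-in for state.hasLeader()
   * @param followerWithholds stand-in for FollowerState::shouldWithholdVotes
   */
  static boolean shouldWithholdVotes(long currentTerm, long candidateTerm, Role role,
      boolean hasLeader, boolean followerWithholds) {
    if (currentTerm >= candidateTerm) {
      // Per the issue description, such requests need no further handling here.
      return false;
    } else if (role == Role.LEADER) {
      // A leader withholds its vote from a candidate with a higher term.
      return true;
    } else {
      // Following a leader that has not yet timed out.
      return role == Role.FOLLOWER && hasLeader && followerWithholds;
    }
  }

  public static void main(String[] args) {
    System.out.println(shouldWithholdVotes(5, 6, Role.LEADER, false, false));  // true
    System.out.println(shouldWithholdVotes(5, 6, Role.FOLLOWER, true, true));  // true
    System.out.println(shouldWithholdVotes(5, 6, Role.FOLLOWER, false, true)); // false
    System.out.println(shouldWithholdVotes(6, 6, Role.LEADER, false, false));  // false
  }
}
```

With the original "<" comparison, the first two calls would return false, which is the behaviour the issue describes as ignoring requests from a disruptive server.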
[jira] [Commented] (RATIS-924) rename raft group dir on disk when remove group is invoked
[ https://issues.apache.org/jira/browse/RATIS-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157174#comment-17157174 ]

Cyrus Jackson commented on RATIS-924:
-------------------------------------

[~shashikant] I have linked the PR. Please review it.

> rename raft group dir on disk when remove group is invoked
> ----------------------------------------------------------
>
>                 Key: RATIS-924
>                 URL: https://issues.apache.org/jira/browse/RATIS-924
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: RATIS-924.001.patch, screenshot-1.png
>
> !screenshot-1.png!
[jira] [Updated] (RATIS-1001) CLONE - shouldWithholdVotes() should be triggered for handling higher term
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Description:
During SCM-HA, SCM not only needs to know if it is a leader, but also needs to know which term it is in charge of.

Assume such a case: underlying raft node was leader on term 1, then step down as follower on term 2, then init election and become leader again on term 3. If term is not exposed with leader information, SCM can not distinguish that it is a leader of term 1 or that of term 3.

  was:
I assume that {{shouldWithholdVotes()}} is used to handle request-vote requests with a higher candidate term from a disruptive server, but currently it just ignores such requests, since it only takes effect when {{(state.getCurrentTerm() >= candidateTerm)}}.

shouldWithholdVotes() should be triggered for handling a higher term. If currentTerm is larger than or equal to candidateTerm, just reject the vote request; no further handling is needed.

Current code is
{code:java}
private boolean shouldWithholdVotes(long candidateTerm) {
  if (state.getCurrentTerm() < candidateTerm) {
    return false;
  } else if (isLeader()) {
    return true;
  } else {
    // following a leader and not yet timeout
    return isFollower() && state.hasLeader()
        && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
  }
}
{code}
Modify to
{code:java}
private boolean shouldWithholdVotes(long candidateTerm) {
  if (state.getCurrentTerm() >= candidateTerm) {
    return false;
  } else if (isLeader()) {
    return true;
  } else {
    // following a leader and not yet timeout
    return isFollower() && state.hasLeader()
        && role.getFollowerState().map(FollowerState::shouldWithholdVotes).orElse(false);
  }
}
{code}

> CLONE - shouldWithholdVotes() should be triggered for handling higher term
> --------------------------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>
> During SCM-HA, SCM not only needs to know if it is a leader, but also needs
> to know which term it is in charge of.
>
> Assume such a case: underlying raft node was leader on term 1, then step down
> as follower on term 2, then init election and become leader again on term 3.
> If term is not exposed with leader information, SCM can not distinguish that
> it is a leader of term 1 or that of term 3.
[jira] [Updated] (RATIS-1001) expose currentTerm to LeaderInfoProto for supporting SCM-HA.
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Summary: expose currentTerm to LeaderInfoProto for supporting SCM-HA.  (was: CLONE - shouldWithholdVotes() should be triggered for handling higher term)

> expose currentTerm to LeaderInfoProto for supporting SCM-HA.
> ------------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>
> During SCM-HA, SCM not only needs to know whether it is a leader, but also
> needs to know which term it is in charge of.
>
> Assume such a case: underlying raft node was leader on term 1, then step down
> as follower on term 2, then init election and become leader again on term 3.
> If term is not exposed together with leader information, SCM can not
> distinguish a leader of term 1 from that of term 3.
[jira] [Updated] (RATIS-1001) expose currentTerm to LeaderInfoProto for supporting SCM-HA.
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Description:
During SCM-HA, SCM not only needs to know whether it is a leader, but also needs to know which term it is in charge of.

Assume such a case: underlying raft node was leader on term 1, then step down as follower on term 2, then init election and become leader again on term 3. If term is not exposed together with leader information, SCM can not distinguish a leader of term 1 from that of term 3.

By the way, according to [~nanda]'s design, SCM needs to propagate its term to Datanode; RaftServerImpl::getRoleInfoProto will be a good place to expose term from Ratis to SCM.

  was:
During SCM-HA, SCM not only needs to know whether it is a leader, but also needs to know which term it is in charge of.

Assume such a case: underlying raft node was leader on term 1, then step down as follower on term 2, then init election and become leader again on term 3. If term is not exposed together with leader information, SCM can not distinguish a leader of term 1 from that of term 3.

> expose currentTerm to LeaderInfoProto for supporting SCM-HA.
> ------------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>
> During SCM-HA, SCM not only needs to know whether it is a leader, but also
> needs to know which term it is in charge of.
>
> Assume such a case: underlying raft node was leader on term 1, then step down
> as follower on term 2, then init election and become leader again on term 3.
> If term is not exposed together with leader information, SCM can not
> distinguish a leader of term 1 from that of term 3.
>
> By the way, according to [~nanda]'s design, SCM needs to propagate its term to
> Datanode; RaftServerImpl::getRoleInfoProto will be a good place to expose
> term from Ratis to SCM.
[jira] [Updated] (RATIS-1001) add currentTerm to LeaderInfoProto for supporting SCM-HA.
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Summary: add currentTerm to LeaderInfoProto for supporting SCM-HA.  (was: expose currentTerm to LeaderInfoProto for supporting SCM-HA.)

> add currentTerm to LeaderInfoProto for supporting SCM-HA.
> ---------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>
> During SCM-HA, SCM not only needs to know whether it is a leader, but also
> needs to know which term it is in charge of.
>
> Assume such a case: underlying raft node was leader on term 1, then step down
> as follower on term 2, then init election and become leader again on term 3.
> If term is not exposed together with leader information, SCM can not
> distinguish a leader of term 1 from that of term 3.
>
> By the way, according to [~nanda]'s design, SCM needs to propagate its term to
> Datanode; RaftServerImpl::getRoleInfoProto will be a good place to expose
> term from Ratis to SCM.
[jira] [Updated] (RATIS-1001) add currentTerm to LeaderInfoProto for supporting SCM-HA.
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Description:
During SCM-HA, SCM not only needs to know whether it is a leader, but also needs to know which term it is in charge of.

Assume such a case: underlying raft node was leader on term 1, then step down as follower on term 2, then init election and become leader again on term 3. If term is not exposed together with leader information, SCM can not distinguish a leader of term 1 from that of term 3.

By the way, according to [~nanda]'s design, leader SCM needs to propagate its term to Datanode; RaftServerImpl::getRoleInfoProto() will be a good place to expose term from Ratis to SCM.

  was:
During SCM-HA, SCM not only needs to know whether it is a leader, but also needs to know which term it is in charge of.

Assume such a case: underlying raft node was leader on term 1, then step down as follower on term 2, then init election and become leader again on term 3. If term is not exposed together with leader information, SCM can not distinguish a leader of term 1 from that of term 3.

By the way, according to [~nanda]'s design, SCM needs to propagate its term to Datanode; RaftServerImpl::getRoleInfoProto will be a good place to expose term from Ratis to SCM.

> add currentTerm to LeaderInfoProto for supporting SCM-HA.
> ---------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> During SCM-HA, SCM not only needs to know whether it is a leader, but also
> needs to know which term it is in charge of.
>
> Assume such a case: underlying raft node was leader on term 1, then step down
> as follower on term 2, then init election and become leader again on term 3.
> If term is not exposed together with leader information, SCM can not
> distinguish a leader of term 1 from that of term 3.
>
> By the way, according to [~nanda]'s design, leader SCM needs to propagate its
> term to Datanode; RaftServerImpl::getRoleInfoProto() will be a good place to
> expose term from Ratis to SCM.
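The term 1 versus term 3 scenario in the description above can be made concrete with a small model. This is only a sketch under stated assumptions: LeaderTermSketch and LeaderView are hypothetical names invented for illustration and are not Ratis or Ozone types, and the sketch does not show how LeaderInfoProto or getRoleInfoProto() are actually defined.

```java
// Hypothetical illustration of why the term must travel with the leader flag.
// LeaderView is an invented type; it is not part of Ratis or Ozone.
final class LeaderTermSketch {
  static final class LeaderView {
    final boolean leader;
    final long term;

    LeaderView(boolean leader, long term) {
      this.leader = leader;
      this.term = term;
    }
  }

  /** Comparison available when only the leader flag is exposed. */
  static boolean sameWithoutTerm(LeaderView a, LeaderView b) {
    return a.leader == b.leader;
  }

  /** Comparison available when the term is exposed together with the flag. */
  static boolean sameLeadership(LeaderView a, LeaderView b) {
    return a.leader && b.leader && a.term == b.term;
  }

  public static void main(String[] args) {
    LeaderView term1 = new LeaderView(true, 1); // leader on term 1
    LeaderView term3 = new LeaderView(true, 3); // leader again on term 3

    System.out.println(sameWithoutTerm(term1, term3)); // true: the two periods look identical
    System.out.println(sameLeadership(term1, term3));  // false: the term tells them apart
  }
}
```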
[jira] [Updated] (RATIS-1001) CLONE - shouldWithholdVotes() should be triggered for handling higher term
[ https://issues.apache.org/jira/browse/RATIS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated RATIS-1001:
-----------------------------
    Description:
During SCM-HA, SCM not only needs to know whether it is a leader, but also needs to know which term it is in charge of.

Assume such a case: underlying raft node was leader on term 1, then step down as follower on term 2, then init election and become leader again on term 3. If term is not exposed together with leader information, SCM can not distinguish a leader of term 1 from that of term 3.

  was:
During SCM-HA, SCM not only needs to know if it is a leader, but also needs to know which term it is in charge of.

Assume such a case: underlying raft node was leader on term 1, then step down as follower on term 2, then init election and become leader again on term 3. If term is not exposed with leader information, SCM can not distinguish that it is a leader of term 1 or that of term 3.

> CLONE - shouldWithholdVotes() should be triggered for handling higher term
> --------------------------------------------------------------------------
>
>                 Key: RATIS-1001
>                 URL: https://issues.apache.org/jira/browse/RATIS-1001
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>
> During SCM-HA, SCM not only needs to know whether it is a leader, but also
> needs to know which term it is in charge of.
>
> Assume such a case: underlying raft node was leader on term 1, then step down
> as follower on term 2, then init election and become leader again on term 3.
> If term is not exposed together with leader information, SCM can not
> distinguish a leader of term 1 from that of term 3.
[jira] [Commented] (RATIS-805) Add grpc metrics to ratis
[ https://issues.apache.org/jira/browse/RATIS-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157505#comment-17157505 ]

Aravindan Vijayan commented on RATIS-805:
-----------------------------------------

[~ansh.khanna] Can you take a stab at this?

> Add grpc metrics to ratis
> -------------------------
>
>                 Key: RATIS-805
>                 URL: https://issues.apache.org/jira/browse/RATIS-805
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Mukul Kumar Singh
>            Assignee: Ansh Khanna
>            Priority: Major
>
> https://github.com/grpc-ecosystem/java-grpc-prometheus talks about some of
> the grpc metrics. It will be good to explore if we can add these metrics into
> ratis as well.
[jira] [Assigned] (RATIS-805) Add grpc metrics to ratis
[ https://issues.apache.org/jira/browse/RATIS-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan reassigned RATIS-805:
--------------------------------------
    Assignee: Ansh Khanna  (was: Aravindan Vijayan)

> Add grpc metrics to ratis
> -------------------------
>
>                 Key: RATIS-805
>                 URL: https://issues.apache.org/jira/browse/RATIS-805
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Mukul Kumar Singh
>            Assignee: Ansh Khanna
>            Priority: Major
>
> https://github.com/grpc-ecosystem/java-grpc-prometheus talks about some of
> the grpc metrics. It will be good to explore if we can add these metrics into
> ratis as well.
[jira] [Updated] (RATIS-1002) Fix appendEntries timeout issue seen in leader
[ https://issues.apache.org/jira/browse/RATIS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lokesh Jain updated RATIS-1002:
-------------------------------
    Attachment: org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion-output.txt

> Fix appendEntries timeout issue seen in leader
> ----------------------------------------------
>
>                 Key: RATIS-1002
>                 URL: https://issues.apache.org/jira/browse/RATIS-1002
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Lokesh Jain
>            Priority: Major
>         Attachments: org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion-output.txt
>
> Ratis leader currently sees appendEntries timeout while appending entries to
> a follower. This is seen usually only for one of the followers. This leads to
> write failures and test timeouts in Ozone.
[jira] [Commented] (RATIS-1002) Fix appendEntries timeout issue seen in leader
[ https://issues.apache.org/jira/browse/RATIS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157489#comment-17157489 ]

Lokesh Jain commented on RATIS-1002:
------------------------------------

The unit test contains the output of one of the tests in Ozone where the test timed out due to the appendEntries timeout issue.

> Fix appendEntries timeout issue seen in leader
> ----------------------------------------------
>
>                 Key: RATIS-1002
>                 URL: https://issues.apache.org/jira/browse/RATIS-1002
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Lokesh Jain
>            Priority: Major
>         Attachments: org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion-output.txt
>
> Ratis leader currently sees appendEntries timeout while appending entries to
> a follower. This is seen usually only for one of the followers. This leads to
> write failures and test timeouts in Ozone.
[jira] [Created] (RATIS-1002) Fix appendEntries timeout issue seen in leader
Lokesh Jain created RATIS-1002:
----------------------------------

             Summary: Fix appendEntries timeout issue seen in leader
                 Key: RATIS-1002
                 URL: https://issues.apache.org/jira/browse/RATIS-1002
             Project: Ratis
          Issue Type: Bug
            Reporter: Lokesh Jain


Ratis leader currently sees appendEntries timeout while appending entries to a follower. This is seen usually only for one of the followers. This leads to write failures and test timeouts in Ozone.
[jira] [Commented] (RATIS-805) Add grpc metrics to ratis
[ https://issues.apache.org/jira/browse/RATIS-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157510#comment-17157510 ]

Ansh Khanna commented on RATIS-805:
-----------------------------------

Sure [~avijayan]

> Add grpc metrics to ratis
> -------------------------
>
>                 Key: RATIS-805
>                 URL: https://issues.apache.org/jira/browse/RATIS-805
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Mukul Kumar Singh
>            Assignee: Ansh Khanna
>            Priority: Major
>
> https://github.com/grpc-ecosystem/java-grpc-prometheus talks about some of
> the grpc metrics. It will be good to explore if we can add these metrics into
> ratis as well.