[jira] [Commented] (RATIS-651) Add metrics related to leaderElection and HeartBeat
[ https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917409#comment-16917409 ] Shashikant Banerjee commented on RATIS-651: --- Thanks [~avijayan] for updating the patch. The patch looks good to me. I am +1 on this change. Will commit this shortly. > Add metrics related to leaderElection and HeartBeat > --- > > Key: RATIS-651 > URL: https://issues.apache.org/jira/browse/RATIS-651 > Project: Ratis > Issue Type: Sub-task > Components: server >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Aravindan Vijayan >Priority: Critical > Attachments: RATIS-651-000.patch, RATIS-651-001.patch, > RATIS-651-002.patch, RATIS-651-003.patch > > > Following metrics would be helpful to determine the leader election events > and timeouts: > > |numLeaderElections|Number of leader elections since the creation of ratis > pipeline| > |numLeaderElectionTimeouts|Number of leader election timeouts or failures| > |LeaderElectionCompletionLatency|Time required to complete a leader election| > |MaxNoLeaderInterval|Max time where there has been no elected leader in the > raft ring| > |heartBeatMissCount|No of times heartBeat response is missed from a server | -- This message was sent by Atlassian Jira (v8.3.2#803003)
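For readers unfamiliar with the five metrics in the issue description, here is a minimal sketch of how such counters could be tracked. It is not the code in the RATIS-651 patches: the class name `LeaderElectionMetricsSketch`, the callback names, and the choice to pass timestamps in explicitly are all assumptions made for illustration. Only the metric names come from the issue.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for the metrics listed in RATIS-651; not the Ratis implementation. */
class LeaderElectionMetricsSketch {
  final AtomicLong numLeaderElections = new AtomicLong();
  final AtomicLong numLeaderElectionTimeouts = new AtomicLong();
  final AtomicLong leaderElectionCompletionLatencyMs = new AtomicLong();
  final AtomicLong maxNoLeaderIntervalMs = new AtomicLong();
  final AtomicLong heartBeatMissCount = new AtomicLong();

  // Time at which the group last lost its leader, or -1 if a leader is known.
  private volatile long leaderLostAtMs = -1;

  /** Assumption: an election is counted when it starts, whether or not it succeeds. */
  void onElectionStarted() {
    numLeaderElections.incrementAndGet();
  }

  void onElectionTimeout() {
    numLeaderElectionTimeouts.incrementAndGet();
  }

  /** Records the latency of the most recently completed election. */
  void onElectionCompleted(long startMs, long endMs) {
    leaderElectionCompletionLatencyMs.set(endMs - startMs);
  }

  void onLeaderLost(long nowMs) {
    leaderLostAtMs = nowMs;
  }

  /** Keeps the longest observed interval without a leader. */
  void onLeaderElected(long nowMs) {
    if (leaderLostAtMs >= 0) {
      maxNoLeaderIntervalMs.accumulateAndGet(nowMs - leaderLostAtMs, Math::max);
      leaderLostAtMs = -1;
    }
  }

  void onHeartBeatMissed() {
    heartBeatMissCount.incrementAndGet();
  }
}
```

Passing timestamps as arguments keeps the sketch deterministic; a real server would more likely read a monotonic clock and publish the values through its metrics registry.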
[jira] [Created] (RATIS-669) Allow Ratis gRPCTlsConfig to take Java Key/Cert Object in addition to File
Xiaoyu Yao created RATIS-669: Summary: Allow Ratis gRPCTlsConfig to take Java Key/Cert Object in addition to File Key: RATIS-669 URL: https://issues.apache.org/jira/browse/RATIS-669 Project: Ratis Issue Type: Improvement Affects Versions: 0.3.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao This is needed for a TLS client that does not have its own local persistence of the cert file. The CA cert will be decoded from the block token for clients external to the ozone cluster (non SCM/OM/DN). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (RATIS-543) Ratis GRPC client produces excessive logging while writing data.
[ https://issues.apache.org/jira/browse/RATIS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917292#comment-16917292 ] Hadoop QA commented on RATIS-543: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 8s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 1s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 23m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | ratis.examples.filestore.TestFileStoreWithGrpc | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/ratis:date2019-08-27 | | JIRA Issue | RATIS-543 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978715/r485_20190827.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile | | uname | Linux 91b564800372 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh | | git revision | master / 021165f | | maven | version: Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T18:41:47Z) | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/PreCommit-RATIS-Build/946/artifact/out/patch-unit-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/946/testReport/ | | Max. process+thread count | 2200 (vs. ulimit of 5000) | | modules | C: ratis-grpc U: ratis-grpc | | Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/946/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Ratis GRPC client produces excessive logging while writing data. > > > Key: RATIS-543 > URL: https://issues.apache.org/jira/browse/RATIS-543 > Project: Ratis > Issue Type: Bug > Components: gRPC >Reporter: Aravindan Vijayan >Assignee: Tsz Wo Nicholas Sze >Priority: Blocker > Labels: ozone > Attachments: r485_20190827.patch > > > {code} > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: >
[jira] [Commented] (RATIS-651) Add metrics related to leaderElection and HeartBeat
[ https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917274#comment-16917274 ] Hadoop QA commented on RATIS-651: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 1s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 24s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 6s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 48s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 6s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | ratis.server.simulation.TestRaftStateMachineExceptionWithSimulatedRpc | | | ratis.examples.filestore.TestFileStoreWithNetty | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/ratis:date2019-08-27 | | JIRA Issue | RATIS-651 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978708/RATIS-651-003.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile | | uname | Linux 53af9d7639d0 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh | | git revision | master / 021165f | | maven | version: Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T18:41:47Z) | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/PreCommit-RATIS-Build/945/artifact/out/patch-unit-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/945/testReport/ | | Max. process+thread count | 2982 (vs. ulimit of 5000) | | modules | C: ratis-server ratis-test U: . | | Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/945/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add metrics related to leaderElection and HeartBeat > --- > > Key: RATIS-651 > URL: https://issues.apache.org/jira/browse/RATIS-651 > Project: Ratis > Issue Type: Sub-task > Components: server >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >
[jira] [Commented] (RATIS-556) Detect node failures and close the log to prevent additional writes
[ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917139#comment-16917139 ] Rajeshbabu Chintaguntla commented on RATIS-556: --- [~an...@apache.org] bq. can we please do this small change: instead of throwing an exception and catching outside, can we just WARN here itself and continue processing other logs (as to avoid immediate retry of the same log and in case if it continues to fail, then other logs will not ever be tried for close). Done in v3 patch. > Detect node failures and close the log to prevent additional writes > --- > > Key: RATIS-556 > URL: https://issues.apache.org/jira/browse/RATIS-556 > Project: Ratis > Issue Type: Improvement >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla >Priority: Major > Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch, > RATIS-556_v2.patch, RATIS-556_v3.patch > > > Currently there is no way to detect the node failures at master log servers > and add new nodes to the group serving the log. We need to analyze how Ozone > is working in this case. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (RATIS-556) Detect node failures and close the log to prevent additional writes
[ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajeshbabu Chintaguntla updated RATIS-556: -- Attachment: RATIS-556_v3.patch > Detect node failures and close the log to prevent additional writes > --- > > Key: RATIS-556 > URL: https://issues.apache.org/jira/browse/RATIS-556 > Project: Ratis > Issue Type: Improvement >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla >Priority: Major > Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch, > RATIS-556_v2.patch, RATIS-556_v3.patch > > > Currently there is no way to detect the node failures at master log servers > and add new nodes to the group serving the log. We need to analyze how Ozone > is working in this case. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (RATIS-556) Detect node failures and close the log to prevent additional writes
[ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917105#comment-16917105 ] Ankit Singhal commented on RATIS-556: - Thanks [~rajeshbabu] , v2 looks good to me. can we please do this small change: instead of throwing an exception and catching outside, can we just WARN here itself and continue processing other logs (as to avoid immediate retry of the same log and in case if it continues to fail, then other logs will not ever be tried for close).
{code}
+try {
+  RaftClientReply reply = client.send(
+      () -> LogServiceProtoUtil.toChangeStateRequestProto(logName, LogStream.State.CLOSED)
+          .toByteString());
+  LogServiceProtos.ChangeStateReplyProto message =
+      LogServiceProtos.ChangeStateReplyProto.parseFrom(reply.getMessage().getContent());
+} catch (IOException e) {
+  throw new RuntimeException(e);
+}
{code}
> Detect node failures and close the log to prevent additional writes > --- > > Key: RATIS-556 > URL: https://issues.apache.org/jira/browse/RATIS-556 > Project: Ratis > Issue Type: Improvement >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla >Priority: Major > Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch, > RATIS-556_v2.patch > > > Currently there is no way to detect the node failures at master log servers > and add new nodes to the group serving the log. We need to analyze how Ozone > is working in this case. -- This message was sent by Atlassian Jira (v8.3.2#803003)
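The review suggestion above (log a warning and keep going, rather than rethrow) can be sketched roughly as follows. This is not the v3 patch: `CloseLogsSketch`, the `LogCloser` interface, and the use of `java.util.logging` are stand-ins chosen to keep the example self-contained, and the real code sends a change-state request through a Ratis client instead.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class CloseLogsSketch {
  private static final Logger LOG = Logger.getLogger(CloseLogsSketch.class.getName());

  /** Hypothetical stand-in for the client call that asks a log to move to CLOSED. */
  interface LogCloser {
    void close(String logName) throws IOException;
  }

  /**
   * Tries to close every log. A failure is logged at WARNING level and the loop
   * moves on, so one failing log cannot block the others from being closed.
   * Returns the names of the logs that could not be closed.
   */
  static List<String> closeAll(List<String> logNames, LogCloser closer) {
    List<String> failed = new ArrayList<>();
    for (String logName : logNames) {
      try {
        closer.close(logName);
      } catch (IOException e) {
        LOG.log(Level.WARNING, "Failed to close log " + logName + "; continuing", e);
        failed.add(logName);
      }
    }
    return failed;
  }
}
```

Returning the failed names is an extra design choice in this sketch; it lets a caller decide whether and when to retry, which avoids the immediate retry of the same log that the comment warns about.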
[jira] [Commented] (RATIS-556) Detect node failures and close the log to prevent additional writes
[ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917083#comment-16917083 ] Rajeshbabu Chintaguntla commented on RATIS-556: --- [~elserj] [~an...@apache.org] uploaded the patch adding an inverted index to map peer to logs and closing the logs when the peer is down. > Detect node failures and close the log to prevent additional writes > --- > > Key: RATIS-556 > URL: https://issues.apache.org/jira/browse/RATIS-556 > Project: Ratis > Issue Type: Improvement >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla >Priority: Major > Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch, > RATIS-556_v2.patch > > > Currently there is no way to detect the node failures at master log servers > and add new nodes to the group serving the log. We need to analyze how Ozone > is working in this case. -- This message was sent by Atlassian Jira (v8.3.2#803003)
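The "inverted index to map peer to logs" mentioned in the comment can be pictured with a small sketch like the one below. The class and method names are hypothetical and plain strings stand in for peer and log identifiers; the attached patch may structure this quite differently.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical peer-to-logs index; not the data structure from the RATIS-556 patch. */
class PeerToLogsIndexSketch {
  private final Map<String, Set<String>> logsByPeer = new ConcurrentHashMap<>();

  /** Registers a log under every peer in the group that serves it. */
  void addLog(String logName, Iterable<String> peers) {
    for (String peer : peers) {
      logsByPeer.computeIfAbsent(peer, p -> ConcurrentHashMap.newKeySet()).add(logName);
    }
  }

  /** Logs served by the given peer; these are the candidates to close when it goes down. */
  Set<String> logsOn(String peer) {
    return Collections.unmodifiableSet(
        logsByPeer.getOrDefault(peer, Collections.emptySet()));
  }
}
```

With such an index, handling a failed peer becomes a single lookup followed by a close request per returned log, instead of a scan over every log's group membership.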
[jira] [Updated] (RATIS-556) Detect node failures and close the log to prevent additional writes
[ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajeshbabu Chintaguntla updated RATIS-556: -- Attachment: RATIS-556_v2.patch > Detect node failures and close the log to prevent additional writes > --- > > Key: RATIS-556 > URL: https://issues.apache.org/jira/browse/RATIS-556 > Project: Ratis > Issue Type: Improvement >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla >Priority: Major > Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch, > RATIS-556_v2.patch > > > Currently there is no way to detect the node failures at master log servers > and add new nodes to the group serving the log. We need to analyze how Ozone > is working in this case. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (RATIS-661) Add call in state machine to handle group removal
[ https://issues.apache.org/jira/browse/RATIS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917069#comment-16917069 ] Tsz Wo Nicholas Sze commented on RATIS-661: --- > Since the impl is removed earlier, RaftServer#getGroupIds would not give the >corresponding groupId ... When the group is being removed, it is correct to have RaftServer#getGroupIds not returning that id. Ozone datanode could use notifyGroupRemove() to check when the server impl is shutdown. If the group is not removed from the map in the beginning, new calls including client requests and another groupRemoveAsync(..) call can happen. It will have race condition. > Add call in state machine to handle group removal > - > > Key: RATIS-661 > URL: https://issues.apache.org/jira/browse/RATIS-661 > Project: Ratis > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: RATIS-661.001.patch, RATIS-661.002.patch, > RATIS-661.003.patch, RATIS-661.004.patch > > > Currently during RaftServerProxy#groupRemoveAsync there is no way for > stateMachine to know that the RaftGroup will be removed. This Jira aims to > add a call in the stateMachine to handle group removal. > It also changes the logic of groupRemoval api to remove the RaftServerImpl > from the RaftServerProxy#impls map after the shutdown is complete. This is > required to synchronize the removal with the corresponding api of > RaftServer#getGroupIds. RaftServer#getGroupIds uses the RaftServerProxy#impls > map to get the groupIds. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (RATIS-661) Add call in state machine to handle group removal
[ https://issues.apache.org/jira/browse/RATIS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917067#comment-16917067 ] Hadoop QA commented on RATIS-661: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 56s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 6s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 27s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 24m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | ratis.netty.TestLogAppenderWithNetty | | | ratis.server.simulation.TestRaftStateMachineExceptionWithSimulatedRpc | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/ratis:date2019-08-27 | | JIRA Issue | RATIS-661 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978677/RATIS-661.004.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile | | uname | Linux 2a046e678ea8 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh | | git revision | master / 021165f | | maven | version: Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T18:41:47Z) | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/PreCommit-RATIS-Build/944/artifact/out/patch-unit-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/944/testReport/ | | Max. process+thread count | 4208 (vs. ulimit of 5000) | | modules | C: ratis-server ratis-test U: . | | Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/944/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add call in state machine to handle group removal > - > > Key: RATIS-661 > URL: https://issues.apache.org/jira/browse/RATIS-661 > Project: Ratis > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments:
[jira] [Commented] (RATIS-661) Add call in state machine to handle group removal
[ https://issues.apache.org/jira/browse/RATIS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917059#comment-16917059 ] Lokesh Jain commented on RATIS-661: --- [~szetszwo] Thanks for reviewing the patch! | Why changing remove(..) to get(..) below? Since the impl is removed earlier, RaftServer#getGroupIds would not give the corresponding groupId even though the group has not yet been removed. RaftServer#getGroupIds is used in ozone datanode to know if pipeline exists or not. This can lead to race condition as pipeline may still be active even though it is reported as non-existent. | Just make the call there as below. I was thinking of keeping it this way so that we notify after all the transactions have been applied. {code:java} impl.shutdown(deleteDirectory); impl.getStateMachine().notifyGroupRemove();{code} > Add call in state machine to handle group removal > - > > Key: RATIS-661 > URL: https://issues.apache.org/jira/browse/RATIS-661 > Project: Ratis > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: RATIS-661.001.patch, RATIS-661.002.patch, > RATIS-661.003.patch, RATIS-661.004.patch > > > Currently during RaftServerProxy#groupRemoveAsync there is no way for > stateMachine to know that the RaftGroup will be removed. This Jira aims to > add a call in the stateMachine to handle group removal. > It also changes the logic of groupRemoval api to remove the RaftServerImpl > from the RaftServerProxy#impls map after the shutdown is complete. This is > required to synchronize the removal with the corresponding api of > RaftServer#getGroupIds. RaftServer#getGroupIds uses the RaftServerProxy#impls > map to get the groupIds. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Moved] (RATIS-668) Fix NOTICE file
[ https://issues.apache.org/jira/browse/RATIS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal moved HDDS-2046 to RATIS-668: --- Key: RATIS-668 (was: HDDS-2046) Target Version/s: 0.4.0 (was: 0.4.1) Affects Version/s: (was: 0.4.1) 0.4.0 Workflow: no-reopen-closed, patch-avail (was: patch-available, re-open possible) Project: Ratis (was: Hadoop Distributed Data Store) > Fix NOTICE file > --- > > Key: RATIS-668 > URL: https://issues.apache.org/jira/browse/RATIS-668 > Project: Ratis > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Blocker > > NOTICE file needs to be updated based on Justin's comments here: > > [https://mail-archives.apache.org/mod_mbox/incubator-general/201908.mbox/%3C8EA21F57-A972-4CBE-AC2F-D3830FE6BDB4%40classsoftware.com%3E] > > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (RATIS-661) Add call in state machine to handle group removal
[ https://issues.apache.org/jira/browse/RATIS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916965#comment-16916965 ] Tsz Wo Nicholas Sze commented on RATIS-661: --- [~ljain], thanks for working on this.
- Why changing remove(..) to get(..) below? It could have a race condition when there are multiple groupRemoveAsync(..) calls.
{code}
   }
-final CompletableFuture f = impls.remove(groupId);
+final CompletableFuture f = impls.get(groupId);
   if (f == null) {
{code}
- Let's call the new method notifyGroupRemove() in StateMachine.
- Let's not change shutdown(..) since the groupRemoval parameter is always false except for groupRemoveAsync(..). Just make the call there as below.
{code}
@@ -403,6 +403,7 @@ public class RaftServerProxy implements RaftServer {
   }
   return f.thenApply(impl -> {
     final Collection commitInfos = impl.getCommitInfos();
+    impl.getStateMachine().notifyGroupRemove();
     impl.shutdown(deleteDirectory);
     return new RaftClientReply(request, commitInfos);
   });
{code}
> Add call in state machine to handle group removal > - > > Key: RATIS-661 > URL: https://issues.apache.org/jira/browse/RATIS-661 > Project: Ratis > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: RATIS-661.001.patch, RATIS-661.002.patch, > RATIS-661.003.patch, RATIS-661.004.patch > > > Currently during RaftServerProxy#groupRemoveAsync there is no way for > stateMachine to know that the RaftGroup will be removed. This Jira aims to > add a call in the stateMachine to handle group removal. > It also changes the logic of groupRemoval api to remove the RaftServerImpl > from the RaftServerProxy#impls map after the shutdown is complete. This is > required to synchronize the removal with the corresponding api of > RaftServer#getGroupIds. RaftServer#getGroupIds uses the RaftServerProxy#impls > map to get the groupIds. -- This message was sent by Atlassian Jira (v8.3.2#803003)
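To illustrate the race the reviewer is pointing at, here is a minimal sketch assuming the map behaves like a `java.util.concurrent.ConcurrentMap`. The class `GroupRemovalSketch` and its methods are hypothetical and use strings in place of the server objects; the real `RaftServerProxy` code differs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical model of the impls map; not the RaftServerProxy code. */
class GroupRemovalSketch {
  private final ConcurrentMap<String, String> impls = new ConcurrentHashMap<>();

  void addGroup(String groupId, String impl) {
    impls.put(groupId, impl);
  }

  /**
   * remove(..) is atomic: at most one caller receives the non-null value,
   * so only that caller goes on to shut the group down.
   */
  boolean claimWithRemove(String groupId) {
    final String impl = impls.remove(groupId);
    return impl != null; // false: already claimed by another call, or never existed
  }

  /**
   * get(..) leaves the entry in place, so every concurrent caller passes the
   * null check and would attempt the shutdown.
   */
  boolean claimWithGet(String groupId) {
    final String impl = impls.get(groupId);
    return impl != null;
  }
}
```

Calling `claimWithGet` twice for the same group succeeds both times, whereas the second `claimWithRemove` call returns `false`. That difference is what makes `remove(..)` safe against repeated groupRemoveAsync(..) calls, at the cost that the group id disappears from the map before shutdown has finished.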
[jira] [Assigned] (RATIS-543) Ratis GRPC client produces excessive logging while writing data.
[ https://issues.apache.org/jira/browse/RATIS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reassigned RATIS-543: - Assignee: Tsz Wo Nicholas Sze > Ratis GRPC client produces excessive logging while writing data. > > > Key: RATIS-543 > URL: https://issues.apache.org/jira/browse/RATIS-543 > Project: Ratis > Issue Type: Bug > Components: gRPC >Reporter: Aravindan Vijayan >Assignee: Tsz Wo Nicholas Sze >Priority: Blocker > Labels: ozone > Attachments: r485_20190827.patch > > > {code} > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1352, SUCCESS, logIndex=15195, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1355, SUCCESS, logIndex=15196, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1357, SUCCESS, logIndex=15197, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-C46A037579AA->5a076d87-abf9-4ade-ae37-adab741d99a6: receive > RaftClientReply:client-C46A037579AA->5a076d87-abf9-4ade-ae37-adab741d99a6@group-AE803AF42C5D, > cid=1370, 
SUCCESS, logIndex=0, com > mits[5a076d87-abf9-4ade-ae37-adab741d99a6:c16423, > 6e21905d-9796-4248-834e-ed97ea6763ef:c16422, > 34e8d6e5-456f-4e2a-99a5-4f21fd9c4a7e:c16423] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-EBF618C3F968->a5729949-67f1-496e-a0d3-1bfc0e139836: receive > RaftClientReply:client-EBF618C3F968->a5729949-67f1-496e-a0d3-1bfc0e139836@group-4E41299EA191, > cid=1376, SUCCESS, logIndex=0, com > mits[a5729949-67f1-496e-a0d3-1bfc0e139836:c4764, > 111d4c23-756f-4c8a-a48d-aa2a327a5179:c4764, > 287eccfb-8461-419a-8732-529d042380b3:c4764] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-4D5E3CDC8889->0bb45975-b0d2-499e-85cc-22ea22c57ecb: receive > RaftClientReply:client-4D5E3CDC8889->0bb45975-b0d2-499e-85cc-22ea22c57ecb@group-D1BB7F32F754, > cid=1382, FAILED org.apache.ratis. > protocol.NotLeaderException: Server 0bb45975-b0d2-499e-85cc-22ea22c57ecb is > not the leader (f1a756c3-6b42-4ece-8093-dbcdac5f8d5b:10.17.200.18:9858). > Request must be sent to leader., logIndex=0, > commits[0bb45975-b0d2-499e-85cc-22ea22c57ecb:c15358, 6c7a > 780f-5474-49da-b880-3eaf69d9d83d:c15358, > f1a756c3-6b42-4ece-8093-dbcdac5f8d5b:c15358] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1359, SUCCESS, logIndex=15208, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1362, SUCCESS, logIndex=15209, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > 
aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1363, SUCCESS, logIndex=15210, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:32 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1371, SUCCESS, logIndex=15211, >
[jira] [Updated] (RATIS-543) Ratis GRPC client produces excessive logging while writing data.
[ https://issues.apache.org/jira/browse/RATIS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated RATIS-543: -- Component/s: gRPC r543_20190827.patch: change the log to trace. > Ratis GRPC client produces excessive logging while writing data. > > > Key: RATIS-543 > URL: https://issues.apache.org/jira/browse/RATIS-543 > Project: Ratis > Issue Type: Bug > Components: gRPC >Reporter: Aravindan Vijayan >Priority: Blocker > Labels: ozone > Attachments: r485_20190827.patch > > > {code} > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1352, SUCCESS, logIndex=15195, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1355, SUCCESS, logIndex=15196, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1357, SUCCESS, logIndex=15197, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-C46A037579AA->5a076d87-abf9-4ade-ae37-adab741d99a6: receive > RaftClientReply:client-C46A037579AA->5a076d87-abf9-4ade-ae37-adab741d99a6@group-AE803AF42C5D, > cid=1370, 
SUCCESS, logIndex=0, com > mits[5a076d87-abf9-4ade-ae37-adab741d99a6:c16423, > 6e21905d-9796-4248-834e-ed97ea6763ef:c16422, > 34e8d6e5-456f-4e2a-99a5-4f21fd9c4a7e:c16423] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-EBF618C3F968->a5729949-67f1-496e-a0d3-1bfc0e139836: receive > RaftClientReply:client-EBF618C3F968->a5729949-67f1-496e-a0d3-1bfc0e139836@group-4E41299EA191, > cid=1376, SUCCESS, logIndex=0, com > mits[a5729949-67f1-496e-a0d3-1bfc0e139836:c4764, > 111d4c23-756f-4c8a-a48d-aa2a327a5179:c4764, > 287eccfb-8461-419a-8732-529d042380b3:c4764] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-4D5E3CDC8889->0bb45975-b0d2-499e-85cc-22ea22c57ecb: receive > RaftClientReply:client-4D5E3CDC8889->0bb45975-b0d2-499e-85cc-22ea22c57ecb@group-D1BB7F32F754, > cid=1382, FAILED org.apache.ratis. > protocol.NotLeaderException: Server 0bb45975-b0d2-499e-85cc-22ea22c57ecb is > not the leader (f1a756c3-6b42-4ece-8093-dbcdac5f8d5b:10.17.200.18:9858). > Request must be sent to leader., logIndex=0, > commits[0bb45975-b0d2-499e-85cc-22ea22c57ecb:c15358, 6c7a > 780f-5474-49da-b880-3eaf69d9d83d:c15358, > f1a756c3-6b42-4ece-8093-dbcdac5f8d5b:c15358] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1359, SUCCESS, logIndex=15208, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1362, SUCCESS, logIndex=15209, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > 
aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1363, SUCCESS, logIndex=15210, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:32 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1371, SUCCESS, logIndex=15211, >
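The update above describes the fix as changing the per-reply log statement to trace. A minimal sketch of that idea follows, assuming java.util.logging for self-containment (its FINEST level is used here as the analogue of trace). The class name ReplyLogSketch, the method onReply, and the counter messagesBuilt are hypothetical and are not taken from the attached patch.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; not the code from the attached patch.
class ReplyLogSketch {
  private static final Logger LOG = Logger.getLogger(ReplyLogSketch.class.getName());

  // Counts how often the log message string was actually built.
  static int messagesBuilt = 0;

  // Logs one client reply at FINEST (the java.util.logging analogue of trace).
  // The Supplier runs only if FINEST is enabled for this logger, so the
  // write path does not pay for string formatting at the default level.
  static void onReply(String clientId, long callId, long logIndex) {
    Supplier<String> message = () -> {
      messagesBuilt++;
      return clientId + ": receive RaftClientReply, cid=" + callId
          + ", SUCCESS, logIndex=" + logIndex;
    };
    LOG.log(Level.FINEST, message);
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);
    onReply("client-FD23551CACEE", 1352, 15195);
    System.out.println("messages built at INFO: " + messagesBuilt);

    LOG.setLevel(Level.FINEST);
    onReply("client-FD23551CACEE", 1355, 15196);
    System.out.println("messages built at FINEST: " + messagesBuilt);
  }
}
```

With the logger at INFO the message is neither built nor emitted, which is the effect the issue asks for on the data-write path; enabling the finer level brings the per-reply detail back for debugging.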
[jira] [Updated] (RATIS-543) Ratis GRPC client produces excessive logging while writing data.
[ https://issues.apache.org/jira/browse/RATIS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated RATIS-543: -- Attachment: r485_20190827.patch > Ratis GRPC client produces excessive logging while writing data. > > > Key: RATIS-543 > URL: https://issues.apache.org/jira/browse/RATIS-543 > Project: Ratis > Issue Type: Bug >Reporter: Aravindan Vijayan >Priority: Blocker > Labels: ozone > Attachments: r485_20190827.patch > > > {code} > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1352, SUCCESS, logIndex=15195, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1355, SUCCESS, logIndex=15196, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1357, SUCCESS, logIndex=15197, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15201, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15189, > aaf673a3-95ac-43aa-8614-b1a324142430:c15186] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-C46A037579AA->5a076d87-abf9-4ade-ae37-adab741d99a6: receive > RaftClientReply:client-C46A037579AA->5a076d87-abf9-4ade-ae37-adab741d99a6@group-AE803AF42C5D, > cid=1370, SUCCESS, logIndex=0, com > 
mits[5a076d87-abf9-4ade-ae37-adab741d99a6:c16423, > 6e21905d-9796-4248-834e-ed97ea6763ef:c16422, > 34e8d6e5-456f-4e2a-99a5-4f21fd9c4a7e:c16423] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-EBF618C3F968->a5729949-67f1-496e-a0d3-1bfc0e139836: receive > RaftClientReply:client-EBF618C3F968->a5729949-67f1-496e-a0d3-1bfc0e139836@group-4E41299EA191, > cid=1376, SUCCESS, logIndex=0, com > mits[a5729949-67f1-496e-a0d3-1bfc0e139836:c4764, > 111d4c23-756f-4c8a-a48d-aa2a327a5179:c4764, > 287eccfb-8461-419a-8732-529d042380b3:c4764] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-4D5E3CDC8889->0bb45975-b0d2-499e-85cc-22ea22c57ecb: receive > RaftClientReply:client-4D5E3CDC8889->0bb45975-b0d2-499e-85cc-22ea22c57ecb@group-D1BB7F32F754, > cid=1382, FAILED org.apache.ratis. > protocol.NotLeaderException: Server 0bb45975-b0d2-499e-85cc-22ea22c57ecb is > not the leader (f1a756c3-6b42-4ece-8093-dbcdac5f8d5b:10.17.200.18:9858). > Request must be sent to leader., logIndex=0, > commits[0bb45975-b0d2-499e-85cc-22ea22c57ecb:c15358, 6c7a > 780f-5474-49da-b880-3eaf69d9d83d:c15358, > f1a756c3-6b42-4ece-8093-dbcdac5f8d5b:c15358] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1359, SUCCESS, logIndex=15208, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:31 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1362, SUCCESS, logIndex=15209, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:31 INFO 
client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1363, SUCCESS, logIndex=15210, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15210, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, > aaf673a3-95ac-43aa-8614-b1a324142430:c15189] > 19/05/03 10:23:32 INFO client.GrpcClientProtocolClient: > client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da: receive > RaftClientReply:client-FD23551CACEE->51711703-9f9d-4c79-bfb1-38726f0059da@group-1EADCA052664, > cid=1371, SUCCESS, logIndex=15211, > commits[51711703-9f9d-4c79-bfb1-38726f0059da:c15211, > 0beac0f1-af74-43ac-ba73-0a92ecb9f0ae:c15201, >
[jira] [Commented] (RATIS-485) Load Generator OOMs if Ratis Unavailable
[ https://issues.apache.org/jira/browse/RATIS-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916920#comment-16916920 ] Tsz Wo Nicholas Sze commented on RATIS-485: --- Is the test creating a lot of RaftClient(s)? Each client has a TimeoutScheduler which may cause the OOM. Let's make the scheduler static to see if it could fix the OOM: r485_20190827.patch > Load Generator OOMs if Ratis Unavailable > > > Key: RATIS-485 > URL: https://issues.apache.org/jira/browse/RATIS-485 > Project: Ratis > Issue Type: Bug > Components: examples >Reporter: Clay B. >Priority: Trivial > Attachments: loadgen.log, r485_20190827.patch > > > Running the load generator without a Ratis cluster (e.g. spurious node IPs) > results in an OOM. > If one has a single Ratis server it tries seemingly indefinitely: > {code:java} > vagrant@ratis-server:~/incubator-ratis$ > ./ratis-examples/src/main/bin/client.sh filestore loadgen --size 1048576 > --numFiles 100 --peers n0:127.0.0.1:1{code} > If one has two Ratis servers it OOMs: > {code:java} > vagrant@ratis-server:~/incubator-ratis$ > ./ratis-examples/src/main/bin/client.sh filestore loadgen --size 1048576 > --numFiles 100 --peers n0:127.0.0.1:1,n1:127.0.0.1:2 > [...] > 1/787867107@5e5792a0 with java.util.concurrent.CompletionException: > java.io.IOException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > 2019-02-14 07:47:22 DEBUG RaftClient:417 - client-272A2E13A5DD: suggested new > leader: null. 
Failed > RaftClientRequest:client-272A2E13A5DD->n1@group-6F7570313233, cid=0, seq=0 > RW, > org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 > with java.io.IOException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > 2019-02-14 07:47:22 DEBUG RaftClient:437 - client-272A2E13A5DD: change Leader > from n1 to n0 > 2019-02-14 07:47:22 DEBUG RaftClient:291 - schedule attempt #10740 with > policy RetryForeverNoSleep for > RaftClientRequest:client-272A2E13A5DD->n1@group-6F7570313233, cid=0, seq=0 > RW, > org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 > 2019-02-14 07:47:22 DEBUG RaftClient:323 - client-272A2E13A5DD: send* > RaftClientRequest:client-272A2E13A5DD->n0@group-6F7570313233, cid=0, seq=0 > RW, > org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 > 2019-02-14 07:47:22 DEBUG RaftClient:338 - client-272A2E13A5DD: Failed > RaftClientRequest:client-272A2E13A5DD->n0@group-6F7570313233, cid=0, seq=0 > RW, > org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 > with java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: > unable to create new native thread > Exception in thread "main" java.util.concurrent.CompletionException: > java.lang.OutOfMemoryError: unable to create new native thread > at > org.apache.ratis.client.impl.RaftClientImpl.lambda$sendRequestAsync$14(RaftClientImpl.java:349) > at > java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) > at > java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:884) > at > java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2196) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:334) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286) > at > 
org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243) > at > org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259) > at > org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) >
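The comment above suggests that a scheduler per client may be what exhausts native threads, and proposes making the scheduler static. A minimal sketch of that design choice follows, assuming a plain ScheduledExecutorService stands in for the TimeoutScheduler. The names SharedSchedulerSketch, Client, and scheduleRetry are hypothetical; the actual change is whatever r485_20190827.patch contains.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; a stand-in for the idea, not the Ratis implementation.
class SharedSchedulerSketch {
  // One daemon scheduler thread shared by every client in the JVM,
  // instead of one scheduler (and thread) per client instance.
  private static final ScheduledExecutorService SHARED =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "shared-timeout-scheduler");
        t.setDaemon(true);
        return t;
      });

  static final class Client {
    // Every client hands back the same scheduler instance.
    ScheduledExecutorService scheduler() {
      return SHARED;
    }

    // Schedules a retry on the shared scheduler after the given delay.
    void scheduleRetry(Runnable retry, long delayMs) {
      SHARED.schedule(retry, delayMs, TimeUnit.MILLISECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int numClients = 1000;
    CountDownLatch done = new CountDownLatch(numClients);
    for (int i = 0; i < numClients; i++) {
      new Client().scheduleRetry(done::countDown, 1);
    }
    boolean finished = done.await(10, TimeUnit.SECONDS);
    System.out.println("all retries ran on one scheduler thread: " + finished);
  }
}
```

Creating many clients in this sketch adds no threads, since all retries queue on the single shared scheduler. Whether this alone removes the OOM in the load generator is the open question the comment raises.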
[jira] [Updated] (RATIS-485) Load Generator OOMs if Ratis Unavailable
[ https://issues.apache.org/jira/browse/RATIS-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated RATIS-485: -- Attachment: r485_20190827.patch > Load Generator OOMs if Ratis Unavailable > > > Key: RATIS-485 > URL: https://issues.apache.org/jira/browse/RATIS-485 > Project: Ratis > Issue Type: Bug > Components: examples >Reporter: Clay B. >Priority: Trivial > Attachments: loadgen.log, r485_20190827.patch > > > Running the load generator without a Ratis cluster (e.g. spurious node IPs) > results in an OOM. > If one has a single Ratis server it tries seemingly indefinitely: > {code:java} > vagrant@ratis-server:~/incubator-ratis$ > ./ratis-examples/src/main/bin/client.sh filestore loadgen --size 1048576 > --numFiles 100 --peers n0:127.0.0.1:1{code} > If one has two Ratis servers it OOMs: > {code:java} > vagrant@ratis-server:~/incubator-ratis$ > ./ratis-examples/src/main/bin/client.sh filestore loadgen --size 1048576 > --numFiles 100 --peers n0:127.0.0.1:1,n1:127.0.0.1:2 > [...] > 1/787867107@5e5792a0 with java.util.concurrent.CompletionException: > java.io.IOException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > 2019-02-14 07:47:22 DEBUG RaftClient:417 - client-272A2E13A5DD: suggested new > leader: null. 
Failed > RaftClientRequest:client-272A2E13A5DD->n1@group-6F7570313233, cid=0, seq=0 > RW, > org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 > with java.io.IOException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > 2019-02-14 07:47:22 DEBUG RaftClient:437 - client-272A2E13A5DD: change Leader > from n1 to n0 > 2019-02-14 07:47:22 DEBUG RaftClient:291 - schedule attempt #10740 with > policy RetryForeverNoSleep for > RaftClientRequest:client-272A2E13A5DD->n1@group-6F7570313233, cid=0, seq=0 > RW, > org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 > 2019-02-14 07:47:22 DEBUG RaftClient:323 - client-272A2E13A5DD: send* > RaftClientRequest:client-272A2E13A5DD->n0@group-6F7570313233, cid=0, seq=0 > RW, > org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 > 2019-02-14 07:47:22 DEBUG RaftClient:338 - client-272A2E13A5DD: Failed > RaftClientRequest:client-272A2E13A5DD->n0@group-6F7570313233, cid=0, seq=0 > RW, > org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 > with java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: > unable to create new native thread > Exception in thread "main" java.util.concurrent.CompletionException: > java.lang.OutOfMemoryError: unable to create new native thread > at > org.apache.ratis.client.impl.RaftClientImpl.lambda$sendRequestAsync$14(RaftClientImpl.java:349) > at > java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) > at > java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:884) > at > java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2196) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:334) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286) > at > 
org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243) > at > org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259) > at > org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at
[jira] [Updated] (RATIS-651) Add metrics related to leaderElection and HeartBeat
[ https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravindan Vijayan updated RATIS-651: Attachment: RATIS-651-003.patch > Add metrics related to leaderElection and HeartBeat > --- > > Key: RATIS-651 > URL: https://issues.apache.org/jira/browse/RATIS-651 > Project: Ratis > Issue Type: Sub-task > Components: server >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Aravindan Vijayan >Priority: Critical > Attachments: RATIS-651-000.patch, RATIS-651-001.patch, > RATIS-651-002.patch, RATIS-651-003.patch > > > Following metrics would be helpful to determine the leader election events > and timeouts: > > |numLeaderElections|Number of leader elections since the creation of ratis > pipeline| > |numLeaderElectionTimeouts|Number of leader election timeouts or failures| > |LeaderElectionCompletionLatency|Time required to complete a leader election| > |MaxNoLeaderInterval|Max time where there has been no elected leader in the > raft ring| > |heartBeatMissCount|No of times heartBeat response is missed from a server | -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (RATIS-651) Add metrics related to leaderElection and HeartBeat
[ https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916900#comment-16916900 ] Hadoop QA commented on RATIS-651: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 56s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 6s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 51s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | ratis.netty.TestRaftSnapshotWithNetty | | | ratis.netty.TestRaftStateMachineExceptionWithNetty | | | ratis.grpc.TestRaftSnapshotWithGrpc | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/ratis:date2019-08-27 | | JIRA Issue | RATIS-651 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978641/RATIS-651-002.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile | | uname | Linux 620c4f44fc38 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh | | git revision | master / 021165f | | maven | version: Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T18:41:47Z) | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/PreCommit-RATIS-Build/943/artifact/out/patch-unit-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/943/testReport/ | | Max. process+thread count | 2030 (vs. ulimit of 5000) | | modules | C: ratis-server ratis-test U: . | | Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/943/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add metrics related to leaderElection and HeartBeat > --- > > Key: RATIS-651 > URL: https://issues.apache.org/jira/browse/RATIS-651 > Project: Ratis > Issue Type: Sub-task > Components: server >Affects Versions: 0.4.0 >Reporter: Shashikant
[jira] [Commented] (RATIS-651) Add metrics related to leaderElection and HeartBeat
[ https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916886#comment-16916886 ] Shashikant Banerjee commented on RATIS-651: --- Thanks [~avijayan] for working on this. The patch looks good to me overall. Can we just initialize and aggregate the heartBeatMetrics in LeaderState instead of in the LogAppender class? > Add metrics related to leaderElection and HeartBeat > --- > > Key: RATIS-651 > URL: https://issues.apache.org/jira/browse/RATIS-651 > Project: Ratis > Issue Type: Sub-task > Components: server >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Aravindan Vijayan >Priority: Critical > Attachments: RATIS-651-000.patch, RATIS-651-001.patch, > RATIS-651-002.patch > > > Following metrics would be helpful to determine the leader election events > and timeouts: > > |numLeaderElections|Number of leader elections since the creation of ratis > pipeline| > |numLeaderElectionTimeouts|Number of leader election timeouts or failures| > |LeaderElectionCompletionLatency|Time required to complete a leader election| > |MaxNoLeaderInterval|Max time where there has been no elected leader in the > raft ring| > |heartBeatMissCount|No of times heartBeat response is missed from a server | -- This message was sent by Atlassian Jira (v8.3.2#803003)
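The five metrics listed in the issue description can be sketched as a small holder class. This is an illustration only, assuming plain atomic counters rather than whatever metrics library the patch uses. The class name ElectionMetricsSketch and all method names are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and counting an election when it completes is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical sketch; not the classes added by the RATIS-651 patches.
class ElectionMetricsSketch {
  private final AtomicLong numLeaderElections = new AtomicLong();
  private final AtomicLong numLeaderElectionTimeouts = new AtomicLong();
  private final AtomicLong heartBeatMissCount = new AtomicLong();
  private final AtomicLong lastElectionLatencyMs = new AtomicLong();
  private final LongAccumulator maxNoLeaderIntervalMs = new LongAccumulator(Math::max, 0L);
  // Time at which the leader was lost, or -1 while a leader is known.
  private final AtomicLong leaderLostAtMs = new AtomicLong(-1L);

  // Records the start of a leaderless period (first loss wins).
  void onLeaderLost(long nowMs) {
    leaderLostAtMs.compareAndSet(-1L, nowMs);
  }

  void onElectionTimeout() {
    numLeaderElectionTimeouts.incrementAndGet();
  }

  // Assumption: an election is counted when it completes.
  void onElectionCompleted(long electionStartMs, long nowMs) {
    numLeaderElections.incrementAndGet();
    lastElectionLatencyMs.set(nowMs - electionStartMs);
    long lostAt = leaderLostAtMs.getAndSet(-1L);
    if (lostAt >= 0) {
      maxNoLeaderIntervalMs.accumulate(nowMs - lostAt);
    }
  }

  void onHeartBeatMissed() {
    heartBeatMissCount.incrementAndGet();
  }

  long getNumLeaderElections() { return numLeaderElections.get(); }
  long getNumLeaderElectionTimeouts() { return numLeaderElectionTimeouts.get(); }
  long getLastElectionLatencyMs() { return lastElectionLatencyMs.get(); }
  long getMaxNoLeaderIntervalMs() { return maxNoLeaderIntervalMs.get(); }
  long getHeartBeatMissCount() { return heartBeatMissCount.get(); }

  public static void main(String[] args) {
    ElectionMetricsSketch m = new ElectionMetricsSketch();
    m.onLeaderLost(1000);
    m.onElectionTimeout();
    m.onElectionCompleted(1200, 1500);
    m.onHeartBeatMissed();
    System.out.println("elections=" + m.getNumLeaderElections()
        + " timeouts=" + m.getNumLeaderElectionTimeouts()
        + " latencyMs=" + m.getLastElectionLatencyMs()
        + " maxNoLeaderMs=" + m.getMaxNoLeaderIntervalMs()
        + " heartBeatMisses=" + m.getHeartBeatMissCount());
  }
}
```

Keeping one such holder per leader, rather than one per follower connection, matches the review suggestion to aggregate the heartbeat metrics in a single place.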
[jira] [Updated] (RATIS-661) Add call in state machine to handle group removal
[ https://issues.apache.org/jira/browse/RATIS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated RATIS-661: -- Attachment: RATIS-661.004.patch > Add call in state machine to handle group removal > - > > Key: RATIS-661 > URL: https://issues.apache.org/jira/browse/RATIS-661 > Project: Ratis > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: RATIS-661.001.patch, RATIS-661.002.patch, > RATIS-661.003.patch, RATIS-661.004.patch > > > Currently during RaftServerProxy#groupRemoveAsync there is no way for > stateMachine to know that the RaftGroup will be removed. This Jira aims to > add a call in the stateMachine to handle group removal. > It also changes the logic of groupRemoval api to remove the RaftServerImpl > from the RaftServerProxy#impls map after the shutdown is complete. This is > required to synchronize the removal with the corresponding api of > RaftServer#getGroupIds. RaftServer#getGroupIds uses the RaftServerProxy#impls > map to get the groupIds. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (RATIS-661) Add call in state machine to handle group removal
[ https://issues.apache.org/jira/browse/RATIS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916572#comment-16916572 ] Lokesh Jain commented on RATIS-661: --- [~msingh] Thanks for reviewing the patch! v4 patch addresses your comments. > Add call in state machine to handle group removal > - > > Key: RATIS-661 > URL: https://issues.apache.org/jira/browse/RATIS-661 > Project: Ratis > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: RATIS-661.001.patch, RATIS-661.002.patch, > RATIS-661.003.patch, RATIS-661.004.patch > > > Currently during RaftServerProxy#groupRemoveAsync there is no way for > stateMachine to know that the RaftGroup will be removed. This Jira aims to > add a call in the stateMachine to handle group removal. > It also changes the logic of groupRemoval api to remove the RaftServerImpl > from the RaftServerProxy#impls map after the shutdown is complete. This is > required to synchronize the removal with the corresponding api of > RaftServer#getGroupIds. RaftServer#getGroupIds uses the RaftServerProxy#impls > map to get the groupIds. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (RATIS-661) Add call in state machine to handle group removal
[ https://issues.apache.org/jira/browse/RATIS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916540#comment-16916540 ] Mukul Kumar Singh commented on RATIS-661: - Thanks for working on this [~ljain]. The patch generally looks good to me. Can we add this handleGroupRemove call in RaftServerImpl during shutdown, after all the transactions have been applied and before the directory is deleted? > Add call in state machine to handle group removal > - > > Key: RATIS-661 > URL: https://issues.apache.org/jira/browse/RATIS-661 > Project: Ratis > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: RATIS-661.001.patch, RATIS-661.002.patch, > RATIS-661.003.patch > > > Currently during RaftServerProxy#groupRemoveAsync there is no way for > stateMachine to know that the RaftGroup will be removed. This Jira aims to > add a call in the stateMachine to handle group removal. > It also changes the logic of groupRemoval api to remove the RaftServerImpl > from the RaftServerProxy#impls map after the shutdown is complete. This is > required to synchronize the removal with the corresponding api of > RaftServer#getGroupIds. RaftServer#getGroupIds uses the RaftServerProxy#impls > map to get the groupIds. -- This message was sent by Atlassian Jira (v8.3.2#803003)
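The ordering discussed in this thread can be sketched as follows: apply pending transactions, notify the state machine, delete the directory, and only then drop the server from the impls map so that getGroupIds stays consistent with the removal. This is a simplified, synchronous illustration, whereas the real groupRemoveAsync is asynchronous. Only the name handleGroupRemove comes from the review comment; its signature and every other class and method name below are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, synchronous sketch; not the code from the RATIS-661 patches.
class GroupRemovalSketch {
  // Assumed callback shape; only the name handleGroupRemove is from the thread.
  interface StateMachine {
    void handleGroupRemove(String groupId);
  }

  // Stand-in for the per-group server (RaftServerImpl in the discussion).
  static final class ServerImpl {
    private final String groupId;
    private final StateMachine stateMachine;
    private final List<String> events;

    ServerImpl(String groupId, StateMachine stateMachine, List<String> events) {
      this.groupId = groupId;
      this.stateMachine = stateMachine;
      this.events = events;
    }

    String groupId() {
      return groupId;
    }

    void shutdownForRemoval(boolean deleteDirectory) {
      events.add("applyPendingTransactions");   // 1. finish applying the log
      stateMachine.handleGroupRemove(groupId);  // 2. let the state machine clean up
      if (deleteDirectory) {
        events.add("deleteDirectory");          // 3. storage is deleted last
      }
    }
  }

  // Stand-in for RaftServerProxy and its impls map.
  static final class ServerProxy {
    private final Map<String, ServerImpl> impls = new ConcurrentHashMap<>();

    void groupAdd(ServerImpl impl) {
      impls.put(impl.groupId(), impl);
    }

    Set<String> getGroupIds() {
      return impls.keySet();
    }

    boolean groupRemove(String groupId, boolean deleteDirectory) {
      ServerImpl impl = impls.get(groupId);
      if (impl == null) {
        return false;
      }
      impl.shutdownForRemoval(deleteDirectory);
      impls.remove(groupId); // removed only after shutdown has completed
      return true;
    }
  }

  public static void main(String[] args) {
    List<String> events = new ArrayList<>();
    ServerProxy proxy = new ServerProxy();
    proxy.groupAdd(new ServerImpl("group-1",
        gid -> events.add("handleGroupRemove:" + gid), events));
    proxy.groupRemove("group-1", true);
    System.out.println(events + " remaining=" + proxy.getGroupIds());
  }
}
```

Because the map entry is removed last, a caller of getGroupIds in this sketch still sees the group until its shutdown has finished, which is the synchronization the issue description asks for.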