[jira] [Commented] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576699#comment-16576699 ]

Tsz Wo Nicholas Sze commented on RATIS-270:
---

> 1) RaftServerImpl:1069, lets rename updateCache to isReplyDelayed, I feel
> that this will help with the comments as well, ...

For a follower, there is no reply, so I will keep the name "updateCache". Sure, let's add more comments.

Will address #2 and #3 in the next patch.

> Replication ALL requests should not be replied from retry cache if they are delayed.
>
> Key: RATIS-270
> URL: https://issues.apache.org/jira/browse/RATIS-270
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mukul Kumar Singh
> Assignee: Tsz Wo Nicholas Sze
> Priority: Major
> Labels: ozone
> Attachments: r270_20180803.patch
>
>
> Retry requests are answered from the retry cache when requests have
> Replication_ALL semantics. This leads to a case, where the client retries for
> a response which is stuck in the delayed replies queue. This new retry is now
> answered from the retry cache even though the request has not been completed
> on all the nodes.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
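One way to picture the behaviour requested in RATIS-270 is a cache that hands every retry the same pending future and completes it only when the delayed reply is actually released. The sketch below is a hypothetical illustration under that assumption, not the Ratis retry cache: `DelayedReplyCache`, `getOrCreate`, and `release` are made-up names, and the reply is modelled as a plain `String`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch; not the actual Ratis RetryCache. */
final class DelayedReplyCache {
  // One pending reply per call id (assumed to identify a client request).
  private final ConcurrentMap<Long, CompletableFuture<String>> entries =
      new ConcurrentHashMap<>();

  /** The first attempt and every retry receive the same future. */
  CompletableFuture<String> getOrCreate(long callId) {
    return entries.computeIfAbsent(callId, id -> new CompletableFuture<>());
  }

  /**
   * Completes the cached entry. In this sketch it is meant to be called only
   * once the reply is no longer delayed, e.g. after all nodes have replicated.
   */
  void release(long callId, String reply) {
    getOrCreate(callId).complete(reply);
  }
}
```

With this shape, a retry that arrives while the original reply is still queued simply waits on the same future instead of receiving an early answer.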
[jira] [Updated] (RATIS-290) Raft server should notify the state machine if no leader is assigned for a long time
[ https://issues.apache.org/jira/browse/RATIS-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh updated RATIS-290:

Attachment: (was: RATIS-290.002.patch)

> Raft server should notify the state machine if no leader is assigned for a long time
>
> Key: RATIS-290
> URL: https://issues.apache.org/jira/browse/RATIS-290
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
> Labels: ozone
> Fix For: 0.3.0
>
>
> In Ratis, a Raft server can be in one of three states: Candidate, Leader, or Follower.
> In a cluster, the steady state is one node being the leader and the others being followers.
> This jira proposes to add a new API to identify whether a node has been left aside
> because of a network partition and is without a leader.
> Once it is detected that a node has been a candidate for a sufficiently long time,
> a callback to the state machine will be triggered to handle this partitioned node.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-290) Raft server should notify the state machine if no leader is assigned for a long time
[ https://issues.apache.org/jira/browse/RATIS-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576645#comment-16576645 ]

Tsz Wo Nicholas Sze commented on RATIS-290:
---

- We should not add RaftServerImpl.group since the member may change. The peers should be retrieved from the conf as below.
{code:java}
-final RaftGroup group = new RaftGroup(groupId, getRaftConf().getPeers());
{code}
- lastNoLeaderTime should be declared as
{code}
volatile Timestamp lastNoLeaderTime;
{code}
-* Then, use a local variable in getLastLeaderElapsedTimeMs() (let's add "Ms" to the method name).
{code}
long getLastLeaderElapsedTimeMs() {
  final Timestamp t = lastNoLeaderTime;
  return t == null ? 0 : t.elapsedTimeMs();
}
{code}
The getLastLeaderElapsedTime() implementation in the patch has a problem: lastNoLeaderTime.get() may return different values on successive calls due to a race condition.

> Raft server should notify the state machine if no leader is assigned for a long time
>
> Key: RATIS-290
> URL: https://issues.apache.org/jira/browse/RATIS-290
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
> Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-290.002.patch
>
>
> In Ratis, a Raft server can be in one of three states: Candidate, Leader, or Follower.
> In a cluster, the steady state is one node being the leader and the others being followers.
> This jira proposes to add a new API to identify whether a node has been left aside
> because of a network partition and is without a leader.
> Once it is detected that a node has been a candidate for a sufficiently long time,
> a callback to the state machine will be triggered to handle this partitioned node.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
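The single-read pattern from the review comment above can be sketched as a small self-contained class. This is only an illustration: `Timestamp` here is a simplified stand-in for the Ratis class of the same name, and `NoLeaderTracker`, `onLeaderLost`, and `onLeaderElected` are hypothetical names that do not come from the patch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of reading a volatile field once into a local. */
final class NoLeaderTracker {
  /** Simplified stand-in for the Ratis Timestamp class (assumption). */
  static final class Timestamp {
    private final long startNanos = System.nanoTime();

    long elapsedTimeMs() {
      return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }
  }

  // Written by one thread, read by others; volatile makes writes visible.
  private volatile Timestamp lastNoLeaderTime;

  void onLeaderLost() {
    lastNoLeaderTime = new Timestamp();
  }

  void onLeaderElected() {
    lastNoLeaderTime = null;
  }

  long getLastLeaderElapsedTimeMs() {
    // Read the field exactly once. Reading it twice (null check, then use)
    // could observe two different values if another thread resets it between.
    final Timestamp t = lastNoLeaderTime;
    return t == null ? 0 : t.elapsedTimeMs();
  }
}
```

Because `t` is a local copy, the null check and the call to `elapsedTimeMs()` are guaranteed to see the same object, whatever other threads do to the field in the meantime.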
[jira] [Commented] (RATIS-298) Update auto-common and log4j versions in ratis
[ https://issues.apache.org/jira/browse/RATIS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576735#comment-16576735 ]

Mukul Kumar Singh commented on RATIS-298:
-

Thanks for the review [~szetszwo]. I have tested this with Ozone using local builds and that is working fine.

> Update auto-common and log4j versions in ratis
>
> Key: RATIS-298
> URL: https://issues.apache.org/jira/browse/RATIS-298
> Project: Ratis
> Issue Type: Bug
> Components: build
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-298.001.patch
>
>
> Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis.
> The current versions cause compilation issues in Ozone, as follows:
> {code}
> [WARNING]
> Dependency convergence error for com.google.auto:auto-common:0.10 paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT
>     +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT
>       +-com.google.auto:auto-common:0.10
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT
>     +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT
>       +-com.google.auto.service:auto-service:1.0-rc4
>         +-com.google.auto:auto-common:0.8
>
> [WARNING]
> Dependency convergence error for org.apache.logging.log4j:log4j-api:2.6.2 paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT
>     +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT
>       +-org.apache.logging.log4j:log4j-api:2.6.2
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.logging.log4j:log4j-api:2.11.0
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.logging.log4j:log4j-core:2.11.0
>     +-org.apache.logging.log4j:log4j-api:2.11.0
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-290) Raft server should notify the state machine if no leader is assigned for a long time
[ https://issues.apache.org/jira/browse/RATIS-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh updated RATIS-290:

Attachment: RATIS-290.002.patch

> Raft server should notify the state machine if no leader is assigned for a long time
>
> Key: RATIS-290
> URL: https://issues.apache.org/jira/browse/RATIS-290
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
> Labels: ozone
> Fix For: 0.3.0
>
>
> In Ratis, a Raft server can be in one of three states: Candidate, Leader, or Follower.
> In a cluster, the steady state is one node being the leader and the others being followers.
> This jira proposes to add a new API to identify whether a node has been left aside
> because of a network partition and is without a leader.
> Once it is detected that a node has been a candidate for a sufficiently long time,
> a callback to the state machine will be triggered to handle this partitioned node.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails
[ https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576845#comment-16576845 ]

Tsz Wo Nicholas Sze commented on RATIS-260:
---

{quote}
In this test, one of the nodes was shut down permanently. This can result into a situation where a candidate node is never able to move out of Leader Election phase.
{quote}
I have just checked the current code again. I cannot see how this could happen. I suspect that the candidate node cannot talk to the other nodes in this failure case, so it won't be able to move out of Leader Election.

> Ratis Leader election should try for other peers even when ask for votes fails
>
> Key: RATIS-260
> URL: https://issues.apache.org/jira/browse/RATIS-260
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: ozone
> Attachments: RATIS-260.00.patch
>
>
> This bug was simulated using Ozone using Ratis for Data pipeline.
> In this test, one of the nodes was shut down permanently. This can result
> into a situation where a candidate node is never able to move out of Leader
> Election phase.
> {code}
> 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>         at org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>         at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>         at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
>         at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
>         at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
>         at org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281)
>         at org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61)
>         at org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147)
>         at org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
>         at org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
>         at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
>         at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>         at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>         at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>         at org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>         at org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>         ... 1 more
> Caused by: java.net.ConnectException: Connection refused
>         ... 11 more
> {code}
>
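The scenario in RATIS-260 involves one unreachable peer while votes are being collected. Below is a minimal sketch, assuming each vote request is modelled as a `Callable<Boolean>`, of counting votes while treating a failed request as "no vote" rather than aborting the round. `VoteCollector` and `countGranted` are hypothetical names; this is not the LeaderElection code from Ratis or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; not the Ratis LeaderElection implementation. */
final class VoteCollector {
  /** Returns the number of granted votes; a failed request counts as no vote. */
  static int countGranted(List<Callable<Boolean>> requests) {
    final ExecutorService executor =
        Executors.newFixedThreadPool(Math.max(1, requests.size()));
    try {
      // Submit all requests first so that a slow or dead peer does not
      // delay the requests sent to the remaining peers.
      final List<Future<Boolean>> futures = new ArrayList<>();
      for (Callable<Boolean> request : requests) {
        futures.add(executor.submit(request));
      }
      int granted = 0;
      for (Future<Boolean> future : futures) {
        try {
          if (Boolean.TRUE.equals(future.get())) {
            granted++;
          }
        } catch (ExecutionException e) {
          // This peer failed (e.g. connection refused); keep checking the others.
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      return granted;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

In a three-node group with one node down, the candidate's own vote plus one granted vote would still form a majority under this counting scheme.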
[jira] [Commented] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576703#comment-16576703 ]

Tsz Wo Nicholas Sze commented on RATIS-270:
---

r270_20180810.patch: addresses [~msingh]'s comments.

> Replication ALL requests should not be replied from retry cache if they are delayed.
>
> Key: RATIS-270
> URL: https://issues.apache.org/jira/browse/RATIS-270
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mukul Kumar Singh
> Assignee: Tsz Wo Nicholas Sze
> Priority: Major
> Labels: ozone
> Attachments: r270_20180810.patch
>
>
> Retry requests are answered from the retry cache when requests have
> Replication_ALL semantics. This leads to a case, where the client retries for
> a response which is stuck in the delayed replies queue. This new retry is now
> answered from the retry cache even though the request has not been completed
> on all the nodes.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated RATIS-270:
--

Attachment: (was: r270_20180803.patch)

> Replication ALL requests should not be replied from retry cache if they are delayed.
>
> Key: RATIS-270
> URL: https://issues.apache.org/jira/browse/RATIS-270
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mukul Kumar Singh
> Assignee: Tsz Wo Nicholas Sze
> Priority: Major
> Labels: ozone
> Attachments: r270_20180810.patch
>
>
> Retry requests are answered from the retry cache when requests have
> Replication_ALL semantics. This leads to a case, where the client retries for
> a response which is stuck in the delayed replies queue. This new retry is now
> answered from the retry cache even though the request has not been completed
> on all the nodes.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated RATIS-270:
--

Attachment: r270_20180810.patch

> Replication ALL requests should not be replied from retry cache if they are delayed.
>
> Key: RATIS-270
> URL: https://issues.apache.org/jira/browse/RATIS-270
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mukul Kumar Singh
> Assignee: Tsz Wo Nicholas Sze
> Priority: Major
> Labels: ozone
> Attachments: r270_20180810.patch
>
>
> Retry requests are answered from the retry cache when requests have
> Replication_ALL semantics. This leads to a case, where the client retries for
> a response which is stuck in the delayed replies queue. This new retry is now
> answered from the retry cache even though the request has not been completed
> on all the nodes.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-298) Update auto-common and log4j versions in ratis
[ https://issues.apache.org/jira/browse/RATIS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh updated RATIS-298:

Attachment: RATIS-298.001.patch

> Update auto-common and log4j versions in ratis
>
> Key: RATIS-298
> URL: https://issues.apache.org/jira/browse/RATIS-298
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-298.001.patch
>
>
> Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-290) Raft server should notify the state machine if no leader is assigned for a long time
[ https://issues.apache.org/jira/browse/RATIS-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576783#comment-16576783 ]

Hadoop QA commented on RATIS-290:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 4m 34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 4s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 21s{color} | {color:orange} root: The patch generated 25 new + 791 unchanged - 3 fixed = 816 total (was 794) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 50s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 47s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.TestRaftServerSlownessDetection |
| | ratis.server.simulation.TestRaftWithSimulatedRpc |
| | ratis.TestRaftServerLeaderElectionTimeout |
| | ratis.server.simulation.TestLeaderElectionWithSimulatedRpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-08-10 |
| JIRA Issue | RATIS-290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935180/RATIS-290.003.patch |
| Optional Tests | asflicense cc unit javac javadoc findbugs checkstyle compile |
| uname | Linux e4612129386a 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 6a2c3d5 |
| Default Java | 1.8.0_181 |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/292/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/292/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/292/testReport/ |
| modules | C: ratis-proto-shaded ratis-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/292/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |

This message was automatically generated.

> Raft server should notify the state machine if no leader is assigned for a long time
>
[jira] [Commented] (RATIS-298) Update auto-common and log4j versions in ratis
[ https://issues.apache.org/jira/browse/RATIS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576755#comment-16576755 ]

Hadoop QA commented on RATIS-298:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 48s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 6s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 12m 47s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.server.TestRaftLogMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-08-10 |
| JIRA Issue | RATIS-298 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935175/RATIS-298.001.patch |
| Optional Tests | asflicense javac javadoc unit xml compile |
| uname | Linux d9f41987c435 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 6a2c3d5 |
| Default Java | 1.8.0_171 |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/290/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/290/testReport/ |
| modules | C: ratis-proto-shaded U: ratis-proto-shaded |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/290/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |

This message was automatically generated.

> Update auto-common and log4j versions in ratis
>
> Key: RATIS-298
> URL: https://issues.apache.org/jira/browse/RATIS-298
> Project: Ratis
> Issue Type: Bug
> Components: build
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-298.001.patch
>
>
> Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis.
> The current versions cause compilation issues in Ozone, as follows:
> {code}
> [WARNING]
> Dependency convergence error for com.google.auto:auto-common:0.10 paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT
>     +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT
>       +-com.google.auto:auto-common:0.10
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>
[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails
[ https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576770#comment-16576770 ]

Shashikant Banerjee commented on RATIS-260:
---

Thanks for the review, [~szetszwo]. The issue cannot be reproduced consistently with Ozone. As discussed with [~msingh], it was hit once after 50 runs of Freon in a cluster. I ran basic Freon in Ozone and it worked well.

> Ratis Leader election should try for other peers even when ask for votes fails
>
> Key: RATIS-260
> URL: https://issues.apache.org/jira/browse/RATIS-260
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: ozone
> Attachments: RATIS-260.00.patch
>
>
> This bug was simulated using Ozone using Ratis for Data pipeline.
> In this test, one of the nodes was shut down permanently. This can result
> into a situation where a candidate node is never able to move out of Leader
> Election phase.
> {code}
> 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>         at org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>         at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>         at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
>         at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
>         at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
>         at org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281)
>         at org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61)
>         at org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147)
>         at org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
>         at org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
>         at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
>         at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>         at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>         at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>         at org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>         at org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>         ... 1 more
> Caused by: java.net.ConnectException: Connection refused
>         ... 11 more
> {code}
> This happens because of the following lines of the code during requestVote.
> {code}
> for (final RaftPeer peer : others) {
> final RequestVoteRequestProto r =
[jira] [Commented] (RATIS-298) Update auto-common and log4j versions in ratis
[ https://issues.apache.org/jira/browse/RATIS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576730#comment-16576730 ]

Tsz Wo Nicholas Sze commented on RATIS-298:
---

+1 patch looks good. Have you tested it with Ozone?

> Update auto-common and log4j versions in ratis
>
> Key: RATIS-298
> URL: https://issues.apache.org/jira/browse/RATIS-298
> Project: Ratis
> Issue Type: Bug
> Components: build
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-298.001.patch
>
>
> Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis.
> The current versions cause compilation issues in Ozone, as follows:
> {code}
> [WARNING]
> Dependency convergence error for com.google.auto:auto-common:0.10 paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT
>     +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT
>       +-com.google.auto:auto-common:0.10
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT
>     +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT
>       +-com.google.auto.service:auto-service:1.0-rc4
>         +-com.google.auto:auto-common:0.8
>
> [WARNING]
> Dependency convergence error for org.apache.logging.log4j:log4j-api:2.6.2 paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT
>     +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT
>       +-org.apache.logging.log4j:log4j-api:2.6.2
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.logging.log4j:log4j-api:2.11.0
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT
>   +-org.apache.logging.log4j:log4j-core:2.11.0
>     +-org.apache.logging.log4j:log4j-api:2.11.0
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-298) Update auto-common and log4j versions in ratis
[ https://issues.apache.org/jira/browse/RATIS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated RATIS-298: -- Component/s: (was: server) build > Update auto-common and log4j versions in ratis > -- > > Key: RATIS-298 > URL: https://issues.apache.org/jira/browse/RATIS-298 > Project: Ratis > Issue Type: Bug > Components: build >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 0.3.0 > > Attachments: RATIS-298.001.patch > > > Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis. > The current versions cause compilation issues in Ozone, as follows: > {code} > [WARNING] > Dependency convergence error for com.google.auto:auto-common:0.10 paths to > dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-com.google.auto:auto-common:0.10 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-com.google.auto.service:auto-service:1.0-rc4 > +-com.google.auto:auto-common:0.8 > > [WARNING] > Dependency convergence error for org.apache.logging.log4j:log4j-api:2.6.2 > paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.logging.log4j:log4j-api:2.6.2 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.logging.log4j:log4j-api:2.11.0 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.logging.log4j:log4j-core:2.11.0 > +-org.apache.logging.log4j:log4j-api:2.11.0 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576729#comment-16576729 ] Hadoop QA commented on RATIS-270: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} 
checkstyle {color} | {color:orange} 0m 23s{color} | {color:orange} root: The patch generated 50 new + 613 unchanged - 23 fixed = 663 total (was 636) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 56s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 6s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 35s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | ratis.server.simulation.TestRetryCacheWithSimulatedRpc | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-08-10 | | JIRA Issue | RATIS-270 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935171/r270_20180810.patch | | Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile | | uname | Linux e43f17295865 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh | | git revision | master / 6a2c3d5 | | Default Java | 1.8.0_171 | | checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/289/artifact/out/diff-checkstyle-root.txt | | unit | https://builds.apache.org/job/PreCommit-RATIS-Build/289/artifact/out/patch-unit-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/289/testReport/ | | modules | C: ratis-server U: ratis-server | | Console output | 
https://builds.apache.org/job/PreCommit-RATIS-Build/289/console | | Powered by | Apache Yetus 0.5.0 http://yetus.apache.org | This message was automatically generated. > Replication ALL requests should not be replied from retry cache if they are > delayed. > > > Key: RATIS-270 > URL: https://issues.apache.org/jira/browse/RATIS-270 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Labels: ozone > Attachments: r270_20180810.patch > > > Retry requests are answered from the retry cache when requests have > Replication_ALL semantics. This leads to a case, where the client retries for > a response which
[jira] [Commented] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576739#comment-16576739 ] Tsz Wo Nicholas Sze commented on RATIS-270: --- TestRetryCacheWithSimulatedRpc failed since it does not support async. Will upload a new patch. > Replication ALL requests should not be replied from retry cache if they are > delayed. > > > Key: RATIS-270 > URL: https://issues.apache.org/jira/browse/RATIS-270 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Labels: ozone > Attachments: r270_20180810.patch > > > Retry requests are answered from the retry cache when requests have > Replication_ALL semantics. This leads to a case, where the client retries for > a response which is stuck in the delayed replies queue. This new retry is now > answered from the retry cache even though the request has not been completed > on all the nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-290) Raft server should notify the state machine if no leader is assigned for a long time
[ https://issues.apache.org/jira/browse/RATIS-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated RATIS-290: Attachment: RATIS-290.003.patch > Raft server should notify the state machine if no leader is assigned for a > long time > > > Key: RATIS-290 > URL: https://issues.apache.org/jira/browse/RATIS-290 > Project: Ratis > Issue Type: Bug > Components: server >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: ozone > Fix For: 0.3.0 > > Attachments: RATIS-290.003.patch > > > In Ratis, a Raft server can be in one of three states: Candidate, Leader, or Follower. > In the steady state of a cluster, one node is the leader and the others are > followers. This JIRA proposes adding a new API to identify a node that has been > left aside by a network partition and is without a leader. > Once it is detected that a node has been a candidate for a sufficiently long > time, a callback to the state machine will be triggered to handle this > partitioned node. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
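The detection proposed in RATIS-290 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the API added by the attached patch: the class `NoLeaderDetectorSketch`, its methods, and the callback type are hypothetical names, and the caller is assumed to pass timestamps from a monotonic clock such as `System.nanoTime()`.

```java
import java.time.Duration;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these names come from Ratis. */
final class NoLeaderDetectorSketch {
  private final Duration threshold;          // how long without a leader counts as "too long"
  private final Consumer<Duration> callback; // stands in for the state machine notification
  private long lastLeaderSeenNanos;
  private boolean notified;

  NoLeaderDetectorSketch(Duration threshold, long nowNanos, Consumer<Duration> callback) {
    this.threshold = threshold;
    this.callback = callback;
    this.lastLeaderSeenNanos = nowNanos;
  }

  /** Call whenever a leader is known, e.g. on a valid heartbeat. Resets the timer. */
  void onLeaderSeen(long nowNanos) {
    lastLeaderSeenNanos = nowNanos;
    notified = false;
  }

  /** Call periodically while leaderless. Fires the callback at most once per leaderless period. */
  void check(long nowNanos) {
    final Duration elapsed = Duration.ofNanos(nowNanos - lastLeaderSeenNanos);
    if (!notified && elapsed.compareTo(threshold) >= 0) {
      notified = true;
      callback.accept(elapsed);
    }
  }
}
```

Passing the time in explicitly keeps the sketch deterministic and easy to test; a real server would more likely drive `check` from its election timeout handling.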
[jira] [Comment Edited] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576699#comment-16576699 ] Tsz Wo Nicholas Sze edited comment on RATIS-270 at 8/10/18 6:33 PM: > 1) RaftServerImpl:1069, let's rename updateCache to isReplyDelayed, I feel > that this will help with the comments as well, ... For a follower, there is no reply, so I will keep the name "updateCache". Sure, let's add more comments. Will also address #2 and #3 in the next patch. Thanks Mukul. was (Author: szetszwo): > 1) RaftServerImpl:1069, let's rename updateCache to isReplyDelayed, I feel > that this will help with the comments as well, ... For a follower, there is no reply, so I will keep the name "updateCache". Sure, let's add more comments. Will address #2 and #3 in the next patch. > Replication ALL requests should not be replied from retry cache if they are > delayed. > > > Key: RATIS-270 > URL: https://issues.apache.org/jira/browse/RATIS-270 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Labels: ozone > Attachments: r270_20180803.patch > > > Retry requests are answered from the retry cache when requests have > Replication_ALL semantics. This leads to a case where the client retries for > a response that is stuck in the delayed replies queue. This new retry is now > answered from the retry cache even though the request has not been completed > on all the nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
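The behaviour requested in RATIS-270 can be illustrated with a small model. This is a sketch under assumptions, not the actual Ratis `RetryCache` or the attached patch: the class, the `Replication` enum, and the method names are hypothetical, and the `updateCache` flag only borrows its name from the review discussion, with semantics assumed here. The idea shown is that a retry shares the pending entry of the original call, and the entry is completed only once the reply is no longer delayed.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model; not the actual Ratis RetryCache. */
final class DelayedReplyRetryCacheSketch {
  /** Assumed replication levels, modelled on the semantics named in the issue. */
  enum Replication { MAJORITY, ALL }

  // One future per call id; a retry of the same call shares the original future.
  private final Map<Long, CompletableFuture<String>> entries = new ConcurrentHashMap<>();

  /** Looks up the entry for a call, creating a pending one the first time. */
  CompletableFuture<String> getOrCreate(long callId) {
    return entries.computeIfAbsent(callId, id -> new CompletableFuture<>());
  }

  /** Invoked once a majority has committed the request. */
  void onMajorityCommitted(long callId, String reply, Replication level, boolean allReplicated) {
    // The reply is delayed when ALL is requested but some node has not caught up yet.
    final boolean updateCache = level == Replication.MAJORITY || allReplicated;
    if (updateCache) {
      getOrCreate(callId).complete(reply);
    }
  }

  /** Invoked when the delayed reply is finally released. */
  void onAllReplicated(long callId, String reply) {
    getOrCreate(callId).complete(reply);
  }
}
```

With this shape, a retry that arrives while the original reply sits in the delayed queue receives a future that is still pending, rather than a completed cached reply.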
[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails
[ https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576727#comment-16576727 ] Tsz Wo Nicholas Sze commented on RATIS-260: --- +1 patch looks good. [~shashikant], have you tested it with Ozone to see if this can fix the problem? > Ratis Leader election should try for other peers even when ask for votes fails > -- > > Key: RATIS-260 > URL: https://issues.apache.org/jira/browse/RATIS-260 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone > Attachments: RATIS-260.00.patch > > > This bug was simulated in Ozone, which uses Ratis for its data pipeline. > In this test, one of the nodes was shut down permanently. This can result > in a situation where a candidate node is never able to move out of the leader > election phase. > {code} > 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: > 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting > votes: {} > java.util.concurrent.ExecutionException: > org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214) > at > org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146) > at > org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102) > Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: > UNAVAILABLE: io exception > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221) > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202) > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131) > at > 
org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281) > at > org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61) > at > org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147) > at > org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: > Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858 > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) > at > org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > at > org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) > at > 
org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > ... 1 more > Caused by: java.net.ConnectException: Connection refused > ... 11 more > {code} > This happens because of the following lines of the code during requestVote. > {code} > for (final RaftPeer peer : others) { > final RequestVoteRequestProto r = server.createRequestVoteRequest( > peer.getId(), electionTerm, lastEntry); > service.submit( > () ->
[jira] [Updated] (RATIS-298) Update auto-common and log4j versions in ratis
[ https://issues.apache.org/jira/browse/RATIS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated RATIS-298: Description: Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis. The current versions cause compilation issues in Ozone, as follows: {code} [WARNING] Dependency convergence error for com.google.auto:auto-common:0.10 paths to dependency are: +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT +-com.google.auto:auto-common:0.10 and +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT +-com.google.auto.service:auto-service:1.0-rc4 +-com.google.auto:auto-common:0.8 [WARNING] Dependency convergence error for org.apache.logging.log4j:log4j-api:2.6.2 paths to dependency are: +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT +-org.apache.logging.log4j:log4j-api:2.6.2 and +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT +-org.apache.logging.log4j:log4j-api:2.11.0 and +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT +-org.apache.logging.log4j:log4j-core:2.11.0 +-org.apache.logging.log4j:log4j-api:2.11.0 {code} was: Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis. > Update auto-common and log4j versions in ratis > -- > > Key: RATIS-298 > URL: https://issues.apache.org/jira/browse/RATIS-298 > Project: Ratis > Issue Type: Bug > Components: server >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 0.3.0 > > Attachments: RATIS-298.001.patch > > > Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis. 
> The current versions cause compilation issues in Ozone, as follows: > {code} > [WARNING] > Dependency convergence error for com.google.auto:auto-common:0.10 paths to > dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-com.google.auto:auto-common:0.10 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-com.google.auto.service:auto-service:1.0-rc4 > +-com.google.auto:auto-common:0.8 > > [WARNING] > Dependency convergence error for org.apache.logging.log4j:log4j-api:2.6.2 > paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.logging.log4j:log4j-api:2.6.2 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.logging.log4j:log4j-api:2.11.0 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.logging.log4j:log4j-core:2.11.0 > +-org.apache.logging.log4j:log4j-api:2.11.0 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576778#comment-16576778 ] Hadoop QA commented on RATIS-270: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 6s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 5s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange} root: The patch generated 50 new + 613 unchanged - 23 fixed = 663 total (was 636) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m 57s{color} | {color:green} root in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 7s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 30m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-08-10 | | JIRA Issue | RATIS-270 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935176/r270_20180810b.patch | | Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile | | uname | Linux 57daf79ad248 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh | | git revision | master / 6a2c3d5 | | Default Java | 1.8.0_171 | | checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/291/artifact/out/diff-checkstyle-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/291/testReport/ | | modules | C: ratis-server ratis-grpc U: . | | Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/291/console | | Powered by | Apache Yetus 0.5.0 http://yetus.apache.org | This message was automatically generated. > Replication ALL requests should not be replied from retry cache if they are > delayed. > > > Key: RATIS-270 > URL: https://issues.apache.org/jira/browse/RATIS-270 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Labels: ozone > Attachments: r270_20180810b.patch > > > Retry requests are answered from the retry cache when requests have >
[jira] [Created] (RATIS-298) Update auto-common and log4j versions in ratis
Mukul Kumar Singh created RATIS-298: --- Summary: Update auto-common and log4j versions in ratis Key: RATIS-298 URL: https://issues.apache.org/jira/browse/RATIS-298 Project: Ratis Issue Type: Bug Components: server Affects Versions: 0.3.0 Reporter: Mukul Kumar Singh Assignee: Mukul Kumar Singh Fix For: 0.3.0 Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-298) Update auto-common and log4j versions
[ https://issues.apache.org/jira/browse/RATIS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated RATIS-298: Summary: Update auto-common and log4j versions (was: Update auto-common and log4j versions in ratis) > Update auto-common and log4j versions > - > > Key: RATIS-298 > URL: https://issues.apache.org/jira/browse/RATIS-298 > Project: Ratis > Issue Type: Bug > Components: build >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 0.3.0 > > Attachments: RATIS-298.001.patch > > > Update auto-common to 0.8 and log4j to 2.11.0 versions in ratis. > The current versions cause compilation issues in Ozone, as follows: > {code} > [WARNING] > Dependency convergence error for com.google.auto:auto-common:0.10 paths to > dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-com.google.auto:auto-common:0.10 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-com.google.auto.service:auto-service:1.0-rc4 > +-com.google.auto:auto-common:0.8 > > [WARNING] > Dependency convergence error for org.apache.logging.log4j:log4j-api:2.6.2 > paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.ratis:ratis-server:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.ratis:ratis-proto-shaded:0.3.0-6a2c3d5-SNAPSHOT > +-org.apache.logging.log4j:log4j-api:2.6.2 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.logging.log4j:log4j-api:2.11.0 > and > +-org.apache.hadoop:hadoop-hdds-common:0.2.1-SNAPSHOT > +-org.apache.logging.log4j:log4j-core:2.11.0 > +-org.apache.logging.log4j:log4j-api:2.11.0 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails
[ https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated RATIS-260: Attachment: hadoop-hdfs-datanode-y128.log > Ratis Leader election should try for other peers even when ask for votes fails > -- > > Key: RATIS-260 > URL: https://issues.apache.org/jira/browse/RATIS-260 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone > Attachments: RATIS-260.00.patch, hadoop-hdfs-datanode-y130.log > > > This bug was simulated in Ozone, which uses Ratis for its data pipeline. > In this test, one of the nodes was shut down permanently. This can result > in a situation where a candidate node is never able to move out of the leader > election phase. > {code} > 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: > 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting > votes: {} > java.util.concurrent.ExecutionException: > org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214) > at > org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146) > at > org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102) > Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: > UNAVAILABLE: io exception > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221) > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202) > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131) > at > 
org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281) > at > org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61) > at > org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147) > at > org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: > Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858 > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) > at > org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > at > org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) > at > 
org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > ... 1 more > Caused by: java.net.ConnectException: Connection refused > ... 11 more > {code} > This happens because of the following lines of the code during requestVote. > {code} > for (final RaftPeer peer : others) { > final RequestVoteRequestProto r = server.createRequestVoteRequest( > peer.getId(), electionTerm, lastEntry); > service.submit( > () -> server.getServerRpc().requestVote(r)); > submitted++; > } > {code} --
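The loop quoted in the description submits one `requestVote` call per peer, and the stack trace shows an `ExecutionException` surfacing while the results are collected. One way to keep collecting votes from the remaining peers could look like the sketch below. It is an assumption-laden illustration, not the RATIS-260 patch: `VoteCollectionSketch` and `askForVotes` as written here are hypothetical, and each vote request is modelled as a plain `Callable<Boolean>` that returns whether the vote was granted.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the Ratis LeaderElection implementation. */
final class VoteCollectionSketch {
  /** Returns true if the candidate plus the granting peers form a majority. */
  static boolean askForVotes(List<Callable<Boolean>> voteRequests, long timeoutMs) {
    final int clusterSize = voteRequests.size() + 1; // peers plus the candidate itself
    final ExecutorService executor =
        Executors.newFixedThreadPool(Math.max(1, voteRequests.size()));
    try {
      final CompletionService<Boolean> service = new ExecutorCompletionService<>(executor);
      int submitted = 0;
      for (Callable<Boolean> request : voteRequests) {
        service.submit(request);
        submitted++;
      }
      int granted = 1; // the candidate votes for itself
      final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      for (int waited = 0; waited < submitted && granted <= clusterSize / 2; waited++) {
        try {
          final Future<Boolean> result =
              service.poll(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
          if (result == null) {
            break; // timed out waiting for the remaining peers
          }
          if (Boolean.TRUE.equals(result.get())) {
            granted++;
          }
        } catch (ExecutionException e) {
          // This peer could not be reached; count it as "not granted" and keep going.
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      return granted > clusterSize / 2;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

In a three-node group with one node permanently down, the failed request is skipped and the single reachable peer is enough to reach a majority together with the candidate's own vote.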
[jira] [Updated] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails
[ https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated RATIS-260: Attachment: hadoop-hdfs-datanode-y130.log > Ratis Leader election should try for other peers even when ask for votes fails > -- > > Key: RATIS-260 > URL: https://issues.apache.org/jira/browse/RATIS-260 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone > Attachments: RATIS-260.00.patch, hadoop-hdfs-datanode-y130.log > > > This bug was simulated in Ozone, which uses Ratis for its data pipeline. > In this test, one of the nodes was shut down permanently. This can result > in a situation where a candidate node is never able to move out of the leader > election phase. > {code} > 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: > 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting > votes: {} > java.util.concurrent.ExecutionException: > org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214) > at > org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146) > at > org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102) > Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: > UNAVAILABLE: io exception > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221) > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202) > at > org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131) > at > 
org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281) > at > org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61) > at > org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147) > at > org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: > Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858 > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) > at > org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > at > org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > at > org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) > at > 
org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > ... 1 more > Caused by: java.net.ConnectException: Connection refused > ... 11 more > {code} > This happens because of the following lines of the code during requestVote. > {code} > for (final RaftPeer peer : others) { > final RequestVoteRequestProto r = server.createRequestVoteRequest( > peer.getId(), electionTerm, lastEntry); > service.submit( > () -> server.getServerRpc().requestVote(r)); > submitted++; > } > {code} --
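The loop quoted above submits one requestVote call per peer. As a minimal sketch of how the results could be collected without letting one unreachable peer end the election round, the following assumes an ExecutorCompletionService (suggested by the service.submit/submitted++ pattern, but not confirmed by the source). The class VoteCollectionSketch and the method countGrantedVotes are hypothetical names used only for illustration; this is not the actual RATIS-260 patch.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration, not Ratis code.
public class VoteCollectionSketch {
  /**
   * Submits every vote request and counts the granted votes.
   * A request that fails is treated as a missing vote instead of
   * aborting the whole round.
   */
  public static int countGrantedVotes(List<Callable<Boolean>> requests) {
    ExecutorService executor =
        Executors.newFixedThreadPool(Math.max(1, requests.size()));
    ExecutorCompletionService<Boolean> service =
        new ExecutorCompletionService<>(executor);
    int submitted = 0;
    for (Callable<Boolean> request : requests) {
      service.submit(request);
      submitted++;
    }
    int granted = 0;
    try {
      for (int i = 0; i < submitted; i++) {
        try {
          if (Boolean.TRUE.equals(service.take().get())) {
            granted++;
          }
        } catch (ExecutionException e) {
          // This peer could not be reached (for example "Connection refused").
          // Count it as a missing vote and keep waiting for the other peers.
        }
      }
    } catch (InterruptedException e) {
      // Preserve the interrupt status and return what was counted so far.
      Thread.currentThread().interrupt();
    } finally {
      executor.shutdownNow();
    }
    return granted;
  }
}
```

With three requests where one throws, the two successful replies are still counted, so a candidate could still reach a majority in a three-node group with one node permanently down.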
[jira] [Updated] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated RATIS-270: -- Attachment: (was: r270_20180810.patch) > Replication ALL requests should not be replied from retry cache if they are > delayed. > > > Key: RATIS-270 > URL: https://issues.apache.org/jira/browse/RATIS-270 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Labels: ozone > Attachments: r270_20180810b.patch > > > Retry requests are answered from the retry cache when requests have > Replication_ALL semantics. This leads to a case, where the client retries for > a response which is stuck in the delayed replies queue. This new retry is now > answered from the retry cache even though the request has not been completed > on all the nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails
[ https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576727#comment-16576727 ] Tsz Wo Nicholas Sze edited comment on RATIS-260 at 8/10/18 7:10 PM: +1 patch looks good. [~shashikant], have you tested it with Ozone to see if this can fix the problem? was (Author: szetszwo): +1 patch looks good. [~shashikant], have tested it with Ozone to see if this can fix the problem? > Ratis Leader election should try for other peers even when ask for votes fails > -- > > Key: RATIS-260 > URL: https://issues.apache.org/jira/browse/RATIS-260 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone > Attachments: RATIS-260.00.patch > > > This bug was simulated using Ozone using Ratis for Data pipeline. > In this test, one of the nodes was shut down permanently. This can result > into a situation where a candidate node is never able to move out of Leader > Election phase. 
[jira] [Commented] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576745#comment-16576745 ] Tsz Wo Nicholas Sze commented on RATIS-270: --- r270_20180810b.patch: moves the new test to TestRetryCacheWithGrpc. > Replication ALL requests should not be replied from retry cache if they are > delayed. > > > Key: RATIS-270 > URL: https://issues.apache.org/jira/browse/RATIS-270 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Labels: ozone > Attachments: r270_20180810b.patch > > > Retry requests are answered from the retry cache when requests have > Replication_ALL semantics. This leads to a case, where the client retries for > a response which is stuck in the delayed replies queue. This new retry is now > answered from the retry cache even though the request has not been completed > on all the nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated RATIS-270: -- Attachment: r270_20180810b.patch > Replication ALL requests should not be replied from retry cache if they are > delayed. > > > Key: RATIS-270 > URL: https://issues.apache.org/jira/browse/RATIS-270 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Labels: ozone > Attachments: r270_20180810b.patch > > > Retry requests are answered from the retry cache when requests have > Replication_ALL semantics. This leads to a case, where the client retries for > a response which is stuck in the delayed replies queue. This new retry is now > answered from the retry cache even though the request has not been completed > on all the nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails
[ https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576842#comment-16576842 ] Tsz Wo Nicholas Sze commented on RATIS-260: --- {quote} No, it is a bug in LeaderElection.waitForResults(LeaderElection.java:214) according to the given stack trace. {quote} Sorry [~shashikant], my comment above was wrong. The stack trace does show that the StatusRuntimeException is wrapped in an ExecutionException, so catching StatusRuntimeException does not seem helpful. {code} java.util.concurrent.ExecutionException: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception {code} > Ratis Leader election should try for other peers even when ask for votes fails > -- > > Key: RATIS-260 > URL: https://issues.apache.org/jira/browse/RATIS-260 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone > Attachments: RATIS-260.00.patch > > > This bug was simulated using Ozone using Ratis for Data pipeline. > In this test, one of the nodes was shut down permanently. This can result > into a situation where a candidate node is never able to move out of Leader > Election phase.
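The point made in the comment above, that the exception thrown inside the submitted task reaches the caller wrapped in an ExecutionException, can be reproduced with plain java.util.concurrent classes. The sketch below is only an illustration: IllegalStateException stands in for StatusRuntimeException so that no gRPC dependency is needed, and the class WrappedExceptionSketch and method describeFailure are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not Ratis code.
public class WrappedExceptionSketch {
  /** Describes which exception type Future.get() actually throws. */
  public static String describeFailure() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Stand-in for a requestVote call that fails with a runtime exception.
      Callable<String> failingCall = () -> {
        throw new IllegalStateException("UNAVAILABLE: io exception");
      };
      Future<String> future = executor.submit(failingCall);
      try {
        future.get();
        return "no failure";
      } catch (IllegalStateException e) {
        // Not reached: the task's exception is never thrown directly by get().
        return "caught directly";
      } catch (ExecutionException e) {
        // get() wraps the task's exception; the original is the cause.
        return "ExecutionException caused by "
            + e.getCause().getClass().getSimpleName();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return "interrupted";
      }
    } finally {
      executor.shutdown();
    }
  }
}
```

A handler that needs the underlying failure would therefore catch ExecutionException and inspect getCause(), rather than catching the runtime exception type itself.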
[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails
[ https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575979#comment-16575979 ] Hadoop QA commented on RATIS-260: -
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 1s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 1m 27s | master passed |
| +1 | compile | 1m 5s | master passed |
| +1 | checkstyle | 0m 14s | master passed |
| +1 | javadoc | 0m 33s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 1m 2s | the patch passed |
| +1 | javac | 1m 2s | the patch passed |
| -0 | checkstyle | 0m 15s | root: The patch generated 1 new + 50 unchanged - 1 fixed = 51 total (was 51) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 31s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 23m 17s | root in the patch passed. |
| +1 | asflicense | 0m 8s | The patch does not generate ASF License warnings. |
| | | 30m 3s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-08-10 |
| JIRA Issue | RATIS-260 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934968/RATIS-260.00.patch |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 02f4772880c3 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 6a2c3d5 |
| Default Java | 1.8.0_171 |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/288/artifact/out/diff-checkstyle-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/288/testReport/ |
| modules | C: ratis-server U: ratis-server |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/288/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |

This message was automatically generated.
> Ratis Leader election should try for other peers even when ask for votes fails > -- > > Key: RATIS-260 > URL: https://issues.apache.org/jira/browse/RATIS-260 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone > Attachments: RATIS-260.00.patch > > > This bug was simulated using Ozone using Ratis for Data pipeline. > In this test, one of the nodes was shut down permanently. This can result > into a situation where a candidate node is never able to move out of Leader > Election phase. > {code} > 2018-06-15
[jira] [Assigned] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails
[ https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh reassigned RATIS-291: --- Assignee: Shashikant Banerjee (was: Mukul Kumar Singh) > Raft Server should fail themselves when a raft storage directory fails > -- > > Key: RATIS-291 > URL: https://issues.apache.org/jira/browse/RATIS-291 > Project: Ratis > Issue Type: Bug > Components: server >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone > Fix For: 0.3.0 > > > A Raft server uses a storage directory to store the write-ahead log. If this > log is lost for any reason, the node should fail itself. > For a follower, if the raft log location has failed, the follower will not > be able to append any entries. This node will then lag behind the > leader and will eventually be notified via notifySlowness. > For a leader whose raft log disk has failed, the leader will not append > any new entries to its log. However, with respect to the raft ring, the leader > will still appear healthy. This jira proposes to add a new api to identify a > leader with a failed log disk. > This jira also proposes to add a new api to the statemachine, so that > state machine implementations can provide methods to verify the raft log > location. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
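The issue above proposes a way to verify the raft log location but does not describe the API. As a rough sketch of what such a check could look like, the following probes a directory by creating, writing and deleting a small file. The class StorageDirCheckSketch and the method isWritable are hypothetical and are not part of Ratis; only standard java.nio.file calls are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not Ratis code.
public class StorageDirCheckSketch {
  /**
   * Returns true if a probe file can be created, written and deleted
   * inside the given directory; false if the directory is missing or
   * any of those steps fails.
   */
  public static boolean isWritable(Path dir) {
    if (!Files.isDirectory(dir)) {
      return false;
    }
    try {
      Path probe = Files.createTempFile(dir, "raft-probe", ".tmp");
      Files.write(probe, new byte[] {1});
      Files.delete(probe);
      return true;
    } catch (IOException e) {
      // Any I/O failure is treated as a failed storage directory.
      return false;
    }
  }
}
```

A server could run a check like this periodically, or after a failed append, and fail itself when it returns false. How the result is reported to the state machine is left open here because the issue does not specify it.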
[jira] [Commented] (RATIS-270) Replication ALL requests should not be replied from retry cache if they are delayed.
[ https://issues.apache.org/jira/browse/RATIS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576145#comment-16576145 ] Mukul Kumar Singh commented on RATIS-270: - Thanks for working on this, [~szetszwo]. The patch looks good to me. Some minor comments follow. 1) RaftServerImpl:1069: let's rename updateCache to isReplyDelayed; I feel that this will help with the comments as well. Also, let's add a note that for delayed replies, the retryCache will be updated as part of RaftClientReply#getReply. 2) RetryCacheTests.java:205: let's add a small note inside the fail statement. 3) Inside RetryCacheTests.java, when the first set of requests fails, can we retry with another round of the same requests and make sure that they block on the failed node and do not read from the retry cache of the leader? > Replication ALL requests should not be replied from retry cache if they are > delayed. > > > Key: RATIS-270 > URL: https://issues.apache.org/jira/browse/RATIS-270 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: Mukul Kumar Singh >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Labels: ozone > Attachments: r270_20180803.patch > > > Retry requests are answered from the retry cache when requests have > Replication_ALL semantics. This leads to a case, where the client retries for > a response which is stuck in the delayed replies queue. This new retry is now > answered from the retry cache even though the request has not been completed > on all the nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
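The behaviour discussed in this issue, that a retry must not be answered from the cache while the original reply is still delayed, can be sketched with a cache of futures keyed by call id. Everything below is an assumption made for illustration: the class DelayedReplyCacheSketch, its methods and the isReplyDelayed parameter are hypothetical and do not reflect the actual Ratis RetryCache or the attached patch.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration, not Ratis code.
public class DelayedReplyCacheSketch {
  private final Map<Long, CompletableFuture<String>> cache =
      new ConcurrentHashMap<>();

  /** The first attempt and every retry share one future per call id. */
  public CompletableFuture<String> getOrCreate(long callId) {
    return cache.computeIfAbsent(callId, id -> new CompletableFuture<>());
  }

  /**
   * Called when the leader has applied the request locally. If the reply
   * is delayed (for example it still waits for all peers), the cached
   * future is left incomplete so that a retry cannot see a reply yet.
   */
  public void onApplied(long callId, String reply, boolean isReplyDelayed) {
    if (!isReplyDelayed) {
      getOrCreate(callId).complete(reply);
    }
  }

  /** Called once the delayed reply is finally released. */
  public void onDelayedReplyReleased(long callId, String reply) {
    getOrCreate(callId).complete(reply);
  }
}
```

In this sketch a retry that arrives while the reply is delayed receives the same incomplete future as the original request, so it blocks until the request has completed on all nodes instead of being answered early.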