[jira] [Created] (RATIS-711) Add ability to specify higher request timeout in watch request
Shashikant Banerjee created RATIS-711:
--------------------------------------

Summary: Add ability to specify higher request timeout in watch request
Key: RATIS-711
URL: https://issues.apache.org/jira/browse/RATIS-711
Project: Ratis
Issue Type: Bug
Reporter: Shashikant Banerjee

Currently, a watch request from the Raft client times out after 3 seconds by default. In certain conditions, a higher watch request timeout may be required.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
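One way to picture the request in RATIS-711 is a per-request-type timeout that falls back to a shared default. The following is only a minimal sketch under that assumption: the class RequestTimeoutSketch, the RequestType enum and the 10-second override are hypothetical and are not taken from the Ratis code base. Only the 3-second default comes from the issue text.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Ratis.
final class RequestTimeoutSketch {
  enum RequestType { WRITE, READ, WATCH }

  private final Duration defaultTimeout;
  private final Map<RequestType, Duration> overrides = new EnumMap<>(RequestType.class);

  RequestTimeoutSketch(Duration defaultTimeout) {
    this.defaultTimeout = defaultTimeout;
  }

  /** Registers a longer (or shorter) timeout for one request type. */
  RequestTimeoutSketch override(RequestType type, Duration timeout) {
    overrides.put(type, timeout);
    return this;
  }

  /** Returns the override if one was registered, otherwise the shared default. */
  Duration timeoutFor(RequestType type) {
    return overrides.getOrDefault(type, defaultTimeout);
  }

  public static void main(String[] args) {
    RequestTimeoutSketch timeouts = new RequestTimeoutSketch(Duration.ofSeconds(3))
        .override(RequestType.WATCH, Duration.ofSeconds(10)); // assumed value
    System.out.println("write: " + timeouts.timeoutFor(RequestType.WRITE));
    System.out.println("watch: " + timeouts.timeoutFor(RequestType.WATCH));
  }
}
```

With this shape, other request types keep the default while only watch requests get the higher value.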
[jira] [Created] (RATIS-710) GC pauses in leader should not penalize appendRequest response processing
Shashikant Banerjee created RATIS-710:
--------------------------------------

Summary: GC pauses in leader should not penalize appendRequest response processing
Key: RATIS-710
URL: https://issues.apache.org/jira/browse/RATIS-710
Project: Ratis
Issue Type: Bug
Components: raft-group
Reporter: Shashikant Banerjee
Fix For: 0.5.0

In Ozone performance testing, it was observed that once the leader goes through a GC pause and wakes up, it times out all append requests, even though the follower appears to be processing them fine. This repeats in a loop and ends up failing the watch requests on the leader.
[jira] [Created] (RATIS-709) RaftClient should not retry on a different leader on NotReplicated exception from leader
Shashikant Banerjee created RATIS-709:
--------------------------------------

Summary: RaftClient should not retry on a different leader on NotReplicated exception from leader
Key: RATIS-709
URL: https://issues.apache.org/jira/browse/RATIS-709
Project: Ratis
Issue Type: Bug
Components: client
Reporter: Shashikant Banerjee
Fix For: 0.5.0

Currently, when a watch request times out with a NotReplicatedException on the leader, the Raft client starts retrying the request on a different server, fails with NotLeaderException, and goes into a loop. Ideally, when a watch request times out, it should be retried automatically by the Raft client.
[jira] [Commented] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created
[ https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949351#comment-16949351 ]

Hadoop QA commented on RATIS-704:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 58s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 2s | master passed |
| +1 | compile | 1m 11s | master passed |
| +1 | checkstyle | 0m 25s | master passed |
| +1 | javadoc | 1m 0s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 6s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 29s | the patch passed |
| +1 | compile | 1m 22s | the patch passed |
| +1 | javac | 1m 22s | the patch passed |
| +1 | checkstyle | 0m 17s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| -1 | javadoc | 0m 7s | root in the patch failed. |
|| || || || Other Tests ||
| -1 | unit | 0m 1s | root in the patch failed. |
| 0 | asflicense | 0m 2s | ASF License check generated no output? |
| | | 11m 30s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/ratis:date2019-10-11 |
| JIRA Issue | RATIS-704 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982756/r704_20191011.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 0db9214363c5 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 3f446aa |
| maven | version: Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Default Java | 1.8.0_222 |
| javadoc | https://builds.apache.org/job/PreCommit-RATIS-Build/1058/artifact/out/patch-javadoc-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1058/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1058/testReport/ |
| Max. process+thread count | 86 (vs. ulimit of 5000) |
| modules | C: ratis-common ratis-client U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1058/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Invoke sendAsync as soon as OrderedAsync is created
> ---------------------------------------------------
>
> Key: RATIS-704
> URL: https://issues.apache.org/jira/browse/RATIS-704
> Project: Ratis
> Issue Type: Improvement
> Components: client
> Reporter: Tsz-wo Sze
[jira] [Commented] (RATIS-707) Test failures caused by minTimeout set to zero
[ https://issues.apache.org/jira/browse/RATIS-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949342#comment-16949342 ]

Tsz-wo Sze commented on RATIS-707:
----------------------------------

[~swagle], we should review the idea of RATIS-698. The idea is not to sleep for the first leader election. However, in a distributed system, how can a server tell whether the upcoming election is the first one?

After RATIS-698, some servers may honor the min timeout while others may not. An existing leader with a majority of followers may be incorrectly forced to step down when a new server joins the group, because the new server is not honoring the min timeout. The new server, which is not honoring the min timeout, won't give the old leader, which is honoring it, a chance to send a heartbeat.

> Test failures caused by minTimeout set to zero
> ----------------------------------------------
>
> Key: RATIS-707
> URL: https://issues.apache.org/jira/browse/RATIS-707
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 0.5.0
> Reporter: Siddharth Wagle
> Assignee: Siddharth Wagle
> Priority: Major
> Fix For: 0.5.0
>
> Attachments: RATIS-707.01.patch
>
>
> TestRaftAsyncWithGrpc#testBasicAppendEntriesAsync and other tests fail. If the initial minTimeout is 0, the server can trigger a leader election much more frequently because the heartbeat interval is still at minTimeoutMs/2.
> {code}
> 2019-10-11 00:45:47,813 INFO impl.FollowerState (FollowerState.java:run(108)) - s0@group-C51B0F2AC202-FollowerState: change to CANDIDATE, lastRpcTime:21ms, electionTimeout:17ms
> 2019-10-11 00:45:47,870 INFO impl.FollowerState (FollowerState.java:run(108)) - s0@group-C51B0F2AC202-FollowerState: change to CANDIDATE, lastRpcTime:35ms, electionTimeout:31ms
> 2019-10-11 00:45:47,933 INFO impl.FollowerState (FollowerState.java:run(108)) - s0@group-C51B0F2AC202-FollowerState: change to CANDIDATE, lastRpcTime:51ms, electionTimeout:51ms
> 2019-10-11 00:45:47,969 INFO impl.FollowerState (FollowerState.java:run(108)) - s0@group-C51B0F2AC202-FollowerState: change to CANDIDATE, lastRpcTime:22ms, electionTimeout:21ms
> {code}
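To make the timing argument in this thread concrete, here is a rough sketch of why an effective minimum of 0 lets a follower time out before the leader's next heartbeat. It assumes the election timeout is drawn uniformly from [min, max] and that the heartbeat interval is half of the configured minimum, as the description says. The class, the formula and the 150 ms / 300 ms values are illustrative assumptions, not the Ratis implementation.

```java
import java.util.Random;

// Hypothetical model; the uniform draw and the numbers below are assumptions.
final class ElectionTimeoutSketch {
  /** Uniform draw from [minMs, maxMs], inclusive on both ends. */
  static int randomElectionTimeoutMs(int minMs, int maxMs, Random random) {
    return minMs + random.nextInt(maxMs - minMs + 1);
  }

  /** Counts draws that expire no later than the next heartbeat would arrive. */
  static int countTimeoutsBeforeHeartbeat(
      int effectiveMinMs, int maxMs, int heartbeatMs, int trials, long seed) {
    Random random = new Random(seed);
    int count = 0;
    for (int i = 0; i < trials; i++) {
      if (randomElectionTimeoutMs(effectiveMinMs, maxMs, random) <= heartbeatMs) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    int configuredMinMs = 150;              // assumed configuration
    int maxMs = 300;                        // assumed configuration
    int heartbeatMs = configuredMinMs / 2;  // heartbeat stays at minTimeoutMs/2
    System.out.println("min honored: "
        + countTimeoutsBeforeHeartbeat(configuredMinMs, maxMs, heartbeatMs, 1000, 42L));
    System.out.println("min set to 0: "
        + countTimeoutsBeforeHeartbeat(0, maxMs, heartbeatMs, 1000, 42L));
  }
}
```

When the minimum is honored, no draw can fall below the heartbeat interval. With a minimum of 0, a sizable share of draws does, which matches the very short electionTimeout values in the log above.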
[jira] [Commented] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created
[ https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949306#comment-16949306 ]

Tsz-wo Sze commented on RATIS-704:
----------------------------------

r704_20191011.patch: fixes a checkstyle warning.

Note that this is a simple workaround. It is better to fix the underlying RPC implementation.

> Invoke sendAsync as soon as OrderedAsync is created
> ---------------------------------------------------
>
> Key: RATIS-704
> URL: https://issues.apache.org/jira/browse/RATIS-704
> Project: Ratis
> Issue Type: Improvement
> Components: client
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Attachments: r704_20191009.patch, r704_20191011.patch
>
>
> In OrderedAsync, the messages are sent asynchronously except for the first message. The first message is used to establish the connection. OrderedAsync will wait for the first message to complete before sending the following messages.
> Note that, when sending only two messages, the performance of sending the messages asynchronously degenerates to that of sending them sequentially.
> [~msingh] has discovered a case that can be optimized: an application may send two or more messages and the first message may take a long time to process. In this case, we may send a dummy lightweight message to establish the connection, and then send the real messages.
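The effect described in the issue can be worked through with a small latency model. This is only a sketch under simplifying assumptions: every message after the first is sent at the moment the first one completes, those later messages are processed fully in parallel, and network delay is ignored. The class and method names are hypothetical and do not mirror OrderedAsync.

```java
// Hypothetical latency model; not the OrderedAsync implementation.
final class FirstMessageSketch {
  /** The first real message must complete before the rest are sent in parallel. */
  static long completionWithoutWarmup(long[] processingMs) {
    if (processingMs.length == 0) {
      return 0;
    }
    long longestRest = 0;
    for (int i = 1; i < processingMs.length; i++) {
      longestRest = Math.max(longestRest, processingMs[i]);
    }
    return processingMs[0] + longestRest;
  }

  /** A cheap dummy message establishes the connection, then all real messages go out together. */
  static long completionWithWarmup(long warmupMs, long[] processingMs) {
    long longest = 0;
    for (long p : processingMs) {
      longest = Math.max(longest, p);
    }
    return warmupMs + longest;
  }

  public static void main(String[] args) {
    long[] twoSlowMessages = {100, 100}; // assumed processing times in ms
    System.out.println("without warm-up: " + completionWithoutWarmup(twoSlowMessages) + " ms");
    System.out.println("with 1 ms warm-up: " + completionWithWarmup(1, twoSlowMessages) + " ms");
  }
}
```

In this model, two messages of 100 ms each finish at 200 ms without the warm-up, the same as sending them sequentially, and at 101 ms when a 1 ms dummy message goes first.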
[jira] [Updated] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created
[ https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze updated RATIS-704:
----------------------------
Attachment: (was: r706_20191011b.patch)

> Invoke sendAsync as soon as OrderedAsync is created
> ---------------------------------------------------
>
> Key: RATIS-704
> URL: https://issues.apache.org/jira/browse/RATIS-704
> Project: Ratis
> Issue Type: Improvement
> Components: client
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Attachments: r704_20191009.patch, r704_20191011.patch
>
>
> In OrderedAsync, the messages are sent asynchronously except for the first message. The first message is used to establish the connection. OrderedAsync will wait for the first message to complete before sending the following messages.
> Note that, when sending only two messages, the performance of sending the messages asynchronously degenerates to that of sending them sequentially.
> [~msingh] has discovered a case that can be optimized: an application may send two or more messages and the first message may take a long time to process. In this case, we may send a dummy lightweight message to establish the connection, and then send the real messages.
[jira] [Updated] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created
[ https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze updated RATIS-704:
----------------------------
Attachment: r704_20191011.patch

> Invoke sendAsync as soon as OrderedAsync is created
> ---------------------------------------------------
>
> Key: RATIS-704
> URL: https://issues.apache.org/jira/browse/RATIS-704
> Project: Ratis
> Issue Type: Improvement
> Components: client
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Attachments: r704_20191009.patch, r704_20191011.patch
>
>
> In OrderedAsync, the messages are sent asynchronously except for the first message. The first message is used to establish the connection. OrderedAsync will wait for the first message to complete before sending the following messages.
> Note that, when sending only two messages, the performance of sending the messages asynchronously degenerates to that of sending them sequentially.
> [~msingh] has discovered a case that can be optimized: an application may send two or more messages and the first message may take a long time to process. In this case, we may send a dummy lightweight message to establish the connection, and then send the real messages.
[jira] [Updated] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created
[ https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze updated RATIS-704:
----------------------------
Attachment: r706_20191011b.patch

> Invoke sendAsync as soon as OrderedAsync is created
> ---------------------------------------------------
>
> Key: RATIS-704
> URL: https://issues.apache.org/jira/browse/RATIS-704
> Project: Ratis
> Issue Type: Improvement
> Components: client
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Attachments: r704_20191009.patch, r704_20191011.patch
>
>
> In OrderedAsync, the messages are sent asynchronously except for the first message. The first message is used to establish the connection. OrderedAsync will wait for the first message to complete before sending the following messages.
> Note that, when sending only two messages, the performance of sending the messages asynchronously degenerates to that of sending them sequentially.
> [~msingh] has discovered a case that can be optimized: an application may send two or more messages and the first message may take a long time to process. In this case, we may send a dummy lightweight message to establish the connection, and then send the real messages.
[jira] [Updated] (RATIS-705) GrpcClientProtocolClient#close Interrupts itself
[ https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze updated RATIS-705:
----------------------------
Fix Version/s: 0.5.0

> GrpcClientProtocolClient#close Interrupts itself
> ------------------------------------------------
>
> Key: RATIS-705
> URL: https://issues.apache.org/jira/browse/RATIS-705
> Project: Ratis
> Issue Type: Bug
> Components: gRPC
> Reporter: Nilotpal Nandi
> Assignee: Lokesh Jain
> Priority: Major
> Fix For: 0.5.0
>
> Attachments: RATIS-705.001.patch, RATIS-705.002.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when GrpcClientProtocolClient#close is called from a TimeoutScheduler thread. GrpcClientProtocolClient#close calls scheduler.close() which interrupts all the timeout scheduler threads including the thread executing the close routine. This leads to InterruptedException when channel.awaitTermination is called.
>
> {code:java}
> 19/10/09 07:40:33 ERROR client.GrpcClientProtocolClient: Unexpected exception while waiting for channel termination
> java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelImpl.awaitTermination(ManagedChannelImpl.java:763)
> at org.apache.ratis.thirdparty.io.grpc.internal.ForwardingManagedChannel.awaitTermination(ForwardingManagedChannel.java:57)
> at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.awaitTermination(ManagedChannelOrphanWrapper.java:70)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:146)
> at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:74)
> at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
> at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
> at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
> at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70)
> at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:127)
> at org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:136)
> at org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47)
> at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:372)
> at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:236)
> at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$3(GrpcClientProtocolClient.java:324)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:329)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:324)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$1(GrpcClientProtocolClient.java:318)
> at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113)
> at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133)
> at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
> at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at
[jira] [Updated] (RATIS-705) GrpcClientProtocolClient#close Interrupts itself
[ https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze updated RATIS-705:
----------------------------
Summary: GrpcClientProtocolClient#close Interrupts itself (was: GrpcClientProtocolClient#close throws InterruptedException)

> GrpcClientProtocolClient#close Interrupts itself
> ------------------------------------------------
>
> Key: RATIS-705
> URL: https://issues.apache.org/jira/browse/RATIS-705
> Project: Ratis
> Issue Type: Bug
> Components: gRPC
> Reporter: Nilotpal Nandi
> Assignee: Lokesh Jain
> Priority: Major
> Attachments: RATIS-705.001.patch, RATIS-705.002.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when GrpcClientProtocolClient#close is called from a TimeoutScheduler thread. GrpcClientProtocolClient#close calls scheduler.close() which interrupts all the timeout scheduler threads including the thread executing the close routine. This leads to InterruptedException when channel.awaitTermination is called.
[jira] [Commented] (RATIS-708) ClientProtoUtils#toRaftClientRequestProto for Ozone takes close to 36ms
[ https://issues.apache.org/jira/browse/RATIS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949241#comment-16949241 ]

Tsz-wo Sze commented on RATIS-708:
----------------------------------

[~msingh], could you run it a few times to see if it consistently takes close to 36ms? Could you post the code for reproducing it? Thanks.

> ClientProtoUtils#toRaftClientRequestProto for Ozone takes close to 36ms
> -----------------------------------------------------------------------
>
> Key: RATIS-708
> URL: https://issues.apache.org/jira/browse/RATIS-708
> Project: Ratis
> Issue Type: Bug
> Components: client
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Tsz-wo Sze
> Priority: Major
>
> Profiling shows that ClientProtoUtils#toRaftClientRequestProto takes a lot of time to process.
[jira] [Commented] (RATIS-705) GrpcClientProtocolClient#close throws InterruptedException
[ https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949236#comment-16949236 ]

Tsz-wo Sze commented on RATIS-705:
----------------------------------

+1 for the 002 patch. Thanks a lot for explanation.

> GrpcClientProtocolClient#close throws InterruptedException
> ----------------------------------------------------------
>
> Key: RATIS-705
> URL: https://issues.apache.org/jira/browse/RATIS-705
> Project: Ratis
> Issue Type: Bug
> Components: gRPC
> Reporter: Nilotpal Nandi
> Assignee: Lokesh Jain
> Priority: Major
> Attachments: RATIS-705.001.patch, RATIS-705.002.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when GrpcClientProtocolClient#close is called from a TimeoutScheduler thread. GrpcClientProtocolClient#close calls scheduler.close() which interrupts all the timeout scheduler threads including the thread executing the close routine. This leads to InterruptedException when channel.awaitTermination is called.
[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc
[ https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949196#comment-16949196 ]

Hadoop QA commented on RATIS-706:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 8s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 12s | master passed |
| +1 | compile | 0m 55s | master passed |
| +1 | checkstyle | 0m 18s | master passed |
| +1 | javadoc | 0m 46s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 7s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 8s | the patch passed |
| +1 | compile | 0m 59s | the patch passed |
| +1 | javac | 0m 59s | the patch passed |
| +1 | checkstyle | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 39s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 39m 39s | root in the patch failed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 50m 3s | |

|| Reason || Tests ||
| Failed junit tests | ratis.examples.filestore.TestFileStoreWithNetty |
| | ratis.netty.TestRaftSnapshotWithNetty |
| | ratis.server.simulation.TestRaftExceptionWithSimulation |
| | ratis.grpc.TestGroupInfoWithGrpc |
| | ratis.server.simulation.TestRaftWithSimulatedRpc |
| | ratis.grpc.TestRaftAsyncWithGrpc |
| | ratis.netty.TestRetryCacheWithNettyRpc |
| | ratis.grpc.TestRaftWithGrpc |
| | ratis.grpc.TestLeaderElectionWithGrpc |
| | ratis.grpc.TestServerRestartWithGrpc |
| | ratis.netty.TestGroupManagementWithNetty |
| | ratis.netty.TestRaftWithNetty |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/ratis:date2019-10-11 |
| JIRA Issue | RATIS-706 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982735/r706_20191011b.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 8f427da481b7 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 699792d |
| maven | version: Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Default Java | 1.8.0_222 |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1056/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1056/testReport/ |
| Max. process+thread count | 2689 (vs. ulimit of 5000) |
| modules | C: ratis-common ratis-test U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1056/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org
[jira] [Commented] (RATIS-705) GrpcClientProtocolClient#close throws InterruptedException
[ https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949197#comment-16949197 ]

Lokesh Jain commented on RATIS-705:
---

| Are the changes in GrpcClientProtocolClient required?

The bug occurs in the following scenario. A TimeoutScheduler thread executes the timeout for a client request. As part of handling the timeout, the GrpcClientProtocolClient is closed, which closes the timeout scheduler and interrupts all the scheduler threads, including the thread executing the close. Currently that thread is interrupted before the channel wait is called, which leads to the InterruptedException. The patch therefore moves the scheduler.close() call to after the channel is shut down.

| If yes, does it mean that we cannot call scheduler.close() at all?

I think we can still call it at the end of the GrpcClientProtocolClient#close function, because there are no wait calls after that point.

> GrpcClientProtocolClient#close throws InterruptedException
> --
>
> Key: RATIS-705
> URL: https://issues.apache.org/jira/browse/RATIS-705
> Project: Ratis
> Issue Type: Bug
> Components: gRPC
> Reporter: Nilotpal Nandi
> Assignee: Lokesh Jain
> Priority: Major
> Attachments: RATIS-705.001.patch, RATIS-705.002.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when
> GrpcClientProtocolClient#close is called from a TimeoutScheduler thread.
> GrpcClientProtocolClient#close calls scheduler.close(), which interrupts all
> the timeout scheduler threads, including the thread executing the close
> routine. This leads to an InterruptedException when channel.awaitTermination is
> called.
>
> {code:java}
> 19/10/09 07:40:33 ERROR client.GrpcClientProtocolClient: Unexpected exception while waiting for channel termination
> java.lang.InterruptedException
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelImpl.awaitTermination(ManagedChannelImpl.java:763)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ForwardingManagedChannel.awaitTermination(ForwardingManagedChannel.java:57)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.awaitTermination(ManagedChannelOrphanWrapper.java:70)
>   at org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:146)
>   at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:74)
>   at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
>   at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
>   at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
>   at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70)
>   at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:127)
>   at org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:136)
>   at org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47)
>   at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:372)
>   at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:236)
>   at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>   at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>   at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>   at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$3(GrpcClientProtocolClient.java:324)
>   at java.util.Optional.ifPresent(Optional.java:159)
>   at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:329)
>   at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:324)
>   at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$1(GrpcClientProtocolClient.java:318)
>   at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113)
>   at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133)
>   at
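
The ordering problem described in the RATIS-705 comment above can be reproduced outside Ratis. The following is a minimal sketch, not Ratis code: it assumes a plain java.util.concurrent ExecutorService as a stand-in for the timeout scheduler and a CountDownLatch as a stand-in for channel termination. The class name SchedulerCloseOrderDemo and the method closeFromSchedulerThread are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only imitates the close ordering discussed above.
class SchedulerCloseOrderDemo {

  /**
   * Runs a "close" routine on the scheduler's own thread and reports
   * whether the wait for "channel termination" was interrupted.
   */
  static boolean closeFromSchedulerThread(boolean closeSchedulerFirst) {
    ExecutorService scheduler = Executors.newSingleThreadExecutor();
    CountDownLatch channelTerminated = new CountDownLatch(1);

    Future<Boolean> waitWasInterrupted = scheduler.submit(() -> {
      if (closeSchedulerFirst) {
        // Interrupts every scheduler thread, including the one running this task.
        scheduler.shutdownNow();
      }
      // Stand-in for the channel shutdown: termination completes a little later
      // on another thread.
      Thread terminator = new Thread(() -> {
        try {
          Thread.sleep(50);
        } catch (InterruptedException ignored) {
          // Not expected in this sketch; still signal termination below.
        }
        channelTerminated.countDown();
      });
      terminator.setDaemon(true);
      terminator.start();

      boolean interrupted = false;
      try {
        // Stand-in for the blocking wait on channel termination.
        channelTerminated.await(2, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        interrupted = true;
      }
      if (!closeSchedulerFirst) {
        // Closing the scheduler last is harmless: no blocking call follows.
        scheduler.shutdownNow();
      }
      return interrupted;
    });

    try {
      return waitWasInterrupted.get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("scheduler closed first, wait interrupted: "
        + closeFromSchedulerThread(true));
    System.out.println("scheduler closed last, wait interrupted: "
        + closeFromSchedulerThread(false));
  }
}
```

In this sketch the only difference between the two runs is whether the scheduler is shut down before or after the blocking wait, which is the same reordering the comment attributes to the patch.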
[jira] [Created] (RATIS-708) ClientProtoUtils#toRaftClientRequestProto for Ozone takes close to 36ms
Mukul Kumar Singh created RATIS-708:
---

Summary: ClientProtoUtils#toRaftClientRequestProto for Ozone takes close to 36ms
Key: RATIS-708
URL: https://issues.apache.org/jira/browse/RATIS-708
Project: Ratis
Issue Type: Bug
Components: client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Tsz-wo Sze

Profiling shows that ClientProtoUtils#toRaftClientRequestProto takes a long time to process.