[jira] [Created] (RATIS-711) Add ability to specify higher request timeout in watch request

2019-10-11 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-711:
-

 Summary: Add ability to specify higher request timeout in watch request
 Key: RATIS-711
 URL: https://issues.apache.org/jira/browse/RATIS-711
 Project: Ratis
  Issue Type: Bug
Reporter: Shashikant Banerjee


Currently, a watch request from the Raft client times out after 3 seconds by 
default. Under certain conditions, a higher watch request timeout may be 
required.
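
A minimal sketch of what a configurable watch timeout could look like on the caller's side. The class and method names below are hypothetical and are not part of the Ratis API; only the 3-second default comes from the issue text.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper for illustration only; not a Ratis class. */
final class WatchTimeoutSketch {
  /** Default from the issue description: watch requests time out after 3 seconds. */
  static final Duration DEFAULT_WATCH_TIMEOUT = Duration.ofSeconds(3);

  /** Returns the caller's override if it is strictly positive, otherwise the default. */
  static Duration resolve(Optional<Duration> override) {
    return override
        .filter(d -> !d.isNegative() && !d.isZero())
        .orElse(DEFAULT_WATCH_TIMEOUT);
  }
}
```

For example, passing `Optional.of(Duration.ofSeconds(30))` would select a 30-second timeout, while `Optional.empty()` keeps the default.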



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (RATIS-710) GC pauses in leader should not penalize appendRequest response processing

2019-10-11 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-710:
-

 Summary: GC pauses in leader should not penalize appendRequest response processing
 Key: RATIS-710
 URL: https://issues.apache.org/jira/browse/RATIS-710
 Project: Ratis
  Issue Type: Bug
  Components: raft-group
Reporter: Shashikant Banerjee
 Fix For: 0.5.0


In Ozone perf testing, it was observed that once the leader goes through a GC 
pause and wakes up, it times out all append requests, even though the follower 
appears to be processing the append requests fine. This repeats in a loop and 
ends up failing the watch requests on the leader.





[jira] [Created] (RATIS-709) RaftClient should not retry on a different leader on NotReplicated exception from leader

2019-10-11 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-709:
-

 Summary: RaftClient should not retry on a different leader on NotReplicated exception from leader
 Key: RATIS-709
 URL: https://issues.apache.org/jira/browse/RATIS-709
 Project: Ratis
  Issue Type: Bug
  Components: client
Reporter: Shashikant Banerjee
 Fix For: 0.5.0


Currently, when a watch request times out with a NotReplicatedException on the 
leader, the Raft client starts retrying the request on a different server, 
where it fails with NotLeaderException, and it goes into a loop. Ideally, when 
a watch request times out, it should be retried automatically by the Raft client.
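
The intended behaviour can be sketched as a small decision function. This is an illustration under assumptions, not the Ratis retry logic: the `Failure` and `Action` enums and the `decide` method are hypothetical names, and only the two exception kinds come from the issue text.

```java
/** Hypothetical sketch of the retry decision; not a Ratis class. */
final class WatchRetrySketch {
  /** Failure kinds named after the exceptions mentioned in the issue. */
  enum Failure { NOT_REPLICATED, NOT_LEADER, OTHER }

  /** What the client does next. */
  enum Action { RETRY_SAME_SERVER, RETRY_OTHER_SERVER, FAIL }

  static Action decide(Failure failure) {
    switch (failure) {
      case NOT_REPLICATED:
        // The server is still the leader; the watch merely timed out,
        // so switching servers would only produce NotLeaderException.
        return Action.RETRY_SAME_SERVER;
      case NOT_LEADER:
        // The contacted server is not the leader; try another one.
        return Action.RETRY_OTHER_SERVER;
      default:
        return Action.FAIL;
    }
  }
}
```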





[jira] [Commented] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949351#comment-16949351
 ] 

Hadoop QA commented on RATIS-704:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 58s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m  2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  0s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  6s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  7s{color} | {color:red} root in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m  1s{color} | {color:red} root in the patch failed. {color} |
| {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue}  0m  2s{color} | {color:blue} ASF License check generated no output? {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 30s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/ratis:date2019-10-11 |
| JIRA Issue | RATIS-704 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982756/r704_20191011.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  checkstyle  compile  |
| uname | Linux 0db9214363c5 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 3f446aa |
| maven | version: Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Default Java | 1.8.0_222 |
| javadoc | https://builds.apache.org/job/PreCommit-RATIS-Build/1058/artifact/out/patch-javadoc-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1058/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1058/testReport/ |
| Max. process+thread count | 86 (vs. ulimit of 5000) |
| modules | C: ratis-common ratis-client U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1058/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Invoke sendAsync as soon as OrderedAsync is created
> ---
>
> Key: RATIS-704
> URL: https://issues.apache.org/jira/browse/RATIS-704
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Reporter: Tsz-wo Sze

[jira] [Commented] (RATIS-707) Test failures caused by minTimeout set to zero

2019-10-11 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949342#comment-16949342
 ] 

Tsz-wo Sze commented on RATIS-707:
--

[~swagle], we should review the idea of RATIS-698.  The idea is to not sleep 
for the first leader election.  However, in a distributed system, how could a 
server tell whether the upcoming election is the first one?  After RATIS-698, 
some servers may honor the min timeout while others may not.

An existing leader with a majority of followers may be incorrectly forced to 
step down when a new server joins the group, since the new server does not 
honor the min timeout.  The new server, which is not honoring the min timeout, 
won't give the old leader, which is honoring the min timeout, a chance to send 
a heartbeat.

> Test failures caused by minTimeout set to zero
> --
>
> Key: RATIS-707
> URL: https://issues.apache.org/jira/browse/RATIS-707
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: RATIS-707.01.patch
>
>
> TestRaftAsyncWithGrpc#testBasicAppendEntriesAsync and other tests fail if the 
> initial minTimeout is 0: the server can then trigger a leader election much 
> more frequently because the heartbeat interval is still at minTimeoutMs/2.
> {code}
> 2019-10-11 00:45:47,813 INFO  impl.FollowerState (FollowerState.java:run(108)) - s0@group-C51B0F2AC202-FollowerState: change to CANDIDATE, lastRpcTime:21ms, electionTimeout:17ms
> 2019-10-11 00:45:47,870 INFO  impl.FollowerState (FollowerState.java:run(108)) - s0@group-C51B0F2AC202-FollowerState: change to CANDIDATE, lastRpcTime:35ms, electionTimeout:31ms
> 2019-10-11 00:45:47,933 INFO  impl.FollowerState (FollowerState.java:run(108)) - s0@group-C51B0F2AC202-FollowerState: change to CANDIDATE, lastRpcTime:51ms, electionTimeout:51ms
> 2019-10-11 00:45:47,969 INFO  impl.FollowerState (FollowerState.java:run(108)) - s0@group-C51B0F2AC202-FollowerState: change to CANDIDATE, lastRpcTime:22ms, electionTimeout:21ms
> {code}
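
The effect discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the Ratis implementation: the election timeout is assumed to be drawn uniformly from [min, max), the class and method names are hypothetical, and the 75 ms heartbeat interval mentioned in the note below is a made-up value.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of randomized election timeouts; not a Ratis class. */
final class ElectionTimeoutSketch {
  /** Draws a timeout uniformly from [minMs, maxMs). */
  static long randomElectionTimeoutMs(long minMs, long maxMs) {
    if (minMs < 0 || maxMs <= minMs) {
      throw new IllegalArgumentException("require 0 <= minMs < maxMs");
    }
    return ThreadLocalRandom.current().nextLong(minMs, maxMs);
  }

  /**
   * A follower whose election timeout is shorter than the leader's heartbeat
   * interval can become a candidate before the next heartbeat arrives.
   */
  static boolean canExpireBeforeHeartbeat(long electionTimeoutMs, long heartbeatIntervalMs) {
    return electionTimeoutMs < heartbeatIntervalMs;
  }
}
```

With a minimum of 0, a draw such as the 17 ms seen in the log above would fall below an assumed 75 ms heartbeat interval, so `canExpireBeforeHeartbeat(17, 75)` holds and the follower starts an election even though the leader is healthy.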





[jira] [Commented] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created

2019-10-11 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949306#comment-16949306
 ] 

Tsz-wo Sze commented on RATIS-704:
--

r704_20191011.patch: fixes checkstyle warning.

Note that this is a simple workaround.  It would be better to fix the 
underlying RPC implementation.


> Invoke sendAsync as soon as OrderedAsync is created
> ---
>
> Key: RATIS-704
> URL: https://issues.apache.org/jira/browse/RATIS-704
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: r704_20191009.patch, r704_20191011.patch
>
>
> In OrderedAsync, the messages are sent asynchronously except for the first 
> message.  The first message is used to establish the connection.  
> OrderedAsync will wait for the first message to complete before sending the 
> following messages.
> Note that, when sending only two messages, the performance of sending the 
> messages asynchronously degenerates to that of sending them sequentially.
> [~msingh] has discovered a case that can be optimized: an application may 
> send two or more messages and the first message may take a long time to 
> process.  In this case, we may send a dummy lightweight message to establish 
> the connection, and then send the real messages.
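
The optimization described in the issue can be sketched with plain CompletableFuture. This is a simplified illustration, not the OrderedAsync code: `EagerConnectSketch`, the `transport` function and the "HANDSHAKE" payload are hypothetical, and the sketch does not implement the ordering guarantees of OrderedAsync.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch: send a lightweight message eagerly to set up the connection. */
final class EagerConnectSketch {
  private final Function<String, CompletableFuture<String>> transport;
  private final CompletableFuture<String> connected;

  EagerConnectSketch(Function<String, CompletableFuture<String>> transport) {
    this.transport = transport;
    // Sent as soon as the object is created, so real messages wait only for
    // this cheap round trip and not for a possibly slow first real message.
    this.connected = transport.apply("HANDSHAKE");
  }

  /** Real messages are dispatched once the handshake has completed. */
  CompletableFuture<String> sendAsync(String message) {
    return connected.thenCompose(ignored -> transport.apply(message));
  }
}
```

In this sketch, two real messages sent back to back both wait only for the handshake, so a slow first real message no longer delays the second one.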





[jira] [Updated] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created

2019-10-11 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated RATIS-704:
-
Attachment: (was: r706_20191011b.patch)






[jira] [Updated] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created

2019-10-11 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated RATIS-704:
-
Attachment: r704_20191011.patch






[jira] [Updated] (RATIS-704) Invoke sendAsync as soon as OrderedAsync is created

2019-10-11 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated RATIS-704:
-
Attachment: r706_20191011b.patch






[jira] [Updated] (RATIS-705) GrpcClientProtocolClient#close Interrupts itself

2019-10-11 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated RATIS-705:
-
Fix Version/s: 0.5.0

> GrpcClientProtocolClient#close Interrupts itself
> 
>
> Key: RATIS-705
> URL: https://issues.apache.org/jira/browse/RATIS-705
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Nilotpal Nandi
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: RATIS-705.001.patch, RATIS-705.002.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when 
> GrpcClientProtocolClient#close is called from a TimeoutScheduler thread. 
> GrpcClientProtocolClient#close calls scheduler.close() which interrupts all 
> the timeout scheduler threads including the thread executing the close 
> routine. This leads to InterruptedException when channel.awaitTermination is 
> called.
>  
> {code:java}
> 19/10/09 07:40:33 ERROR client.GrpcClientProtocolClient: Unexpected exception while waiting for channel termination
> java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelImpl.awaitTermination(ManagedChannelImpl.java:763)
> at org.apache.ratis.thirdparty.io.grpc.internal.ForwardingManagedChannel.awaitTermination(ForwardingManagedChannel.java:57)
> at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.awaitTermination(ManagedChannelOrphanWrapper.java:70)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:146)
> at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:74)
> at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
> at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
> at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
> at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70)
> at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:127)
> at org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:136)
> at org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47)
> at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:372)
> at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:236)
> at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$3(GrpcClientProtocolClient.java:324)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:329)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:324)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$1(GrpcClientProtocolClient.java:318)
> at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113)
> at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133)
> at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
> at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at 
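
The self-interrupt described in the issue can be reproduced in isolation with plain java.util.concurrent classes. This is a sketch, not the Ratis code: the executor stands in for the TimeoutScheduler, the CountDownLatch wait stands in for channel.awaitTermination, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical reproduction of a close routine interrupting its own thread. */
final class SelfInterruptSketch {
  /** Returns true if the thread running the close routine was interrupted by it. */
  static boolean closeFromSchedulerThread() throws Exception {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    Future<Boolean> result = scheduler.submit(() -> {
      // Stands in for scheduler.close(): interrupts every worker thread,
      // including the one currently executing this task.
      scheduler.shutdownNow();
      try {
        // Stands in for channel.awaitTermination(...): a blocking wait that
        // fails immediately because the interrupt flag is already set.
        new CountDownLatch(1).await(100, TimeUnit.MILLISECONDS);
        return false;
      } catch (InterruptedException e) {
        return true;
      }
    });
    return result.get();
  }
}
```

Calling `SelfInterruptSketch.closeFromSchedulerThread()` is expected to return true, because the blocking wait runs on a thread that has just interrupted itself.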

[jira] [Updated] (RATIS-705) GrpcClientProtocolClient#close Interrupts itself

2019-10-11 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated RATIS-705:
-
Summary: GrpcClientProtocolClient#close Interrupts itself  (was: 
GrpcClientProtocolClient#close throws InterruptedException)

> GrpcClientProtocolClient#close Interrupts itself
> 
>
> Key: RATIS-705
> URL: https://issues.apache.org/jira/browse/RATIS-705
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Nilotpal Nandi
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-705.001.patch, RATIS-705.002.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when 
> GrpcClientProtocolClient#close is called from a TimeoutScheduler thread. 
> GrpcClientProtocolClient#close calls scheduler.close() which interrupts all 
> the timeout scheduler threads including the thread executing the close 
> routine. This leads to InterruptedException when channel.awaitTermination is 
> called.

[jira] [Commented] (RATIS-708) ClientProtoUtils#toRaftClientRequestProto for Ozone takes close to 36ms

2019-10-11 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949241#comment-16949241
 ] 

Tsz-wo Sze commented on RATIS-708:
--

[~msingh], could you run it a few times to see if it consistently takes close 
to 36ms?

Could you post the code for reproducing it?  Thanks.
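
A minimal sketch of how repeated measurements could be collected. The class and method names are hypothetical, and the operation to be timed is supplied by the caller; nothing here is taken from the Ratis code base. A single measurement on the JVM can be unrepresentative (for example because of warm-up), although whether that applies to this issue is not established in the thread.

```java
import java.util.Arrays;

/** Hypothetical timing helper for illustration only. */
final class RepeatTimingSketch {
  /** Runs op the given number of times and returns each elapsed time in nanoseconds. */
  static long[] timeNanos(Runnable op, int runs) {
    long[] samples = new long[runs];
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      op.run();
      samples[i] = System.nanoTime() - start;
    }
    return samples;
  }

  /** Returns the median sample (the upper median for an even number of samples). */
  static long median(long[] samples) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    return sorted[sorted.length / 2];
  }
}
```

Comparing the first sample with the median of the rest would show whether the reported cost is a one-off or is paid on every call.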

> ClientProtoUtils#toRaftClientRequestProto for Ozone takes close to 36ms
> ---
>
> Key: RATIS-708
> URL: https://issues.apache.org/jira/browse/RATIS-708
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Tsz-wo Sze
>Priority: Major
>
> Profiling shows that ClientProtoUtils#toRaftClientRequestProto takes a lot of 
> time to process.





[jira] [Commented] (RATIS-705) GrpcClientProtocolClient#close throws InterruptedException

2019-10-11 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949236#comment-16949236
 ] 

Tsz-wo Sze commented on RATIS-705:
--

+1 for the 002 patch.  Thanks a lot for the explanation.

> GrpcClientProtocolClient#close throws InterruptedException
> --
>
> Key: RATIS-705
> URL: https://issues.apache.org/jira/browse/RATIS-705
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Nilotpal Nandi
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-705.001.patch, RATIS-705.002.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when 
> GrpcClientProtocolClient#close is called from a TimeoutScheduler thread. 
> GrpcClientProtocolClient#close calls scheduler.close() which interrupts all 
> the timeout scheduler threads including the thread executing the close 
> routine. This leads to InterruptedException when channel.awaitTermination is 
> called.

[jira] [Commented] (RATIS-706) Dead lock in GrpcClientRpc

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949196#comment-16949196
 ] 

Hadoop QA commented on RATIS-706:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 39s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m  3s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.examples.filestore.TestFileStoreWithNetty |
|   | ratis.netty.TestRaftSnapshotWithNetty |
|   | ratis.server.simulation.TestRaftExceptionWithSimulation |
|   | ratis.grpc.TestGroupInfoWithGrpc |
|   | ratis.server.simulation.TestRaftWithSimulatedRpc |
|   | ratis.grpc.TestRaftAsyncWithGrpc |
|   | ratis.netty.TestRetryCacheWithNettyRpc |
|   | ratis.grpc.TestRaftWithGrpc |
|   | ratis.grpc.TestLeaderElectionWithGrpc |
|   | ratis.grpc.TestServerRestartWithGrpc |
|   | ratis.netty.TestGroupManagementWithNetty |
|   | ratis.netty.TestRaftWithNetty |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/ratis:date2019-10-11 |
| JIRA Issue | RATIS-706 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982735/r706_20191011b.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  checkstyle  compile  |
| uname | Linux 8f427da481b7 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 699792d |
| maven | version: Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Default Java | 1.8.0_222 |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1056/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1056/testReport/ |
| Max. process+thread count | 2689 (vs. ulimit of 5000) |
| modules | C: ratis-common ratis-test U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1056/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org 

[jira] [Commented] (RATIS-705) GrpcClientProtocolClient#close throws InterruptedException

2019-10-11 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949197#comment-16949197
 ] 

Lokesh Jain commented on RATIS-705:
---

| Are the changes GrpcClientProtocolClient required?

The bug occurs in the following scenario. A TimeoutScheduler thread executes 
a timeout for a client request. As part of handling the timeout, the 
GrpcClientProtocolClient is closed, which closes the timeout scheduler and 
interrupts all the scheduler threads, including the thread executing the close. 
Currently, that thread is interrupted before the channel wait is called, which 
leads to an InterruptedException. Therefore, the patch moves the 
scheduler.close() call to after the channel is shut down.

| If yes, does it mean that we cannot call scheduler.close() at all?

I think we can still call it, as long as it is at the end of the 
GrpcClientProtocolClient#close function, because after that there are no wait 
calls.
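
The ordering issue described above can be reproduced in isolation. The 
following is only a minimal sketch using plain java.util.concurrent classes, 
not the Ratis code: the class name CloseOrderSketch, the method 
closeFromSchedulerThread, and the CountDownLatch standing in for channel 
termination are all hypothetical, and ExecutorService#shutdownNow is assumed to 
play the role of scheduler.close().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the close routine; none of these names come from Ratis.
class CloseOrderSketch {

  /**
   * Runs a "close" routine on the scheduler's own thread and reports whether
   * the wait for (simulated) channel termination was interrupted.
   */
  public static boolean closeFromSchedulerThread(boolean shutdownSchedulerFirst)
      throws Exception {
    ExecutorService scheduler = Executors.newSingleThreadExecutor();
    // Stand-in for channel termination; already complete in this sketch.
    CountDownLatch channelTerminated = new CountDownLatch(1);
    channelTerminated.countDown();

    Future<Boolean> result = scheduler.submit(() -> {
      if (shutdownSchedulerFirst) {
        // Interrupts all scheduler threads, including the one running this task.
        scheduler.shutdownNow();
      }
      boolean interrupted = false;
      try {
        // A timed await checks the interrupt flag before anything else.
        channelTerminated.await(1, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        interrupted = true;
      }
      if (!shutdownSchedulerFirst) {
        // Safe here: no blocking wait follows the shutdown.
        scheduler.shutdownNow();
      }
      return interrupted;
    });
    return result.get(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    System.out.println("scheduler closed first: interrupted = "
        + closeFromSchedulerThread(true));
    System.out.println("scheduler closed last:  interrupted = "
        + closeFromSchedulerThread(false));
  }
}
```

In this sketch, shutting the scheduler down before the wait makes the await 
throw InterruptedException even though the latch is already released, while 
shutting it down after the wait does not. That matches the reasoning for 
calling scheduler.close() last.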

> GrpcClientProtocolClient#close throws InterruptedException
> --
>
> Key: RATIS-705
> URL: https://issues.apache.org/jira/browse/RATIS-705
> Project: Ratis
>  Issue Type: Bug
>  Components: gRPC
>Reporter: Nilotpal Nandi
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-705.001.patch, RATIS-705.002.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when 
> GrpcClientProtocolClient#close is called from a TimeoutScheduler thread. 
> GrpcClientProtocolClient#close calls scheduler.close() which interrupts all 
> the timeout scheduler threads including the thread executing the close 
> routine. This leads to InterruptedException when channel.awaitTermination is 
> called.
>  
> {code:java}
> 19/10/09 07:40:33 ERROR client.GrpcClientProtocolClient: Unexpected exception while waiting for channel termination
> java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelImpl.awaitTermination(ManagedChannelImpl.java:763)
> at org.apache.ratis.thirdparty.io.grpc.internal.ForwardingManagedChannel.awaitTermination(ForwardingManagedChannel.java:57)
> at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.awaitTermination(ManagedChannelOrphanWrapper.java:70)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:146)
> at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:74)
> at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
> at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
> at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
> at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70)
> at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:127)
> at org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:136)
> at org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47)
> at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:372)
> at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:236)
> at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$3(GrpcClientProtocolClient.java:324)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:329)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:324)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$1(GrpcClientProtocolClient.java:318)
> at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113)
> at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133)
> at 

[jira] [Created] (RATIS-708) ClientProtoUtils#toRaftClientRequestProto for Ozone takes close to 36ms

2019-10-11 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created RATIS-708:
---

 Summary: ClientProtoUtils#toRaftClientRequestProto for Ozone takes 
close to 36ms
 Key: RATIS-708
 URL: https://issues.apache.org/jira/browse/RATIS-708
 Project: Ratis
  Issue Type: Bug
  Components: client
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Tsz-wo Sze


Profiling Ozone shows that ClientProtoUtils#toRaftClientRequestProto takes a 
long time to process requests (close to 36 ms).


