[jira] [Updated] (RATIS-485) TimeoutScheduler is leaked by gRPC client implementation

2019-08-30 Thread Tsz Wo Nicholas Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-485:
--
Attachment: RATIS-485.004.patch

> TimeoutScheduler is leaked by gRPC client implementation
> 
>
> Key: RATIS-485
> URL: https://issues.apache.org/jira/browse/RATIS-485
> Project: Ratis
> Issue Type: Bug
> Components: examples
> Reporter: Clay B.
> Assignee: Tsz Wo Nicholas Sze
> Priority: Major
> Fix For: 0.5.0
>
> Attachments: RATIS-485.003.patch, RATIS-485.004.patch, loadgen.log, r485_20190827.patch, r485_20190828.patch
>
>
> Running the load generator without a Ratis cluster (e.g. spurious node IPs) results in an OOM.
> If one has a single Ratis server it tries seemingly indefinitely:
> {code:java}
> vagrant@ratis-server:~/incubator-ratis$ ./ratis-examples/src/main/bin/client.sh filestore loadgen --size 1048576 --numFiles 100 --peers n0:127.0.0.1:1{code}
> If one has two Ratis servers it OOMs:
> {code:java}
> vagrant@ratis-server:~/incubator-ratis$ ./ratis-examples/src/main/bin/client.sh filestore loadgen --size 1048576 --numFiles 100 --peers n0:127.0.0.1:1,n1:127.0.0.1:2
> [...]
> 1/787867107@5e5792a0 with java.util.concurrent.CompletionException: java.io.IOException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2019-02-14 07:47:22 DEBUG RaftClient:417 - client-272A2E13A5DD: suggested new leader: null. Failed RaftClientRequest:client-272A2E13A5DD->n1@group-6F7570313233, cid=0, seq=0 RW, org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 with java.io.IOException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2019-02-14 07:47:22 DEBUG RaftClient:437 - client-272A2E13A5DD: change Leader from n1 to n0
> 2019-02-14 07:47:22 DEBUG RaftClient:291 - schedule attempt #10740 with policy RetryForeverNoSleep for RaftClientRequest:client-272A2E13A5DD->n1@group-6F7570313233, cid=0, seq=0 RW, org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0
> 2019-02-14 07:47:22 DEBUG RaftClient:323 - client-272A2E13A5DD: send* RaftClientRequest:client-272A2E13A5DD->n0@group-6F7570313233, cid=0, seq=0 RW, org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0
> 2019-02-14 07:47:22 DEBUG RaftClient:338 - client-272A2E13A5DD: Failed RaftClientRequest:client-272A2E13A5DD->n0@group-6F7570313233, cid=0, seq=0 RW, org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 with java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: unable to create new native thread
> Exception in thread "main" java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: unable to create new native thread
>     at org.apache.ratis.client.impl.RaftClientImpl.lambda$sendRequestAsync$14(RaftClientImpl.java:349)
>     at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>     at java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:884)
>     at java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2196)
>     at org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:334)
>     at org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286)
>     at org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243)
>     at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259)
>     at org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293)
>     at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
>     at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
>     at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>     at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at 
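
The trace above ends in "unable to create new native thread" while a retry is being scheduled through TimeoutScheduler, which is consistent with scheduler threads accumulating across attempts. The following is only a minimal sketch of that general failure mode using plain java.util.concurrent; it is not Ratis code and does not reproduce the actual defect or the attached patches. The class name SchedulerLeakSketch, its method names, and the assumption that a fresh scheduler is created per attempt are all hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Ratis.
public class SchedulerLeakSketch {

  /** Counts every thread it creates. Threads are daemons so the JVM can always exit. */
  static final class CountingFactory implements ThreadFactory {
    final AtomicInteger created = new AtomicInteger();

    @Override
    public Thread newThread(Runnable r) {
      created.incrementAndGet();
      Thread t = new Thread(r);
      t.setDaemon(true);
      return t;
    }
  }

  /** Assumed leaky pattern: a new scheduler per attempt, none shut down while attempts continue. */
  public static int threadsWithSchedulerPerAttempt(int attempts) throws Exception {
    CountingFactory factory = new CountingFactory();
    List<ScheduledExecutorService> leaked = new ArrayList<>();
    for (int i = 0; i < attempts; i++) {
      ScheduledExecutorService scheduler = new ScheduledThreadPoolExecutor(1, factory);
      // Wait for the task so the scheduler's worker thread has definitely been started.
      scheduler.schedule(() -> { }, 1, TimeUnit.MILLISECONDS).get();
      leaked.add(scheduler);
    }
    int created = factory.created.get();
    // Cleanup for this demo only; in the leaky pattern this shutdown never happens.
    leaked.forEach(ScheduledExecutorService::shutdownNow);
    return created;
  }

  /** Alternative: one scheduler shared by all attempts and shut down once at the end. */
  public static int threadsWithSharedScheduler(int attempts) throws Exception {
    CountingFactory factory = new CountingFactory();
    ScheduledExecutorService shared = new ScheduledThreadPoolExecutor(1, factory);
    try {
      for (int i = 0; i < attempts; i++) {
        shared.schedule(() -> { }, 1, TimeUnit.MILLISECONDS).get();
      }
      return factory.created.get();
    } finally {
      shared.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("scheduler per attempt: " + threadsWithSchedulerPerAttempt(50) + " threads");
    System.out.println("shared scheduler:      " + threadsWithSharedScheduler(50) + " threads");
  }
}
```

In this sketch the per-attempt variant starts one thread for every attempt, while the shared variant starts a single thread regardless of the attempt count. With a policy such as RetryForeverNoSleep and more than ten thousand attempts, as in the log above, the first pattern would plausibly run into the operating system's native thread limit.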

[jira] [Updated] (RATIS-485) TimeoutScheduler is leaked by gRPC client implementation

2019-08-29 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated RATIS-485:
-
Attachment: RATIS-485.003.patch

> TimeoutScheduler is leaked by gRPC client implementation
> 
>
> Key: RATIS-485
> URL: https://issues.apache.org/jira/browse/RATIS-485
> Project: Ratis
> Issue Type: Bug
> Components: examples
> Reporter: Clay B.
> Assignee: Tsz Wo Nicholas Sze
> Priority: Major
> Fix For: 0.5.0
>
> Attachments: RATIS-485.003.patch, loadgen.log, r485_20190827.patch, r485_20190828.patch
>
>
> [...]
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: 

[jira] [Updated] (RATIS-485) TimeoutScheduler is leaked by gRPC client implementation

2019-08-28 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated RATIS-485:
-
Summary: TimeoutScheduler is leaked by gRPC client implementation (was: Load Generator OOMs if Ratis Unavailable)

> TimeoutScheduler is leaked by gRPC client implementation
> 
>
> Key: RATIS-485
> URL: https://issues.apache.org/jira/browse/RATIS-485
> Project: Ratis
> Issue Type: Bug
> Components: examples
> Reporter: Clay B.
> Assignee: Tsz Wo Nicholas Sze
> Priority: Major
> Fix For: 0.5.0
>
> Attachments: loadgen.log, r485_20190827.patch, r485_20190828.patch