[ https://issues.apache.org/jira/browse/JAMES-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448978#comment-17448978 ]
Benoit Tellier commented on JAMES-3676:
---------------------------------------

3.7.0 PR: https://github.com/apache/james-project/pull/762

> S3 blobStore leaks HTTP connections when reading streams partially (even when closing them)
> --------------------------------------------------------------------------------------------
>
>                 Key: JAMES-3676
>                 URL: https://issues.apache.org/jira/browse/JAMES-3676
>             Project: James Server
>          Issue Type: Improvement
>          Components: Blob
>    Affects Versions: 3.6.0
>            Reporter: Benoit Tellier
>            Priority: Major
>              Labels: bug
>             Fix For: 3.7.0, 3.6.1
>
>
> h3. Symptoms
>
> Recently I found these delightful logs on one of my production environments:
>
> {code:java}
> {
>   "timestamp": "2021-11-25T04:19:56.897Z",
>   "level": "ERROR",
>   "thread": "imapserver-executor-560",
>   "mdc": {
>     "fetchedData": "FetchData{flags=false, uid=true, internalDate=false, size=true, envelope=false, body=false, bodyStructure=false, setSeen=false, bodyElements=[BodyFetchElement{firstOctet=null, numberOfOctets=null, name=BODY[], sectionType=CONTENT, fieldNames=Optional.empty}], modSeq=false, changedSince=-1, vanished=false}",
>     "protocol": "IMAP",
>     "useUid": "true",
>     "action": "FETCH",
>     "sessionId": "SID-cdkbuyectzcc",
>     "user": "x...@linagora.com"
>   },
>   "logger": "org.apache.james.imap.processor.AbstractMailboxProcessor",
>   "message": "Unexpected error during IMAP processing",
>   "context": "default",
>   "exception": "software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
> Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
> Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
> If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.
> 	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)
> 	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43)
> 	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:204)
> 	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:200)
> 	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:179)
> 	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
> 	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
> 	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76)
> 	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
> 	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:104)
> 	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
> 	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
> 	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$WrappedErrorForwardingResponseHandler.onError(MakeAsyncHttpRequestStage.java:158)
> 	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:300)
> 	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequestListener(NettyRequestExecutor.java:172)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
> 	at io.netty.util.concurrent.DefaultPromise.access$200(DefaultPromise.java:35)
> 	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:502)
> 	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> 	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> 	at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.Throwable: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
> Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
> Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
> If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.
> 	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.decorateException(NettyRequestExecutor.java:306)
> 	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:299)
> 	... 11 common frames omitted
> Caused by: java.util.concurrent.TimeoutException: Acquire operation took longer than 10000 milliseconds.
> 	at software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.timeoutAcquire(HealthCheckedChannelPool.java:77)
> 	at software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.lambda$acquire$0(HealthCheckedChannelPool.java:67)
> 	at org.glowroot.agent.plugin.executor.RunnableWrapper.run(RunnableWrapper.java:35)
> 	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
> 	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
> 	... 6 common frames omitted"
> }
> {code}
>
> TL;DR: the S3 driver cannot get an HTTP connection from the pool and fails randomly, which is very inconvenient.
>
> I attempted a reboot, which temporarily solved the issue, but we eventually ran out of HTTP connections again after some time...
>
> h3. Diagnostic
>
> This sounds like a leak to me!
>
> I quickly wrote a unit test reproducing the exact same issue by partially reading many blobs. Closing the resulting streams or not proved to have no effect.
>
> This pointed me to a defect within ReactorUtils.toInputStream: it was not cancelling the related publisher when closing the stream; only full consumption was releasing the subscription! Once again, an easy unit test properly demonstrates the issue.
>
> Propagating the "close" from the SequencedInputStream back to the subscribed publisher cannot be done with an iterator intermediate representation, but adopting a Java 8 Stream instead allows propagating the close and cancelling the subscription if need be. This approach solves both failing tests.
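>
> To make the reproduction concrete, here is a minimal sketch of such a test driver (illustrative only, not the actual James unit test; it assumes the BlobStore#read(BucketName, BlobId) signature and an already-saved blob):
>
> {code:java}
> import java.io.InputStream;
>
> import org.apache.james.blob.api.BlobId;
> import org.apache.james.blob.api.BlobStore;
> import org.apache.james.blob.api.BucketName;
>
> public class PartialReadLeakRepro {
>     // Partially read the same blob many times, closing each stream. Before the
>     // fix, every iteration left one pooled HTTP connection un-released,
>     // eventually producing the acquire timeout shown in the log above.
>     static void partiallyReadManyBlobs(BlobStore blobStore, BucketName bucket, BlobId blobId) throws Exception {
>         byte[] buffer = new byte[128];
>         for (int i = 0; i < 1000; i++) {
>             try (InputStream in = blobStore.read(bucket, blobId)) {
>                 in.read(buffer); // stop after the first bytes: the publisher is never drained
>             } // close() is supposed to cancel the subscription and free the connection
>         }
>     }
> }
> {code}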
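>
> The fix direction can be sketched as follows (again illustrative, not the actual ReactorUtils patch; James works with its own chunk types and helpers). Flux#toStream exposes a close() hook that cancels the subscription, which an iterator intermediate representation does not offer:
>
> {code:java}
> import java.io.ByteArrayInputStream;
> import java.io.InputStream;
> import java.io.SequenceInputStream;
> import java.util.Enumeration;
> import java.util.Iterator;
> import java.util.stream.Stream;
>
> import org.reactivestreams.Publisher;
> import reactor.core.publisher.Flux;
>
> public class StreamBackedInputStream {
>     public static InputStream toInputStream(Publisher<byte[]> chunks) {
>         // Flux#toStream wraps the publisher in a java.util.stream.Stream whose
>         // close() cancels the underlying subscription.
>         Stream<byte[]> stream = Flux.from(chunks).toStream();
>         Iterator<byte[]> iterator = stream.iterator();
>
>         Enumeration<InputStream> parts = new Enumeration<InputStream>() {
>             @Override
>             public boolean hasMoreElements() {
>                 return iterator.hasNext();
>             }
>
>             @Override
>             public InputStream nextElement() {
>                 return new ByteArrayInputStream(iterator.next());
>             }
>         };
>
>         return new SequenceInputStream(parts) {
>             @Override
>             public void close() {
>                 // Propagate close to the Stream: the subscription gets cancelled
>                 // even on a partial read, so the S3 driver can return the
>                 // connection to its pool. (The default SequenceInputStream.close()
>                 // would instead walk the whole enumeration and drain the publisher.)
>                 stream.close();
>             }
>         };
>     }
> }
> {code}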
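>
> For reference, the mitigations suggested in the stack trace (bigger pool, longer acquire timeout) map to the following AWS SDK v2 settings. With a genuine leak they only postpone pool exhaustion, but they may still be worth tuning once the leak is fixed (the values below are illustrative):
>
> {code:java}
> import java.time.Duration;
>
> import software.amazon.awssdk.http.async.SdkAsyncHttpClient;
> import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
> import software.amazon.awssdk.services.s3.S3AsyncClient;
>
> public class S3ClientTuning {
>     static S3AsyncClient buildClient() {
>         SdkAsyncHttpClient httpClient = NettyNioAsyncHttpClient.builder()
>             .maxConcurrency(256)                                  // "increase max connections"
>             .connectionAcquisitionTimeout(Duration.ofSeconds(30)) // the log shows a 10s acquire timeout
>             .build();
>
>         // Region and credentials resolution omitted for brevity.
>         return S3AsyncClient.builder()
>             .httpClient(httpClient)
>             .build();
>     }
> }
> {code}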
>
> Possible causes that "revealed" this issue include:
> - S3 driver update <3
> - Additional usage of the BlobStore stream API for JMAP blob uploads?
>
> h3. Complementary work
>
> We might benefit from investing in tools for automatic leak detection.
>
> I found some Java agents to do so: https://file-leak-detector.kohsuke.org/ could do the job!
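>
> For instance, attaching the agent to the James JVM should look something like this (the jar path, option names, and james-server.jar are placeholders to double-check against the agent's documentation):
>
> {code}
> java -javaagent:path/to/file-leak-detector.jar=http=19999,threshold=200 \
>      -jar james-server.jar
> {code}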