[
https://issues.apache.org/jira/browse/JAMES-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benoit Tellier updated JAMES-3676:
----------------------------------
Description:
h3. Symptoms
Recently I found those delightful logs on one of my production environments:
{code:java}
{
"timestamp": "2021-11-25T04:19:56.897Z",
"level": "ERROR",
"thread": "imapserver-executor-560",
"mdc": {
"fetchedData": "FetchData{flags=false, uid=true, internalDate=false,
size=true, envelope=false, body=false, bodyStructure=false, setSeen=false,
bodyElements=[BodyFetchElement{firstOctet=null, numberOfOctets=null,
name=BODY[], sectionType=CONTENT, fieldNames=Optional.empty}], modSeq=false,
changedSince=-1, vanished=false}",
"protocol": "IMAP",
"useUid": "true",
"action": "FETCH",
"sessionId": "SID-cdkbuyectzcc",
"user": "[email protected]"
},
"logger": "org.apache.james.imap.processor.AbstractMailboxProcessor",
"message": "Unexpected error during IMAP processing",
"context": "default",
"exception": "software.amazon.awssdk.core.exception.SdkClientException:
Unable to execute HTTP request: Acquire operation took longer than the
configured maximum time. This indicates that a request cannot get a connection
from the pool within the specified maximum time. This can be due to high
request rate.\nConsider taking any of the following actions to mitigate the
issue: increase max connections, increase acquire timeout, or slowing the
request rate.\nIncreasing the max connections can increase client throughput
(unless the network interface is already fully utilized), but can eventually
start to hit operation system limitations on the number of file descriptors
used by the process. If you already are fully utilizing your network interface
or cannot further increase your connection count, increasing the acquire
timeout gives extra time for requests to acquire a connection before timing
out. If the connections doesn't free up, the subsequent requests will still
timeout.\nIf the above mechanisms are not able to fix the issue, try smoothing
out your requests so that large traffic bursts cannot overload the client,
being more efficient with the number of times you need to call AWS, or by
increasing the number of hosts sending requests.\n\tat
software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)\n\tat
software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43)\n\tat
software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:204)\n\tat
software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:200)\n\tat
software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:179)\n\tat
software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)\n\tat
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
Source)\n\tat
software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76)\n\tat
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
Source)\n\tat
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:104)\n\tat
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
Source)\n\tat
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
Source)\n\tat
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$WrappedErrorForwardingResponseHandler.onError(MakeAsyncHttpRequestStage.java:158)\n\tat
software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:300)\n\tat
software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequestListener(NettyRequestExecutor.java:172)\n\tat
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)\n\tat
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)\n\tat
io.netty.util.concurrent.DefaultPromise.access$200(DefaultPromise.java:35)\n\tat
io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:502)\n\tat
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)\n\tat
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)\n\tat
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)\n\tat
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)\n\tat
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat
java.base/java.lang.Thread.run(Unknown Source)\nCaused by: java.lang.Throwable:
Acquire operation took longer than the configured maximum time. This indicates
that a request cannot get a connection from the pool within the specified
maximum time. This can be due to high request rate.\nConsider taking any of the
following actions to mitigate the issue: increase max connections, increase
acquire timeout, or slowing the request rate.\nIncreasing the max connections
can increase client throughput (unless the network interface is already fully
utilized), but can eventually start to hit operation system limitations on the
number of file descriptors used by the process. If you already are fully
utilizing your network interface or cannot further increase your connection
count, increasing the acquire timeout gives extra time for requests to acquire
a connection before timing out. If the connections doesn't free up, the
subsequent requests will still timeout.\nIf the above mechanisms are not able
to fix the issue, try smoothing out your requests so that large traffic bursts
cannot overload the client, being more efficient with the number of times you
need to call AWS, or by increasing the number of hosts sending requests.\n\tat
software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.decorateException(NettyRequestExecutor.java:306)\n\tat
software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:299)\n\t...
11 common frames omitted\nCaused by: java.util.concurrent.TimeoutException:
Acquire operation took longer than 10000 milliseconds.\n\tat
software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.timeoutAcquire(HealthCheckedChannelPool.java:77)\n\tat
software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.lambda$acquire$0(HealthCheckedChannelPool.java:67)\n\tat
org.glowroot.agent.plugin.executor.RunnableWrapper.run(RunnableWrapper.java:35)\n\tat
io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)\n\tat
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)\n\t...
6 common frames omitted\n"
}
{code}
TL;DR: the S3 driver cannot get an HTTP connection from the pool and fails randomly,
which is very inconvenient.
I attempted a reboot, which temporarily solved the issue, but we eventually ran
out of HTTP connections again after some time...
h3. Diagnostic
This sounds like a leak to me!
I quickly wrote a unit test reproducing the exact same issue by partially
reading many blobs. Closing the resulting streams or not proved to have no effect.
This pointed me to a defect within ReactorUtils.toInputStream: it was not
cancelling the related publisher when the stream was closed; only full consumption
was releasing the subscription! Once again, an easy unit test properly
demonstrates the issue.
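To illustrate the defect in isolation, here is a minimal, self-contained sketch (not the actual James code; the `OPEN` counter and class names are hypothetical, standing in for pooled HTTP connections). An InputStream built over a plain Iterator has no hook to release the upstream resource on close(), so only full consumption frees it:

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the defect: an InputStream over an Iterator cannot
// propagate close() upstream, so partially-read streams leak the resource.
public class IteratorLeakDemo {
    static final AtomicInteger OPEN = new AtomicInteger(); // stands in for pooled HTTP connections

    static InputStream over(Iterator<byte[]> chunks) {
        OPEN.incrementAndGet(); // "acquire a connection"
        return new InputStream() {
            private byte[] chunk = new byte[0];
            private int pos;
            private boolean released;

            @Override
            public int read() {
                while (pos >= chunk.length) {
                    if (!chunks.hasNext()) {
                        if (!released) { // resource freed only on full consumption
                            released = true;
                            OPEN.decrementAndGet();
                        }
                        return -1;
                    }
                    chunk = chunks.next();
                    pos = 0;
                }
                return chunk[pos++] & 0xFF;
            }
            // close() is inherited as a no-op: the Iterator offers no cancel hook
        };
    }

    public static int leakedAfterPartialReads(int n) throws Exception {
        for (int i = 0; i < n; i++) {
            try (InputStream in = over(List.of(new byte[]{1}, new byte[]{2}).iterator())) {
                in.read(); // partial read; close() via try-with-resources releases nothing
            }
        }
        return OPEN.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("leaked connections: " + leakedAfterPartialReads(10));
    }
}
```

Every partially-read stream leaves the counter incremented, mirroring how the pool eventually runs dry in production.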
Propagating the "close" from the SequencedInputStream back to the subscribed
publisher cannot be done with an iterator intermediate representation, but
adopting a Java 8 Stream instead allows propagating the close and cancelling the
subscription if need be. This approach solves both failing tests.
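The idea of the fix can be sketched the same way (again with hypothetical names; the `onClose` hook stands in for `Subscription.cancel()` on the real publisher). Unlike an Iterator, a java.util.stream.Stream carries an onClose callback, so closing the InputStream can be propagated upstream:

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

// Sketch of the fix: a java.util.stream.Stream, unlike an Iterator, exposes
// onClose(), so close() on the InputStream can cancel the upstream subscription.
public class StreamCloseDemo {

    static InputStream over(Stream<byte[]> chunks) {
        Iterator<byte[]> it = chunks.iterator();
        return new InputStream() {
            private byte[] chunk = new byte[0];
            private int pos;

            @Override
            public int read() {
                while (pos >= chunk.length) {
                    if (!it.hasNext()) return -1;
                    chunk = it.next();
                    pos = 0;
                }
                return chunk[pos++] & 0xFF;
            }

            @Override
            public void close() {
                chunks.close(); // runs the onClose hooks: the cancellation path
            }
        };
    }

    public static boolean closeCancelsUpstream() throws Exception {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        Stream<byte[]> chunks = Stream.<byte[]>of(new byte[]{1, 2}, new byte[]{3, 4})
            .onClose(() -> cancelled.set(true)); // stands in for Subscription.cancel()
        try (InputStream in = over(chunks)) {
            in.read(); // partial read only
        }
        return cancelled.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cancelled on close: " + closeCancelsUpstream());
    }
}
```

With this shape, a partial read followed by close() reaches the upstream, which is exactly what releases the pooled connection in the real code path.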
Possible causes that "revealed" this issue include:
- S3 driver update <3
- Additional usage of the BlobStore stream API for JMAP blob uploads?
h3. Complementary work
We might benefit from investing in tools for automatic leak detection.
I found some Java agents that could do the job: https://file-leak-detector.kohsuke.org/
> S3 blobStore leaks HTTP connections when reading streams partially (even when
> closing them)
> -------------------------------------------------------------------------------------------
>
> Key: JAMES-3676
> URL: https://issues.apache.org/jira/browse/JAMES-3676
> Project: James Server
> Issue Type: Improvement
> Components: Blob
> Affects Versions: 3.6.0
> Reporter: Benoit Tellier
> Priority: Major
> Labels: bug
> Fix For: 3.7.0, 3.6.1
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]