Lokesh Jain created HDDS-3611:
---------------------------------
Summary: Ozone client should not consider closed container error
as failure
Key: HDDS-3611
URL: https://issues.apache.org/jira/browse/HDDS-3611
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Lokesh Jain
The datanode throws a ContainerNotOpenException when a client writes to a
container that is not open. Currently the Ozone client treats this as a failure
and increments the retry count. Once the client reaches the configured retry
count, it fails the write. MapReduce jobs were seen failing due to this error
with the default retry count of 5.
The idea is to exclude errors caused by closed containers from the retry count.
This would ensure that Ozone client writes do not fail because of closed
container exceptions.
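The proposed counting policy can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the actual KeyOutputStream code:
the class ClosedContainerRetrySketch, the exception
ContainerNotOpenSketchException, the WriteAttempt interface, and the helpers
writeWithRetries and scripted are all hypothetical names introduced here. The
flag countClosedContainer switches between the current behaviour (closed
container errors consume the retry budget) and the proposed one (they do not).

```java
import java.io.IOException;

public class ClosedContainerRetrySketch {

  /** Hypothetical stand-in for the datanode's ContainerNotOpenException. */
  static class ContainerNotOpenSketchException extends IOException {
    ContainerNotOpenSketchException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical single write attempt; throwing signals a failed attempt. */
  interface WriteAttempt {
    void run() throws IOException;
  }

  /**
   * Repeats the attempt until it succeeds or the retry budget is exhausted.
   * Returns the number of failures that were counted against the budget.
   */
  static int writeWithRetries(WriteAttempt attempt, int maxRetries,
      boolean countClosedContainer) throws IOException {
    int retryCount = 0;
    while (true) {
      try {
        attempt.run();
        return retryCount;
      } catch (ContainerNotOpenSketchException e) {
        // Closed container: the write is retried (in a real client, against
        // a newly allocated container). Only counted when the flag is set.
        if (countClosedContainer && ++retryCount > maxRetries) {
          throw new IOException(
              "exceeded maximum allowed retries number: " + maxRetries, e);
        }
      } catch (IOException e) {
        // Any other I/O error always counts against the retry budget.
        if (++retryCount > maxRetries) {
          throw new IOException(
              "exceeded maximum allowed retries number: " + maxRetries, e);
        }
      }
    }
  }

  /** Test helper: fails N times with a closed container error, then succeeds. */
  static WriteAttempt scripted(int closedContainerFailures) {
    int[] remaining = {closedContainerFailures};
    return () -> {
      if (remaining[0] > 0) {
        remaining[0]--;
        throw new ContainerNotOpenSketchException("Container in CLOSED state");
      }
    };
  }

  public static void main(String[] args) throws IOException {
    // Proposed behaviour: six closed container errors do not fail the write.
    int counted = writeWithRetries(scripted(6), 5, false);
    System.out.println("counted failures, closed container ignored: " + counted);

    // Current behaviour: the same sequence exhausts a retry budget of 5.
    try {
      writeWithRetries(scripted(6), 5, true);
    } catch (IOException e) {
      System.out.println("write failed: " + e.getMessage());
    }
  }
}
```

In this sketch a write that only ever sees closed container errors would retry
indefinitely when they are not counted; whether the real client needs a
separate bound for that case is left open here.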
{code:java}
2020-05-15 02:20:28,375 ERROR [main] org.apache.hadoop.ozone.client.io.KeyOutputStream: Retry request failed. retries get failed due to exceeded maximum allowed retries number: 5
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server e2eec12f-02c5-46e2-9c23-14d6445db219@group-A3BF3ABDC307: Container 15 in CLOSED state
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:551)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$3(BlockOutputStream.java:638)
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
    at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
    at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
    at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:143)
    at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:314)
    at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$9(OrderedAsync.java:242)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:284)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:340)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:264)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:284)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:267)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:436)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:658)
...{code}