Nilotpal Nandi created HDDS-3293:
------------------------------------
Summary: read operation failing when two container replicas are
corrupted
Key: HDDS-3293
URL: https://issues.apache.org/jira/browse/HDDS-3293
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Datanode
Reporter: Nilotpal Nandi
Steps taken:
1) Mounted a noise-injection FUSE filesystem on all datanodes.
2) Wrote a key (multiple blocks).
3) Selected one of the container IDs and injected errors into 2 of the
container replicas for that container ID.
4) Ran a GET key operation.
The GET key operation fails intermittently.
Error seen:
-------------
{noformat}
20/03/27 18:30:40 WARN impl.MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
E 20/03/27 18:30:40 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot
period at 10 second(s).
E 20/03/27 18:30:40 INFO impl.MetricsSystemImpl: XceiverClientMetrics metrics
system started
E 20/03/27 18:31:12 ERROR scm.XceiverClientGrpc: Failed to execute command
cmdType: ReadChunk
E traceID: "f80a51eaec481a1c:cbb8e92869015a53:f80a51eaec481a1c:0"
E containerID: 67
E datanodeUuid: "96101390-2446-40e6-a54e-36e170497e57"
E readChunk {
E blockID {
E containerID: 67
E localID: 103896435892617248
E blockCommitSequenceId: 1010
E }
E chunkData {
E chunkName: "103896435892617248_chunk_28"
E offset: 113246208
E len: 4194304
E checksumData {
E type: CRC32
E bytesPerChecksum: 1048576
E checksums: "\034\376\313\031"
E checksums: ";U\225\037"
E checksums: "\327m\332."
E checksums: "|\307\004E"
E }
E }
E }
E on the pipeline Pipeline[ Id: bce6316c-9690-452b-80e3-0f3590533444, Nodes:
96101390-2446-40e6-a54e-36e170497e57{ip: 172.27.111.129, host:
quasar-olrywk-3.quasar-olrywk.root.hwx.site, networkLocation: /default-rack,
certSerialId: null}3e85204d-2399-43b5-952a-55b837eb4c1d{ip: 172.27.100.0, host:
quasar-olrywk-1.quasar-olrywk.root.hwx.site, networkLocation: /default-rack,
certSerialId: null}5af0340a-6fee-4ce8-9f68-37fa35566a5a{ip: 172.27.73.0, host:
quasar-olrywk-9.quasar-olrywk.root.hwx.site, networkLocation: /default-rack,
certSerialId: null}, Type:STAND_ALONE, Factor:THREE, State:OPEN,
leaderId:96101390-2446-40e6-a54e-36e170497e57,
CreationTimestamp2020-03-27T03:36:51.880Z].
E Unexpected OzoneException: java.io.IOException:
java.util.concurrent.ExecutionException:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED:
deadline exceeded after 84603913ns. [remote_addr=/172.27.73.0:9859]]{noformat}
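For reference, the checksumData in the failing ReadChunk request (type CRC32, bytesPerChecksum 1048576, len 4194304) carries four checksums, one per 1 MiB slice of the 4 MiB chunk. The sketch below only illustrates that arithmetic and how a corrupted slice would be detected with the JDK's java.util.zip.CRC32. It is not Ozone's actual verification code; the class and method names (ChunkChecksumSketch, computeChecksums, firstCorruptSlice) are hypothetical.

```java
import java.util.zip.CRC32;

// Illustrative sketch only: not the Ozone client implementation.
class ChunkChecksumSketch {

    // Computes one CRC32 value per bytesPerChecksum-sized slice of data.
    // The last slice may be shorter if data.length is not a multiple.
    static long[] computeChecksums(byte[] data, int bytesPerChecksum) {
        int count = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[count];
        for (int i = 0; i < count; i++) {
            int start = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - start);
            CRC32 crc = new CRC32();
            crc.update(data, start, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Returns the index of the first slice whose checksum differs from the
    // expected value, or -1 if every slice matches.
    static int firstCorruptSlice(byte[] data, int bytesPerChecksum, long[] expected) {
        long[] actual = computeChecksums(data, bytesPerChecksum);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("checksum count mismatch");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int len = 4194304;              // chunk length from the log
        int bytesPerChecksum = 1048576; // bytesPerChecksum from the log
        byte[] chunk = new byte[len];
        for (int i = 0; i < len; i++) {
            chunk[i] = (byte) i;        // arbitrary test pattern, not real data
        }
        long[] sums = computeChecksums(chunk, bytesPerChecksum);
        System.out.println("checksums: " + sums.length);

        // Simulate injected noise in the third slice of one replica.
        chunk[2 * bytesPerChecksum + 5] ^= 0x01;
        System.out.println("first corrupt slice: "
                + firstCorruptSlice(chunk, bytesPerChecksum, sums));
    }
}
```

Note that the error above is a DEADLINE_EXCEEDED on the gRPC call rather than a checksum mismatch, so this sketch covers only the checksum layout, not the failure itself.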