Nilotpal Nandi created HDDS-3314:
------------------------------------
Summary: scmcli container info command failing intermittently
Key: HDDS-3314
URL: https://issues.apache.org/jira/browse/HDDS-3314
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: SCM, SCM Client
Reporter: Nilotpal Nandi
Configuration set before running the command:
"ozone.scm.stale.node.interval": "2m",
"ozone.scm.dead.node.interval": "4m",
"hdds.scm.replication.thread.interval": "12s",
"ozone.scm.container.size": "1GB"
Steps taken:
1) Write a key (smaller than the block size).
2) Shut down two of the datanodes that hold replicas of the container.
3) Query the container info.
The container info command failed:
{noformat}
ozone scmcli container info 33 | egrep 'Container|Datanodes'
Failed to execute command cmdType: ReadContainer
{noformat}
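Because the failure is intermittent, a single invocation may not reproduce it. The following is a minimal sketch for measuring how often the command fails, assuming a POSIX shell. `count_failures` is a hypothetical helper, not part of Ozone. Treating a nonzero exit status as a failure is also an assumption, since the report only shows the error message, so the helper checks for that message as well.

```shell
# Hypothetical helper (not part of Ozone): run a command N times and
# print how many runs failed.
# A run counts as failed if it exits nonzero (an assumption) or if its
# output contains the error text shown in this report.
count_failures() {
  attempts=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Capture stdout and stderr together, and record the exit status.
    if out=$("$@" 2>&1); then status=0; else status=$?; fi
    if [ "$status" -ne 0 ] || printf '%s\n' "$out" | grep -q 'Failed to execute command'; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example usage (assumes the ozone CLI is on PATH and container 33 exists):
# count_failures 20 ozone scmcli container info 33
```

Running it once before and once after the two datanodes are shut down would show whether the failures only appear while the replicas are unavailable.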
SCM log for that time range:
{noformat}
2020-04-01 10:09:29,665 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth
successful for [email protected] (auth:KERBEROS)
2020-04-01 10:09:29,706 INFO
SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
Authorization successful for [email protected] (auth:KERBEROS) for
protocol=interface
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol
2020-04-01 10:09:55,283 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth
successful for dn/[email protected]
(auth:KERBEROS)
2020-04-01 10:09:55,287 INFO
SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
Authorization successful for
dn/[email protected] (auth:KERBEROS)
for protocol=interface
org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
2020-04-01 10:09:55,474 INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Starting Replication
Monitor Thread.
2020-04-01 10:09:55,486 INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
Thread took 10 milliseconds for processing 33 containers.
2020-04-01 10:10:07,488 INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
Thread took 2 milliseconds for processing 33 containers.
2020-04-01 10:10:17,996 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth
successful for dn/[email protected]
(auth:KERBEROS)
2020-04-01 10:10:18,001 INFO
SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
Authorization successful for
dn/[email protected] (auth:KERBEROS)
for protocol=interface
org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
2020-04-01 10:10:19,491 INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
Thread took 3 milliseconds for processing 33 containers.
2020-04-01 10:10:31,494 INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
Thread took 2 milliseconds for processing 33 containers.
2020-04-01 10:10:43,495 INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
Thread took 1 milliseconds for processing 33 containers.
2020-04-01 10:10:47,987 ERROR
org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
action CLOSE for Pipeline[ Id: 763bd379-a703-4dc0-85c5-bf385cdc0b18, Nodes:
92f73ec3-9ed8-41c8-9103-c4c1b2b365e1{ip: 172.27.120.0, host:
quasar-fjgcwr-1.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack,
certSerialId: null}b60097cf-7dff-44dc-800f-3500dda636f6{ip: 172.27.123.128,
host: quasar-fjgcwr-4.quasar-fjgcwr.root.hwx.site, networkLocation:
/default-rack, certSerialId: null}ea2322d9-8ede-4f48-a72d-693e809d2b95{ip:
172.27.12.195, host: quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site,
networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE,
State:OPEN, leaderId:b60097cf-7dff-44dc-800f-3500dda636f6,
CreationTimestamp2020-04-01T10:04:47.723688Z] from datanode
ea2322d9-8ede-4f48-a72d-693e809d2b95{ip: 172.27.12.195, host:
quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack,
certSerialId: 12651664310640168}. Reason : ea2322d9-8ede-4f48-a72d-693e809d2b95
is in candidate state for 61616ms
2020-04-01 10:10:47,988 INFO
org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager: Destroying
pipeline:Pipeline[ Id: 763bd379-a703-4dc0-85c5-bf385cdc0b18, Nodes:
92f73ec3-9ed8-41c8-9103-c4c1b2b365e1{ip: 172.27.120.0, host:
quasar-fjgcwr-1.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack,
certSerialId: null}b60097cf-7dff-44dc-800f-3500dda636f6{ip: 172.27.123.128,
host: quasar-fjgcwr-4.quasar-fjgcwr.root.hwx.site, networkLocation:
/default-rack, certSerialId: null}ea2322d9-8ede-4f48-a72d-693e809d2b95{ip:
172.27.12.195, host: quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site,
networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE,
State:OPEN, leaderId:b60097cf-7dff-44dc-800f-3500dda636f6,
CreationTimestamp2020-04-01T10:04:47.723688Z]
{noformat}