[jira] [Updated] (HDFS-12361) Ozone: SCM failed to start when a container metadata is empty
[ https://issues.apache.org/jira/browse/HDFS-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-12361:
--------------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

[~cheersyang] Thanks for the contribution. I have committed this to the feature branch.

> Ozone: SCM failed to start when a container metadata is empty
> --------------------------------------------------------------
>
>                 Key: HDFS-12361
>                 URL: https://issues.apache.org/jira/browse/HDFS-12361
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone, scm
>    Affects Versions: HDFS-7240
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: HDFS-12361-HDFS-7240.001.patch
>
>
> When I ran tests that create keys via corona, some containers were
> sometimes left with empty metadata. This might also happen when SCM stops
> before the metadata has been written. When this happens, we get the
> following error and SCM cannot be started:
> {noformat}
> 17/08/27 20:10:57 WARN datanode.DataNode: Unexpected exception in block pool Block pool BP-821804790-172.16.165.133-1503887277256 (Datanode Uuid 7ee16a59-9604-406e-a0f8-6f44650a725b) service to ozone1.fyre.ibm.com/172.16.165.133:8111
> java.lang.NullPointerException
>     at org.apache.hadoop.ozone.container.common.helpers.ContainerData.getFromProtBuf(ContainerData.java:66)
>     at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainerInfo(ContainerManagerImpl.java:210)
>     at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:158)
>     at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:99)
>     at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:77)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1592)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:409)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:783)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:286)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We should add an NPE check and mark such containers as inactive without
> failing the SCM.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12361) Ozone: SCM failed to start when a container metadata is empty
[ https://issues.apache.org/jira/browse/HDFS-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated HDFS-12361:
-------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-12361) Ozone: SCM failed to start when a container metadata is empty
[ https://issues.apache.org/jira/browse/HDFS-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated HDFS-12361:
-------------------------------
    Attachment: HDFS-12361-HDFS-7240.001.patch

Attached a simple patch to fix this. It adds a check and ensures that such containers are marked as INACTIVE in SCM. Please kindly review. Thanks.
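
For readers who want to see the shape of the fix discussed in this thread (check for missing metadata and mark the container inactive instead of failing startup), here is a minimal, self-contained sketch. It is not the attached patch and does not use Hadoop or Ozone classes. Every name in it (ContainerLoadSketch, Metadata, Status, parse, loadAll) is hypothetical, and it assumes that an empty metadata file parses to null, which is the condition the NullPointerException in the report suggests.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Hadoop/Ozone.
final class ContainerLoadSketch {

    /** Hypothetical stand-in for a parsed container metadata record. */
    static final class Metadata {
        final String name;

        Metadata(String name) {
            this.name = name;
        }
    }

    /** Hypothetical container states, mirroring the ACTIVE/INACTIVE wording in the thread. */
    enum Status { ACTIVE, INACTIVE }

    /**
     * Hypothetical parser. Returns null when the metadata is missing or empty
     * (assumption: this is what leads to the NPE in the original report).
     */
    static Metadata parse(byte[] raw) {
        if (raw == null || raw.length == 0) {
            return null;
        }
        return new Metadata(new String(raw, StandardCharsets.UTF_8));
    }

    /**
     * Loads every container. A container whose metadata cannot be parsed is
     * recorded as INACTIVE instead of aborting the whole startup sequence.
     */
    static Map<String, Status> loadAll(Map<String, byte[]> metadataFiles) {
        Map<String, Status> result = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : metadataFiles.entrySet()) {
            Metadata metadata = parse(entry.getValue());
            // The null check is the whole point: never dereference missing metadata.
            result.put(entry.getKey(), metadata == null ? Status.INACTIVE : Status.ACTIVE);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("c1", "container-1".getBytes(StandardCharsets.UTF_8));
        files.put("c2", new byte[0]); // empty metadata, as described in the issue
        System.out.println(loadAll(files)); // prints {c1=ACTIVE, c2=INACTIVE}
    }
}
```

The design choice sketched here is to treat unreadable metadata as a per-container condition rather than a fatal error, so one bad container cannot block the service from starting.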