[ https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014703#comment-16014703 ]
Kihwal Lee edited comment on HDFS-11817 at 5/18/17 8:24 PM:
------------------------------------------------------------

*Details of the NPE:*

The JVM did produce a stack trace on the very first occurrence of the NPE; subsequent ones were missing the stack trace. The NPE is caused by {{commitBlockSynchronization()}} containing a dead node in the new targets. Since block recoveries are issued based on {{BlockUnderConstructionFeature.replicas}} (a.k.a. the expected locations), which is not updated on node death, a block recovery can include dead nodes. When {{commitBlockSynchronization()}} is called, the expected locations are also updated. (In fact, the whole {{BlockUnderConstructionFeature}} is swapped.) Each expected location is populated by searching for the datanode storage using the storage ID string passed in {{commitBlockSynchronization()}}. If the node is dead, the lookup returns null.

(Clarification on "dead node": the faulty node did try to come back at times, and that actually made the situation worse. On re-registration, the existing storages are removed from the datanode descriptor. If it then cannot heartbeat for some reason, a storage lookup using a storage ID will return null.)

If {{getBlockLocations()}} is called after this, {{newLocatedBlock()}} is called with the expected locations, not with the locations in the blocks map, since the block is still under construction. This calls {{DatanodeStorageInfo.toDatanodeInfos()}}, which blows up as it tries to call {{getDatanodeDescriptor()}} on the null storage object.

*Proposed solution to the NPE issue:*

We can have {{commitBlockSynchronization()}} check for a valid storage ID before updating data structures. Even if no valid storage ID is found, we cannot fail the operation: one or more nodes did finalize the block, whether they are dead or alive at this moment. It is like a missing-block case. We can go ahead and commit the block without the dead node/storage and also allow closing of the file, just like {{completeFile()}}.
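A minimal sketch of the proposed guard, assuming a simplified resolver in place of the real DatanodeManager storage lookup (class and method names here are illustrative, not the actual HDFS code): before the new expected locations are swapped in, storage IDs that no longer resolve are dropped rather than stored as null, and an empty result does not fail the commit.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitGuardSketch {
    // Stand-in for DatanodeStorageInfo: just carries the storage ID.
    static class StorageInfo {
        final String storageId;
        StorageInfo(String id) { this.storageId = id; }
    }

    // Stand-in for the namenode's storage lookup: returns null for a dead
    // node, or one that re-registered and lost its storages.
    interface StorageResolver {
        StorageInfo resolve(String storageId);
    }

    // Keep only the storage IDs that still resolve. An empty list is fine:
    // the block is committed without locations, like a missing-block case.
    static List<StorageInfo> validStorages(String[] newTargetStorageIds,
                                           StorageResolver resolver) {
        List<StorageInfo> valid = new ArrayList<>();
        for (String id : newTargetStorageIds) {
            StorageInfo s = resolver.resolve(id);
            if (s != null) {
                valid.add(s);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        StorageResolver resolver =
            id -> id.equals("dead-storage") ? null : new StorageInfo(id);
        List<StorageInfo> valid = validStorages(
            new String[]{"live-storage", "dead-storage"}, resolver);
        // Only the live storage survives; the dead one is silently dropped.
        System.out.println(valid.size());
    }
}
```

With this guard, the expected-locations array never contains a null entry, so the later {{toDatanodeInfos()}} call has nothing to trip over.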
On closing of the file, {{checkReplication()}} is called, and in our example this will cause the last block (still in the committed state) to be reported as missing. If the dead node comes back, it will include the finalized replica in its block report, and that will cause the block to be completed and the missing block to be cleared.
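To make the failure mode described at the top of the comment concrete, here is a toy model of the NPE path (the real classes live in org.apache.hadoop.hdfs.server.blockmanagement; these stand-ins are hypothetical and heavily simplified): a dead node's storage ID resolves to null in the expected locations, and iterating over them blows up.

```java
public class NpeSketch {
    // Stand-in for DatanodeDescriptor.
    static class DatanodeDescriptor {
        final String name;
        DatanodeDescriptor(String name) { this.name = name; }
    }

    // Stand-in for DatanodeStorageInfo.
    static class StorageInfo {
        final DatanodeDescriptor dn;
        StorageInfo(DatanodeDescriptor dn) { this.dn = dn; }
        DatanodeDescriptor getDatanodeDescriptor() { return dn; }
    }

    // Stand-in for DatanodeStorageInfo.toDatanodeInfos(): dereferences every
    // element, so a null entry in the expected locations throws NPE.
    static DatanodeDescriptor[] toDatanodeInfos(StorageInfo[] storages) {
        DatanodeDescriptor[] dns = new DatanodeDescriptor[storages.length];
        for (int i = 0; i < storages.length; i++) {
            dns[i] = storages[i].getDatanodeDescriptor(); // NPE if null
        }
        return dns;
    }

    public static void main(String[] args) {
        // commitBlockSynchronization() looked up the storage ID of a dead
        // node and stored the resulting null into the expected locations.
        StorageInfo[] expectedLocations = {
            new StorageInfo(new DatanodeDescriptor("dn1")), null };
        try {
            toDatanodeInfos(expectedLocations);
            System.out.println("no NPE");
        } catch (NullPointerException e) {
            System.out.println("NPE, as seen from getBlockLocations()");
        }
    }
}
```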
> A faulty node can cause a lease leak and NPE on accessing data
> --------------------------------------------------------------
>
>                 Key: HDFS-11817
>                 URL: https://issues.apache.org/jira/browse/HDFS-11817
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: hdfs-11817_supplement.txt
>
> When the namenode performs a lease recovery for a failed write, {{commitBlockSynchronization()}} will fail if none of the new targets has sent a received-IBR. At this point the data is inaccessible, as the namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}. The lease recovery will be retried in about an hour by the namenode. If the nodes are faulty (usually when there is only one new target), they may not block report until this point. If this happens, lease recovery throws an {{AlreadyBeingCreatedException}}, which causes the LeaseManager to simply remove the lease without finalizing the inode.
> This results in an inconsistent lease state. The inode stays under construction, but no more lease recovery is attempted. A manual lease recovery is also not allowed.
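The lease leak in the issue description can be modeled with a tiny hypothetical state machine (illustrative names only, not the actual LeaseManager code): dropping the lease on the {{AlreadyBeingCreatedException}} path leaves the inode under construction with no lease, so no further automatic recovery will ever be attempted.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseLeakSketch {
    // Stand-in for an INodeFile: only tracks the under-construction flag.
    static class Inode {
        boolean underConstruction = true;
    }

    // Stand-in for LeaseManager.
    static class LeaseManager {
        final Map<String, Inode> leases = new HashMap<>();

        void addLease(String path, Inode inode) { leases.put(path, inode); }

        // Failure path: the lease is removed, but the inode is never
        // finalized -- this is the inconsistent state from the report.
        void removeLeaseOnError(String path) { leases.remove(path); }

        // Without a lease, no further automatic recovery is attempted.
        boolean willRetryRecovery(String path) { return leases.containsKey(path); }
    }

    public static void main(String[] args) {
        LeaseManager lm = new LeaseManager();
        Inode inode = new Inode();
        lm.addLease("/f", inode);
        lm.removeLeaseOnError("/f"); // AlreadyBeingCreatedException path
        // Inode is stuck under construction and recovery will never retry.
        System.out.println(inode.underConstruction
            && !lm.willRetryRecovery("/f"));
    }
}
```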