[ https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014703#comment-16014703 ]

Kihwal Lee edited comment on HDFS-11817 at 5/18/17 8:24 PM:
------------------------------------------------------------

*Details of the NPE:*
The JVM did produce a stack trace on the very first occurrence of the NPE. 
Subsequent ones were missing a stack trace.

The NPE is caused by {{commitBlockSynchronization()}} containing a dead node in 
the new targets. Since block recoveries are issued based on 
{{BlockUnderConstructionFeature.replicas}} (aka expected locations), which is 
not updated on node death, a block recovery can include dead nodes. When 
{{commitBlockSynchronization()}} is called, the expected locations are also 
updated (in fact, the whole {{BlockUnderConstructionFeature}} is swapped out). 
Each expected location is populated by searching for the datanode storage using 
the storage ID string passed to {{commitBlockSynchronization()}}. If the node 
is dead, the lookup returns null.
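
Roughly, the lookup works like this (a simplified sketch of the 
{{commitBlockSynchronization()}} path, not the exact source; variable names 
are illustrative):

{code:java}
// For each new target reported by the recovery coordinator, resolve the
// (datanode, storage ID) pair back to a DatanodeStorageInfo. Nothing here
// guards against the node being dead or having re-registered.
DatanodeStorageInfo[] storages = new DatanodeStorageInfo[newTargets.length];
for (int i = 0; i < newTargets.length; i++) {
  DatanodeDescriptor dn = datanodeManager.getDatanode(newTargets[i]);
  // If the node is gone, or re-registration dropped its storages, this
  // resolves to null and the null is stored as an "expected location".
  storages[i] = (dn == null) ? null : dn.getStorageInfo(newTargetStorages[i]);
}
{code}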

(Clarification on dead node: the faulty node did try to come back at times, and 
that actually made the situation worse. On re-registration, the existing 
storages are removed from the datanode descriptor. If it cannot heartbeat for 
some reason, a storage lookup using a storage ID will return null.)

If {{getBlockLocations()}} is called after this, {{newLocatedBlock()}} is 
called with the expected locations, not with the locations in the blocks map, 
since the block is still under construction. This calls 
{{DatanodeStorageInfo.toDatanodeInfos()}}, which blows up when it tries to call 
{{getDatanodeDescriptor()}} on the null storage object.
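
For reference, {{toDatanodeInfos()}} is roughly the following (paraphrased 
from the source; it dereferences each element unconditionally):

{code:java}
static DatanodeInfo[] toDatanodeInfos(List<DatanodeStorageInfo> storages) {
  final DatanodeInfo[] datanodes = new DatanodeInfo[storages.size()];
  for (int i = 0; i < storages.size(); i++) {
    // NPE here when an expected location is the null storage from above.
    datanodes[i] = storages.get(i).getDatanodeDescriptor();
  }
  return datanodes;
}
{code}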

*Proposed solution to the NPE issue:*
We can have {{commitBlockSynchronization()}} check for a valid storage ID 
before updating the data structures. Even if no valid storage ID is found, we 
can't fail the operation: one or more nodes did finalize the block, whether 
they are dead or alive at this moment. It is like a missing-block case. We can 
go ahead and commit the block without the dead node/storage and also allow 
closing of the file, just like {{completeFile()}}.
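
A minimal sketch of the proposed check (names like {{lookupStorage()}} are 
illustrative, not actual methods):

{code:java}
// Keep only the targets whose storage can still be resolved, instead of
// letting nulls into the expected locations.
List<DatanodeStorageInfo> valid = new ArrayList<>();
for (int i = 0; i < newTargets.length; i++) {
  DatanodeStorageInfo s = lookupStorage(newTargets[i], newTargetStorages[i]);
  if (s != null) {
    valid.add(s);
  } else {
    LOG.warn("Ignoring unknown storage " + newTargetStorages[i]
        + " on dead node " + newTargets[i]);
  }
}
// Even if 'valid' ends up empty, do not fail the op: commit the block with
// whatever storages remain and allow the file to close, like completeFile().
{code}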

On closing of the file, {{checkReplication()}} is called, and in our example 
this will cause the last block (still in the committed state) to be reported as 
missing. If the dead node comes back, it will include the finalized replica in 
its block report, which will cause the block to be completed and the missing 
block to be cleared.


> A faulty node can cause a lease leak and NPE on accessing data
> --------------------------------------------------------------
>
>                 Key: HDFS-11817
>                 URL: https://issues.apache.org/jira/browse/HDFS-11817
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: hdfs-11817_supplement.txt
>
>
> When the namenode performs a lease recovery for a failed write, 
> {{commitBlockSynchronization()}} will fail if none of the new targets has 
> sent a received-IBR. At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> send a block report until this point. If this happens, lease recovery throws 
> an {{AlreadyBeingCreatedException}}, which causes the LeaseManager to simply 
> remove the lease without finalizing the inode.
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed.


