[ 
https://issues.apache.org/jira/browse/HDFS-15451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti reassigned HDFS-15451:
-----------------------------------------

    Assignee: shanyu zhao

> Restarting name node stuck in safe mode when using provided storage
> -------------------------------------------------------------------
>
>                 Key: HDFS-15451
>                 URL: https://issues.apache.org/jira/browse/HDFS-15451
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.2.1, 3.1.3
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>            Priority: Major
>
> When HDFS provided storage is used (dfs.namenode.provided.enabled=true), 
> restarting the name node sometimes leaves it stuck in safe mode.
> The problem is that the data node sends its block report to the name node 
> successfully, but the name node does not process the report properly, so 
> HDFS remains in safe mode due to missing blocks.
> The name node log shows the following sequence of messages for a specific 
> data node:
> {code}
> 2020-07-01 19:46:41,997 INFO blockmanagement.BlockReportLeaseManager: 
> Registered DN af19d9e0-7b9b-45e0-9aa6-b2f404098084 (10.244.6.131:9866).
> 2020-07-01 19:46:42,012 DEBUG blockmanagement.BlockReportLeaseManager: 
> Created a new BR lease 0x476aaae689ebbc01 for DN 
> af19d9e0-7b9b-45e0-9aa6-b2f404098084.  numPending = 4
> 2020-07-01 19:46:42,340 INFO BlockStateChange: BLOCK* processReport 
> 0xcc610f42d0218cd9: discarded non-initial block report from 
> DatanodeRegistration(10.244.6.131:9866, 
> datanodeUuid=af19d9e0-7b9b-45e0-9aa6-b2f404098084, infoPort=0, 
> infoSecurePort=9865, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-f49d3421-e04f-40b9-89ef-cf4fee73ad6a;nsid=497894240;c=1572548424451)
>  because namenode still in startup phase
> 2020-07-01 19:46:42,648 WARN blockmanagement.BlockReportLeaseManager: BR 
> lease 0x476aaae689ebbc01 is not valid for DN 
> af19d9e0-7b9b-45e0-9aa6-b2f404098084, because the DN is not in the pending 
> set.
> {code}
> The root cause is that, when BlockManager processes a block report, it 
> skips processing if storageInfo.getBlockReportCount() > 0 and removes the 
> lease:
> {code}
> blockReportLeaseManager.removeLease(node)
> {code}
> This happens because every data node reports a DS-PROVIDED storage along 
> with its other storages (such as DISK storage). All DS-PROVIDED storages 
> actually point to the same storageInfo, so the second data node that sends 
> a block report with DS-PROVIDED sees blockReportCount > 0. The lease is 
> then removed for that data node, and processing of subsequent block 
> reports from this node fails at checkLease() with the message "BR lease is 
> not valid".
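
The sequence described above can be reproduced with a minimal sketch. It is a heavily simplified model, not the actual BlockManager or BlockReportLeaseManager code: the class ProvidedStorageLeaseSketch, its Outcome enum, and the methods grantLease(), storageFor() and simulate() are hypothetical names chosen for illustration, and the startup-phase condition is assumed to always hold. Only the shared DS-PROVIDED storageInfo, the blockReportCount > 0 check, and the lease removal are taken from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the behaviour described in this issue.
final class ProvidedStorageLeaseSketch {
    static final String PROVIDED_ID = "DS-PROVIDED";

    enum Outcome { PROCESSED, DISCARDED, LEASE_INVALID }

    // Stand-in for the per-storage state tracked by the name node.
    static final class StorageInfo {
        int blockReportCount;
    }

    // Data nodes that currently hold a block report lease.
    private final Set<String> pendingLeases = new HashSet<>();
    private final Map<String, StorageInfo> storages = new HashMap<>();

    void grantLease(String dataNode) {
        pendingLeases.add(dataNode);
    }

    // DS-PROVIDED resolves to one shared StorageInfo for all data nodes;
    // every other storage gets its own entry per data node.
    private StorageInfo storageFor(String dataNode, String storageId) {
        String key = PROVIDED_ID.equals(storageId)
                ? PROVIDED_ID
                : dataNode + "/" + storageId;
        return storages.computeIfAbsent(key, k -> new StorageInfo());
    }

    // Assumes the name node is still in its startup phase.
    Outcome processReport(String dataNode, String storageId) {
        if (!pendingLeases.contains(dataNode)) {
            return Outcome.LEASE_INVALID; // "BR lease is not valid"
        }
        StorageInfo info = storageFor(dataNode, storageId);
        if (info.blockReportCount > 0) {
            // Report treated as non-initial: discard it and drop the lease.
            pendingLeases.remove(dataNode);
            return Outcome.DISCARDED;
        }
        info.blockReportCount++;
        return Outcome.PROCESSED;
    }

    // Two data nodes each report DS-PROVIDED followed by a DISK storage.
    static List<Outcome> simulate() {
        ProvidedStorageLeaseSketch nn = new ProvidedStorageLeaseSketch();
        nn.grantLease("dn1");
        nn.grantLease("dn2");
        List<Outcome> outcomes = new ArrayList<>();
        outcomes.add(nn.processReport("dn1", PROVIDED_ID));
        outcomes.add(nn.processReport("dn1", "DS-disk-1"));
        outcomes.add(nn.processReport("dn2", PROVIDED_ID));
        outcomes.add(nn.processReport("dn2", "DS-disk-2"));
        return outcomes;
    }

    public static void main(String[] args) {
        // prints [PROCESSED, PROCESSED, DISCARDED, LEASE_INVALID]
        System.out.println(simulate());
    }
}
```

In this model the second data node loses its lease as soon as it reports DS-PROVIDED, so its DISK report is rejected, which matches the "discarded non-initial block report" and "BR lease ... is not valid" messages in the log above.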



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
