[jira] [Commented] (HDFS-15451) Restarting name node stuck in safe mode when using provided storage

2020-07-06 Thread Xiaoqiao He (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152470#comment-17152470
 ] 

Xiaoqiao He commented on HDFS-15451:


Cherry-picked to branch-3.3, branch-3.2 and branch-3.1.
Thanks [~shanyu].

> Restarting name node stuck in safe mode when using provided storage
> ---
>
> Key: HDFS-15451
> URL: https://issues.apache.org/jira/browse/HDFS-15451
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: shanyu zhao
> Assignee: shanyu zhao
> Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
>
> When HDFS provided storage is used (dfs.namenode.provided.enabled=true),
> restarting the name node sometimes leaves it stuck in safe mode.
> The data nodes send their block reports to the name node successfully,
> but the name node does not process them properly, so HDFS remains in
> safe mode because blocks appear to be missing.
> The name node log shows this sequence of entries for one data node:
> {code}
> 2020-07-01 19:46:41,997 INFO blockmanagement.BlockReportLeaseManager: Registered DN af19d9e0-7b9b-45e0-9aa6-b2f404098084 (10.244.6.131:9866).
> 2020-07-01 19:46:42,012 DEBUG blockmanagement.BlockReportLeaseManager: Created a new BR lease 0x476aaae689ebbc01 for DN af19d9e0-7b9b-45e0-9aa6-b2f404098084.  numPending = 4
> 2020-07-01 19:46:42,340 INFO BlockStateChange: BLOCK* processReport 0xcc610f42d0218cd9: discarded non-initial block report from DatanodeRegistration(10.244.6.131:9866, datanodeUuid=af19d9e0-7b9b-45e0-9aa6-b2f404098084, infoPort=0, infoSecurePort=9865, ipcPort=9867, storageInfo=lv=-57;cid=CID-f49d3421-e04f-40b9-89ef-cf4fee73ad6a;nsid=497894240;c=1572548424451) because namenode still in startup phase
> 2020-07-01 19:46:42,648 WARN blockmanagement.BlockReportLeaseManager: BR lease 0x476aaae689ebbc01 is not valid for DN af19d9e0-7b9b-45e0-9aa6-b2f404098084, because the DN is not in the pending set.
> {code}
> The root cause is that when BlockManager processes a report while the name
> node is still in its startup phase, it skips processing if
> storageInfo.getBlockReportCount() > 0 and removes the lease:
> {code}
> blockReportLeaseManager.removeLease(node);
> {code}
> This happens because every data node reports a DS-PROVIDED storage along with
> its other storages (such as DISK storage). All DS-PROVIDED storages actually
> point to the same storageInfo, so when a second data node sends a block
> report that includes DS-PROVIDED, blockReportCount is already > 0. The lease
> for that data node is then removed, and later block reports from the
> node fail at checkLease() with the message "BR lease is not valid".
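
The sequence described above can be reproduced in a small, self-contained model. This is only a sketch under stated assumptions: the classes `ProvidedLeaseDemo`, `StorageInfo` and `LeaseManager` and their methods are hypothetical stand-ins, not the real Hadoop BlockManager or BlockReportLeaseManager code, and the lease check and per-storage handling are collapsed into a single method for brevity.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the failure. All names are hypothetical stand-ins,
// not the actual Hadoop classes.
class ProvidedLeaseDemo {

    /** Per-storage bookkeeping; only the report counter is modeled. */
    static class StorageInfo {
        int blockReportCount = 0;
    }

    /** A data node holds a valid lease only while it is in the pending set. */
    static class LeaseManager {
        private final Set<String> pending = new HashSet<>();
        void grantLease(String dn) { pending.add(dn); }
        void removeLease(String dn) { pending.remove(dn); }
        boolean checkLease(String dn) { return pending.contains(dn); }
    }

    final LeaseManager leases = new LeaseManager();
    final boolean inStartupSafeMode = true; // the name node has just restarted

    /** Returns true if the report was processed, false if rejected or discarded. */
    boolean processReport(String dn, StorageInfo storage) {
        if (!leases.checkLease(dn)) {
            return false; // corresponds to "BR lease is not valid"
        }
        if (inStartupSafeMode && storage.blockReportCount > 0) {
            leases.removeLease(dn); // treated as a non-initial report and discarded
            return false;
        }
        storage.blockReportCount++;
        return true;
    }

    public static void main(String[] args) {
        ProvidedLeaseDemo nn = new ProvidedLeaseDemo();
        StorageInfo provided = new StorageInfo(); // one object shared by all data nodes
        StorageInfo dn2Disk = new StorageInfo();  // dn2's own DISK storage

        nn.leases.grantLease("dn1");
        nn.leases.grantLease("dn2");

        System.out.println(nn.processReport("dn1", provided)); // true: first PROVIDED report
        System.out.println(nn.processReport("dn2", provided)); // false: count already > 0, lease removed
        System.out.println(nn.processReport("dn2", dn2Disk));  // false: dn2 no longer holds a lease
    }
}
```

In this model the DISK report from the second data node is never counted, which matches the symptom of blocks appearing to be missing while the name node stays in safe mode.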



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-15451) Restarting name node stuck in safe mode when using provided storage

2020-07-06 Thread shanyu zhao (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152251#comment-17152251
 ] 

shanyu zhao commented on HDFS-15451:


Thank you [~hexiaoqiao] and [~virajith]! Is it possible to also backport it to
branch-3.1, branch-3.2 and branch-3.3?

> Restarting name node stuck in safe mode when using provided storage
> ---
>
> Key: HDFS-15451
> URL: https://issues.apache.org/jira/browse/HDFS-15451
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.1, 3.1.3
> Reporter: shanyu zhao
> Assignee: shanyu zhao
> Priority: Major
> Fix For: 3.4.0

[jira] [Commented] (HDFS-15451) Restarting name node stuck in safe mode when using provided storage

2020-07-06 Thread Hudson (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152124#comment-17152124
 ] 

Hudson commented on HDFS-15451:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18412 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18412/])
HDFS-15451. Do not discard non-initial block report for provided (github: rev 
834372f4040f1e7a00720da5c40407f9b1423b6d)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
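
Going by the commit title, the change stops discarding non-initial block reports for provided storage. A minimal sketch of such a rule follows, assuming the discard decision depends only on startup safe mode, the storage type and the report counter. The class `DiscardRuleSketch`, its `StorageType` enum and `shouldDiscard` are hypothetical names for illustration, and the actual patch to BlockManager.java may be structured differently.

```java
// Hypothetical illustration of the rule named in the commit title;
// not the actual patch.
class DiscardRuleSketch {

    /** Local stand-in enum; only the two types mentioned in the issue are modeled. */
    enum StorageType { DISK, PROVIDED }

    /**
     * Decides whether a block report for one storage should be discarded
     * as "non-initial" during startup. PROVIDED storage is exempt because
     * all data nodes share one storageInfo for it, so its counter is
     * non-zero for every data node after the first.
     */
    static boolean shouldDiscard(boolean inStartupSafeMode,
                                 StorageType type,
                                 int blockReportCount) {
        return inStartupSafeMode
                && type != StorageType.PROVIDED
                && blockReportCount > 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldDiscard(true, StorageType.DISK, 1));     // true
        System.out.println(shouldDiscard(true, StorageType.PROVIDED, 1)); // false
    }
}
```

With this exemption, a report that includes the shared provided storage is not treated as a repeat, so the data node's lease is not removed on that path.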


> Restarting name node stuck in safe mode when using provided storage
> ---
>
> Key: HDFS-15451
> URL: https://issues.apache.org/jira/browse/HDFS-15451
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.1, 3.1.3
> Reporter: shanyu zhao
> Assignee: shanyu zhao
> Priority: Major
> Fix For: 3.4.0

[jira] [Commented] (HDFS-15451) Restarting name node stuck in safe mode when using provided storage

2020-07-02 Thread Virajith Jalaparti (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150312#comment-17150312
 ] 

Virajith Jalaparti commented on HDFS-15451:
---

Thanks for finding/fixing this [~shanyu].

> Restarting name node stuck in safe mode when using provided storage
> ---
>
> Key: HDFS-15451
> URL: https://issues.apache.org/jira/browse/HDFS-15451
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.1, 3.1.3
> Reporter: shanyu zhao
> Assignee: shanyu zhao
> Priority: Major

[jira] [Commented] (HDFS-15451) Restarting name node stuck in safe mode when using provided storage

2020-07-01 Thread shanyu zhao (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149861#comment-17149861
 ] 

shanyu zhao commented on HDFS-15451:


Pull request submitted:
https://github.com/apache/hadoop/pull/2119

> Restarting name node stuck in safe mode when using provided storage
> ---
>
> Key: HDFS-15451
> URL: https://issues.apache.org/jira/browse/HDFS-15451
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.1, 3.1.3
> Reporter: shanyu zhao
> Priority: Major
