[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199910#comment-17199910
 ] 

zhengchenyu commented on HDFS-15589:


[~hexiaoqiao]
Yes, in theory postponedMisreplicatedBlocks is only consumed by the function 
'rescanPostponedMisreplicatedBlocks', and that rescan takes the namesystem's 
write lock, so it may decrease namenode RPC performance. But 
dfs.namenode.blocks.per.postponedblocks.rescan's default value is 10000, which 
bounds each pass, so I think the performance impact should be small.
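For context, a minimal sketch of how I understand the rescan loop (simplified 
and paraphrased, not the actual Hadoop source; isStillMisreplicated() is a 
hypothetical stand-in for the real per-block reprocessing):
{code:java}
// Sketch: the rescan holds the namesystem write lock while it scans at most
// dfs.namenode.blocks.per.postponedblocks.rescan blocks per pass, so each
// pass briefly blocks RPC handlers that need the same lock.
void rescanPostponedMisreplicatedBlocks() {
  long startTime = Time.monotonicNow();
  namesystem.writeLock();
  try {
    int scanned = 0;
    Iterator<Block> it = postponedMisreplicatedBlocks.iterator();
    while (it.hasNext() && scanned++ < blocksPerPostponedRescan) {
      Block b = it.next();
      // Re-check replication; blocks that no longer need postponing are
      // dropped from the set, the rest stay for the next pass.
      if (!isStillMisreplicated(b)) {
        it.remove();
      }
    }
  } finally {
    namesystem.writeUnlock();
    LOG.info("Rescan of postponedMisreplicatedBlocks completed in "
        + (Time.monotonicNow() - startTime) + " msecs. "
        + postponedMisreplicatedBlocks.size() + " blocks are left.");
  }
}
{code}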
But looking at the logs, some calls took a long time:
{code}
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:15,429 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 65 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:18,496 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 64 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:23,958 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 2459 msecs. 19916 blocks 
are left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:27,023 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 60 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:30,088 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 61 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:33,149 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 58 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:47,890 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 5140 msecs. 19916 blocks 
are left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:32:36,458 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 110 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:32:39,529 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 70 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:32:42,596 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 66 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:32:45,665 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 65 msecs. 19916 blocks are 
left. 0 blocks were removed.
{code}
In fact, this was found on our test cluster, which is very small, so we can't 
measure a performance impact there. But why do I pay attention to this problem? 
At my last company, one day postponedMisreplicatedBlocks grew huge and namenode 
RPC performance dropped. Some hours later, postponedMisreplicatedBlocks 
decreased and the namenode was healthy again. At that moment I was focused on 
YARN, so I didn't dig into the namenode logs, and the root cause was never 
confirmed.
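(For anyone who wants to watch this value, it is exposed in the NameNode's 
FSNamesystem metrics; a quick check over the JMX servlet, assuming the default 
NameNode HTTP port 9870 and a placeholder hostname:)
{code}
curl -s 'http://namenode-host:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
  | grep PostponedMisreplicatedBlocks
{code}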


[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-22 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199870#comment-17199870
 ] 

Xiaoqiao He commented on HDFS-15589:


Thanks [~zhengchenyu] for your report. Just wondering: is there any impact on 
the NameNode when PMB (abbr. `PostponedMisreplicatedBlocks`) stays at a large 
number for a long time? The largest PMB count in my practice was near 100M, and 
I did not meet any performance issue on my internal branch. Did you meet any 
issues? Thanks.

> Huge PostponedMisreplicatedBlocks can't decrease immediately when start 
> namenode after datanode
> ---
>
> Key: HDFS-15589
> URL: https://issues.apache.org/jira/browse/HDFS-15589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
> Environment: CentOS 7
>Reporter: zhengchenyu
>Priority: Major
>
> In our test cluster, I restarted my namenode. Then I found many 
> PostponedMisreplicatedBlocks which didn't decrease immediately. 
> I searched the logs and found entries like this:
> {code:java}
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> {code}
> Note: the test cluster only has 6 datanodes.
> You will see the block reports arrive before "Marking all datanodes as 
> stale", which is logged by startActiveServices. But 
> DatanodeStorageInfo.blockContentsStale is only set to false during a block 
> report, and startActiveServices then marks all datanodes as stale. So the 
> datanodes will stay stale until the next block report, and 
> PostponedMisreplicatedBlocks keeps a huge number.






[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199847#comment-17199847
 ] 

zhengchenyu commented on HDFS-15589:


Yes, I can solve this problem by triggering a block report manually. What I 
mean is: is there any need to solve this problem by optimizing the logic? For 
example, make sure the block report triggered through the namenode's heartbeat 
handling happens after it enters the active state.







[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199786#comment-17199786
 ] 

Ayush Saxena commented on HDFS-15589:
-

Yes, that is true. For exactly this there was a proposal earlier that block 
reports could be triggered after failover, but it couldn't reach a conclusion, 
since the number of datanodes in actual production will be quite high, and it 
could increase load on the Namenode.

If you are facing this trouble, you can trigger a block report explicitly using 
{{dfsadmin}}. Or do you propose any solution to this?
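For example (the hostname is a placeholder; the DataNode IPC port is 9867 in 
the logs above):
{code}
# Ask one DataNode to send a full block report to the active NameNode
hdfs dfsadmin -triggerBlockReport datanode-host:9867

# Or request only an incremental report
hdfs dfsadmin -triggerBlockReport -incremental datanode-host:9867
{code}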







[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-21 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199750#comment-17199750
 ] 

zhengchenyu commented on HDFS-15589:


[~ayushtkn] I know the postponed-block logic. I encountered a case, maybe a 
low-probability case. Let me describe the logic simply:

(1) When the namenode transitions from standby to active, it marks all 
DatanodeDescriptors stale to avoid deleting blocks that may already have been 
deleted elsewhere.

(2) Then the datanodes send block reports to the namenode, which clears the 
stale flag; after that, over-replicated blocks can be deleted.

But if (2) happens before (1), the DatanodeDescriptor stays stale until the 
next block report, and block reports are a low-frequency RPC operation. So 
PostponedMisreplicatedBlocks will keep a huge number for a long time.
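A minimal sketch of the ordering problem (paraphrased from my reading of the 
code; method names approximate the real ones):
{code:java}
// (1) On transition to active, startActiveServices() marks every storage
//     stale so replicas are not deleted based on pre-failover reports.
for (DatanodeDescriptor node : datanodeManager.getDatanodes()) {
  for (DatanodeStorageInfo storage : node.getStorageInfos()) {
    storage.markStaleAfterFailover();   // blockContentsStale = true
  }
}

// (2) A full block report is the only event that clears the flag again.
void processReport(DatanodeStorageInfo storage, BlockListAsLongs report) {
  // ... reconcile reported blocks ...
  storage.receivedBlockReport();        // blockContentsStale = false
}

// If (2) lands just before (1) -- e.g. the datanodes were started first and
// reported while the namenode was still starting -- the storages are
// re-marked stale and stay stale until the next periodic block report
// (6 hours by default), so postponedMisreplicatedBlocks cannot drain.
{code}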







[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199432#comment-17199432
 ] 

Ayush Saxena commented on HDFS-15589:
-

Do you mean to say that the blocks don't get deleted after a namenode 
restart/failover? Then yes, deletion won't happen until a BR is received after 
the DNs are marked stale. That is by design: the datanodes are marked stale 
after the NN takes the active state, so as to prevent deletion of blocks, and 
they are unmarked once the BR is received.




