zhengchenyu created HDFS-16070:
----------------------------------

             Summary: DataTransfer block storm when datanode's io is busy.
                 Key: HDFS-16070
                 URL: https://issues.apache.org/jira/browse/HDFS-16070
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 3.2.1, 3.3.0
            Reporter: zhengchenyu


While speeding up decommissioning, I found that disk I/O on some DataNodes was 
saturated. The load on those hosts was very high, and tens of thousands of 
data transfer threads were running. 
The DataNode log showed the following:
{code}
# Log entries for starting the transfer thread
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.16.50:9866
# Log entries marking a completed transfer
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.16.50:9866
{code}
The first DataTransfer thread (started at 13:42:37) did not finish until 
13:54:08, but the next DataTransfer for the same block had already started at 
13:52:36. 
If a DataTransfer does not finish within 10 minutes (pending timeout + check 
interval), another DataTransfer for the same block is started. The duplicate 
transfers then put heavy additional load on the disk and network.
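
The overlap can be estimated from the timestamps in the log above. The following 
is only a rough sketch, not HDFS code: the class `TransferStormSketch` and the 
method `concurrentTransfers` are hypothetical names, and the model assumes that 
every attempt takes the same time and that a new attempt is scheduled at a 
fixed interval of 10 minutes (the "pending timeout + check interval" figure 
given above).

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical illustration only; none of these names exist in HDFS.
final class TransferStormSketch {

    // Assumption: the 10 minutes (pending timeout + check interval) from the report.
    static final Duration RESCHEDULE_AFTER = Duration.ofMinutes(10);

    /**
     * Simplified model: if each attempt takes transferTime and a new attempt
     * for the same block starts every rescheduleAfter, this many attempts
     * are in flight at the same time in the steady state.
     */
    static long concurrentTransfers(Duration transferTime, Duration rescheduleAfter) {
        long t = transferTime.toMillis();
        long r = rescheduleAfter.toMillis();
        // Ceiling division; at least one transfer is always running.
        return Math.max(1, (t + r - 1) / r);
    }

    public static void main(String[] args) {
        // Start and completion of the transfer to 10.201.7.52, taken from the log.
        LocalTime started = LocalTime.parse("13:42:37.620");
        LocalTime finished = LocalTime.parse("13:54:08.134");
        Duration took = Duration.between(started, finished);

        System.out.println("Transfer took: " + took);
        System.out.println("Concurrent transfers of this block: "
                + concurrentTransfers(took, RESCHEDULE_AFTER));
    }
}
```

Under these assumptions the block from the log already has two transfers in 
flight at once. Each extra transfer slows the others down on a busy disk, so 
the real overlap is likely to grow rather than stay constant.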




