[jira] [Commented] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635400#comment-17635400 ] caozhiqiang commented on HDFS-16613: [~tasanuma], OK, I have created an issue [HDFS-16846|https://issues.apache.org/jira/browse/HDFS-16846]. Please help to review it.

> EC: Improve performance of decommissioning dn with many ec blocks
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ec, erasure-coding, namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-06-07-11-46-42-389.png, image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> In an HDFS cluster with many EC blocks, decommissioning a DataNode is very slow. The reason is that, unlike replicated blocks, which can be copied from any DataNode holding a replica, an EC block has to be replicated from the decommissioning DataNode itself.
> The configurations dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit limit the replication speed, but raising them puts the whole cluster's network at risk. So a new configuration should be added to limit the decommissioning DataNode separately, distinct from the cluster-wide max-streams limit.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635068#comment-17635068 ] Takanobu Asanuma commented on HDFS-16613: Thanks for your reply and the suggestion, [~caozhiqiang]. Given that the speed is not much improved in the replication case, I would prefer that this setting only affect EC blocks. Could you please create another issue to address that?
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17634860#comment-17634860 ] caozhiqiang commented on HDFS-16613: [~tasanuma], yes, with this change the dfs.namenode.replication.max-streams-hard-limit configuration will only affect the decommissioning DataNode, but it will not distinguish between replicated blocks and EC blocks. If you think this configuration should only affect EC blocks, we can change the code in DatanodeManager as below:
{code:java}
int maxReplicaTransfers = blockManager.getMaxReplicationStreams() - xmitsInProgress;
int maxEcTransfers;
if (nodeinfo.isDecommissionInProgress()) {
  maxEcTransfers = blockManager.getReplicationStreamsHardLimit() - xmitsInProgress;
} else {
  maxEcTransfers = blockManager.getMaxReplicationStreams() - xmitsInProgress;
}
int numReplicationTasks = (int) Math.ceil(
    (double) (totalReplicateBlocks * maxReplicaTransfers) / totalBlocks);
int numECTasks = (int) Math.ceil(
    (double) (totalECBlocks * maxEcTransfers) / totalBlocks);
{code}
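As a rough illustration of how the snippet above would split a DataNode's transfer budget, here is a standalone sketch (not HDFS code; the class name and all numbers are made up, only the Math.ceil arithmetic mirrors the snippet):

```java
// Illustrative sketch of the proportional split of a DataNode's transfer
// budget between replicated and EC blocks, as in the snippet above.
public class TaskSplitSketch {
    // Mirrors: (int) Math.ceil((double) (typeBlocks * maxTransfers) / totalBlocks)
    static int numTasks(int typeBlocks, int maxTransfers, int totalBlocks) {
        return (int) Math.ceil((double) (typeBlocks * maxTransfers) / totalBlocks);
    }

    public static void main(String[] args) {
        // Assumed example: 20 replicated and 80 EC blocks queued for one DN.
        int totalReplicateBlocks = 20, totalECBlocks = 80;
        int totalBlocks = totalReplicateBlocks + totalECBlocks;
        int maxReplicaTransfers = 2;   // max-streams default, no transfers in flight
        int maxEcTransfers = 512;      // hard limit, applied while decommissioning
        System.out.println(numTasks(totalReplicateBlocks, maxReplicaTransfers, totalBlocks)); // 1
        System.out.println(numTasks(totalECBlocks, maxEcTransfers, totalBlocks)); // 410
    }
}
```

With these assumed numbers the EC queue gets 410 tasks per heartbeat while the replicated queue stays at its usual trickle, which is exactly the asymmetry this variant of the change would create.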
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17634749#comment-17634749 ] Takanobu Asanuma commented on HDFS-16613: We found that this change, combined with increasing dfs.namenode.replication.max-streams-hard-limit, has a side effect. Since it affects not only EC files but also replicated files, even DataNodes that hold only replicated files will generate high network traffic during decommissioning, even though the decommissioning speed is not much improved. (Decommissioning DataNodes with many replicated files is already fast enough.)
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553020#comment-17553020 ] caozhiqiang commented on HDFS-16613: [~hadachi], thank you. If this approach works for you, could you help review [GitHub Pull Request #4398|https://github.com/apache/hadoop/pull/4398]?
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552722#comment-17552722 ] Hiroyuki Adachi commented on HDFS-16613: [~caozhiqiang], thank you for explaining in detail. I think the data flow you described is correct, and your approach to improving performance is right. My concern was the reconstruction load on a large cluster where blocksToProcess is much larger than maxTransfers, but I found I had misunderstood: I thought the blocks held by the busy node would be reconstructed rather than replicated. So I think there is no problem with using dfs.namenode.replication.max-streams-hard-limit for this purpose.
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551381#comment-17551381 ] caozhiqiang commented on HDFS-16613: [~hadachi], in my cluster, dfs.namenode.replication.max-streams-hard-limit=512 and dfs.namenode.replication.work.multiplier.per.iteration=20. The data flow is as follows:
# Choose the blocks to be reconstructed from neededReconstruction. This step uses dfs.namenode.replication.work.multiplier.per.iteration to limit the number processed.
# *Choose the source DataNode. This step uses dfs.namenode.replication.max-streams-hard-limit to limit the number processed.*
# Choose the target DataNode.
# Add the task to the DataNode.
# Blocks to be replicated are put into pendingReconstruction. If blocks in pendingReconstruction time out, they are put back into neededReconstruction and processed again. *This step uses dfs.namenode.reconstruction.pending.timeout-sec as the time interval.*
# *Send the command to the DataNode in the heartbeat response. Originally, dfs.namenode.decommission.max-streams was used to limit the task number here.*
First, step 1 has no performance bottleneck; the bottlenecks are in steps 2, 5 and 6. So we should increase the value of dfs.namenode.replication.max-streams-hard-limit and decrease the value of dfs.namenode.reconstruction.pending.timeout-sec. For step 6, we should use dfs.namenode.replication.max-streams-hard-limit to limit the task number:
{code:java}
// DatanodeManager::handleHeartbeat
if (nodeinfo.isDecommissionInProgress()) {
  maxTransfers = blockManager.getReplicationStreamsHardLimit() - xmitsInProgress;
} else {
  maxTransfers = blockManager.getMaxReplicationStreams() - xmitsInProgress;
}
{code}
The graphs below, from the under-replicated-blocks and pending-replicated-blocks metrics, show the performance bottleneck: many blocks timed out in pendingReconstruction and were repeatedly put back into neededReconstruction.
The first graph is before the optimization and the second is after. Please help to check this process, thank you. !image-2022-06-08-11-41-11-127.png|width=932,height=190! !image-2022-06-08-11-38-29-664.png|width=931,height=175!
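The timeout/requeue cycle described in step 5 above can be sketched as a toy model. This is illustrative only, not the actual BlockManager logic; it assumes each pending-timeout interval lets the DataNode accept up to its stream limit of tasks, with the remainder timing out and being requeued:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model (not HDFS code) of the pendingReconstruction timeout/requeue cycle.
public class PendingTimeoutSketch {
    // Counts scheduling rounds until every queued block has actually been
    // sent a task: per round, up to streamLimit blocks get tasks, the rest
    // time out in pendingReconstruction and go back to neededReconstruction.
    static int roundsUntilSent(int streamLimit, int queuedBlocks) {
        Queue<Integer> needed = new ArrayDeque<>();
        for (int i = 0; i < queuedBlocks; i++) needed.add(i);
        int rounds = 0;
        while (!needed.isEmpty()) {
            rounds++;
            for (int i = 0; i < streamLimit && !needed.isEmpty(); i++) {
                needed.poll(); // task actually sent to the DataNode
            }
            // Remaining blocks hit the pending timeout and are requeued.
        }
        return rounds;
    }

    public static void main(String[] args) {
        // Assumed numbers: 100 EC blocks on the decommissioning DN.
        System.out.println(roundsUntilSent(2, 100));   // 50 timeout cycles
        System.out.println(roundsUntilSent(512, 100)); // 1
    }
}
```

Under these assumptions, a small stream limit forces many timeout cycles, each costing a full dfs.namenode.reconstruction.pending.timeout-sec interval, which is why raising the hard limit and shortening the timeout both help.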
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550940#comment-17550940 ] Hiroyuki Adachi commented on HDFS-16613: [~caozhiqiang], thank you for your explanation. It looks good. Now I understand that blocksToProcess controls the number of replication work items, so if it is less than dfs.namenode.replication.max-streams-hard-limit, all blocks on the decommissioning node use replication rather than reconstruction. Could you please tell me the values of dfs.namenode.replication.max-streams-hard-limit and dfs.namenode.replication.work.multiplier.per.iteration?
{code:java}
// BlockManager#computeDatanodeWork
final int blocksToProcess = numlive * this.blocksReplWorkMultiplier;
final int nodesToProcess = (int) Math.ceil(numlive * this.blocksInvalidateWorkPct);
int workFound = this.computeBlockReconstructionWork(blocksToProcess);
{code}
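For concreteness, the first line of the quoted code can be worked through with the cluster values mentioned elsewhere in this thread (12 live DataNodes, a multiplier of 20). This is a hypothetical worked instance with made-up class and method names, not HDFS code:

```java
// Worked instance (assumed values) of the blocksToProcess arithmetic in
// BlockManager#computeDatanodeWork quoted above.
public class WorkBudgetSketch {
    static int blocksToProcess(int numLive, int blocksReplWorkMultiplier) {
        return numLive * blocksReplWorkMultiplier;
    }

    public static void main(String[] args) {
        // 12 live DataNodes, dfs.namenode.replication.work.multiplier.per.iteration=20:
        System.out.println(blocksToProcess(12, 20)); // 240 blocks per iteration
    }
}
```

With a budget of 240 blocks per iteration, well under a hard limit of 512, the decommissioning node's blocks would all be scheduled as replication work, consistent with the observation in this comment.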
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550907#comment-17550907 ] caozhiqiang commented on HDFS-16613: [~hadachi], thank you for your review. First, my Hadoop branch includes HDFS-14768. In my test, even when the decommissioning node is made busy, EC blocks are not reconstructed. No EC task is sent to the DataNode; the blocks are only held in BlockManager::pendingReconstruction. After a timeout, these blocks are put back into BlockManager::neededReconstruction and rescheduled the next time. So all blocks on the decommissioning node use replication rather than reconstruction. By the way, I decommission only one DataNode at a time. Second, there are 12 DataNodes in my cluster, and each has 12 disks. There are 27217 EC block groups in my cluster and about 2 blocks in one DataNode. Apart from the decommissioning node, the other nodes' load is very low, including load average, CPU iowait and network. !image-2022-06-07-17-55-40-203.png|width=772,height=192! !image-2022-06-07-17-45-45-316.png|width=772,height=198! !image-2022-06-07-17-51-04-876.png|width=769,height=256!
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550845#comment-17550845 ] Hiroyuki Adachi commented on HDFS-16613: Thank you for sharing. How many EC blocks does the decommissioning DataNode have, and how many DataNodes are in your cluster? I'm also interested in the load (network traffic, disk I/O, etc.) on the other DataNodes during decommissioning. As I mentioned above, I think the other DataNodes' load would be higher due to the reconstruction tasks. Was there any impact?
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550782#comment-17550782 ] caozhiqiang commented on HDFS-16613: In my cluster tests, the following optimizations maximized the I/O performance of the decommissioning DataNode, and the time spent decommissioning a DataNode dropped from 3 hours to half an hour:
# Apply this patch.
# Increase the value of dfs.namenode.replication.max-streams-hard-limit.
# Decrease the value of dfs.namenode.reconstruction.pending.timeout-sec to shorten the interval for checking pendingReconstruction.
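As a sketch, the second and third items above would correspond to hdfs-site.xml settings like the following. The hard-limit value of 512 is the one used in this thread; the 60-second timeout is an assumed example, not a recommendation:

```xml
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>512</value>
</property>
<property>
  <name>dfs.namenode.reconstruction.pending.timeout-sec</name>
  <value>60</value>
</property>
```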
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550779#comment-17550779 ] Hiroyuki Adachi commented on HDFS-16613: Hi [~caozhiqiang], using dfs.namenode.replication.max-streams-hard-limit is simple, but in my understanding it makes the decommissioning node busy, and most of the EC blocks will not be replicated but reconstructed (see HDFS-14768). Since reconstruction is expensive, HDFS-8786 made EC blocks on a decommissioning node use replication instead. Some people may prefer that behavior. What do you think?
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545904#comment-17545904 ] caozhiqiang commented on HDFS-16613: [~tasanuma] [~hadachi], besides adding a new configuration to limit the decommissioning DataNode separately, we can also use dfs.namenode.replication.max-streams-hard-limit to achieve the same purpose. We only need to modify DatanodeManager::handleHeartbeat() and use dfs.namenode.replication.max-streams-hard-limit to set numReplicationTasks for a decommissioning DataNode. I will create a new PR; please help to review it.
{code:java}
int maxTransfers;
if (nodeinfo.isDecommissionInProgress()) {
  maxTransfers = blockManager.getReplicationStreamsHardLimit() - xmitsInProgress;
} else {
  maxTransfers = blockManager.getMaxReplicationStreams() - xmitsInProgress;
}
{code}
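To make the effect of that branch concrete: with a dfs.namenode.replication.max-streams of 2 and the hard limit of 512 used in this thread, a decommissioning DataNode's per-heartbeat budget would jump from 2 to 512. A minimal sketch under those assumptions (illustrative names, not HDFS code):

```java
// Hypothetical worked instance of the handleHeartbeat branch quoted above.
public class MaxTransfersSketch {
    static int maxTransfers(boolean decommissioning, int maxStreams,
                            int hardLimit, int xmitsInProgress) {
        // Decommissioning DNs get the hard limit; others get the normal limit.
        return (decommissioning ? hardLimit : maxStreams) - xmitsInProgress;
    }

    public static void main(String[] args) {
        // Assumed values: max-streams=2, hard-limit=512, no transfers in flight.
        System.out.println(maxTransfers(false, 2, 512, 0)); // 2 for a normal DN
        System.out.println(maxTransfers(true, 2, 512, 0));  // 512 while decommissioning
    }
}
```

In-flight transfers (xmitsInProgress) are subtracted in both branches, so the budget shrinks as the DataNode works through its queue.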
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544658#comment-17544658 ] Takanobu Asanuma commented on HDFS-16613: Nice catch. We have also seen this problem. CC: [~hadachi]