[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned
[ https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893930#comment-15893930 ]

Daniel Ochoa commented on HDFS-7787:
---

I'm having the same issue (the decommission process is taking too long). Blocks with no live replicas except on nodes that are decommissioning should have the highest priority.

The code is now here:
https://github.com/apache/hadoop/blob/b61fb267b92b2736920b4bd0c673d31e7632ebb9/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java

> Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority
> to blocks on nodes being decomissioned
> --
>
>                 Key: HDFS-7787
>                 URL: https://issues.apache.org/jira/browse/HDFS-7787
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>   Affects Versions: 2.6.0
>         Environment: 2 namenodes HA, 6 datanodes in two racks
>            Reporter: Frode Halvorsen
>              Labels: balance, hdfs, replication-performance
>
> Each file has a setting of 3 replicas, split across different racks.
> After a simulated crash of one rack (shutdown of all nodes, deleted
> data directory and started nodes) and decommission of one of the nodes in the
> other rack, the replication does not follow 'normal' rules...
> My cluster has approx. 25 million files, and the one node I am now trying to
> decommission has 9 million under-replicated blocks and 3.5 million blocks with
> 'no live replicas'. After a restart of the node, it starts to replicate both
> types of blocks, but after a while it only replicates under-replicated blocks
> with other live copies. I would think that the 'normal' way to do this would
> be to make sure that all blocks this node keeps the only copy of are the
> first to be replicated/balanced?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
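For readers following the issue title, here is a simplified Python model of why the two kinds of blocks compete for the same queue. It is paraphrased from memory of the 2.x-era `UnderReplicatedBlocks.getPriority` logic and is only a sketch; the linked source is authoritative. The function `split_priority` is purely hypothetical and only illustrates what "splitting" the highest-priority queue could mean. It is not the project's actual fix.

```python
# Queue levels as named in the 2.x UnderReplicatedBlocks class (lower = more urgent).
QUEUE_HIGHEST_PRIORITY = 0
QUEUE_VERY_UNDER_REPLICATED = 1
QUEUE_UNDER_REPLICATED = 2
QUEUE_REPLICAS_BADLY_DISTRIBUTED = 3
QUEUE_WITH_CORRUPT_BLOCKS = 4


def get_priority(live, decommissioning, expected):
    """Simplified model of the priority rule (an approximation, not the real code)."""
    if live >= expected:
        # Enough replicas exist, but they are badly placed across racks.
        return QUEUE_REPLICAS_BADLY_DISTRIBUTED
    if live == 0:
        # No live replica: recoverable only if a decommissioning node still has a copy.
        if decommissioning > 0:
            return QUEUE_HIGHEST_PRIORITY
        return QUEUE_WITH_CORRUPT_BLOCKS
    if live == 1:
        # A single live replica also lands in the highest-priority queue.
        return QUEUE_HIGHEST_PRIORITY
    if live * 3 < expected:
        return QUEUE_VERY_UNDER_REPLICATED
    return QUEUE_UNDER_REPLICATED


def split_priority(live, decommissioning, expected):
    """Hypothetical split: decommissioning-only blocks get their own top level."""
    base = get_priority(live, decommissioning, expected)
    if base == QUEUE_HIGHEST_PRIORITY and live == 0:
        return 0
    return base + 1  # every other level moves down by one


# Both cases share queue 0 in the model, so neither is preferred over the other.
print(get_priority(0, 1, 3), get_priority(1, 0, 3))
# With the hypothetical split, the decommissioning-only block sorts first.
print(split_priority(0, 1, 3), split_priority(1, 0, 3))
```

In this model a block whose only copies sit on a decommissioning node and a block with one live replica are indistinguishable to the scheduler, which matches the behaviour the reporter describes.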
[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned
[ https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321968#comment-14321968 ]

Frode Halvorsen commented on HDFS-7787:
---

Sorry, my grep was wrong and included a lot of replications from earlier times, but they were still on the same decommissioning node.

The correct stats for the 10 minutes between 13:00 and 13:10 today are: a total of 3161 started threads. None of those were for blocks with two live replicas; 2430 were for blocks with one live replica and only 731 were for blocks without live replicas.

That means that only about 1/4 of the blocks replicated were of the 'highest priority'. And of course this made my day worse: I now have to wait one month before I can take down the node...
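The "1/4" and "one month" figures can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function name `share_and_eta` is made up for illustration, the backlog of 3.3 million blocks is taken from the reporter's other comment in this thread, and a constant transfer rate is assumed.

```python
def share_and_eta(zero_live, total, backlog, window_minutes=10):
    """Return (share of zero-live-replica transfers, days to drain the backlog)."""
    share = zero_live / total
    windows = backlog / zero_live          # number of windows needed at this rate
    days = windows * window_minutes / (60 * 24)
    return share, days


# 731 of 3161 transfers in a 10-minute window were for blocks with no live replica.
share, days = share_and_eta(zero_live=731, total=3161, backlog=3_300_000)
print(f"{share:.1%} of transfers were for blocks with no live replica")
print(f"about {days:.0f} days to drain the backlog at this rate")
```

That works out to roughly 23% of the transfers and about 31 days, in line with the figures quoted in the comment.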
[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned
[ https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321963#comment-14321963 ]

Frode Halvorsen commented on HDFS-7787:
---

I just did a log analysis of the decommissioning node and looked at what it actually started to replicate during a ten-minute period. I filtered on the log lines for 'Starting thread to transfer' and counted the lines, divided into replication to one, two, or three nodes (blocks with 2, 1, and 0 live replicas).

It started 5036 threads during the 10 minutes I looked at:

53 blocks to one node (2 live replicas in the cluster)
3127 blocks to two nodes (blocks with one live replica)
1856 blocks to three nodes (blocks with no live replicas)

Of course this is a problem for me, as I won't be able to kill the node completely before all blocks with no live replicas have been transferred. There are still 3.3 million of them, and at this rate I won't be able to kill the node for another week and a half :(
[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned
[ https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322120#comment-14322120 ]

Frode Halvorsen commented on HDFS-7787:
---

And later, it seems to get better :) Now (after another round of parameter tuning for faster replication) it replicates 43,000 blocks/hour, and every block is one that has zero live replicas in the cluster :) It actually seems that the namenodes need time to calculate which blocks have higher priority. Now I only need three more days before I can take down the datanode :)

As it turns out, it might just have been parameters that made me believe that it had a bad prioritization algorithm :) Too bad a lot of the parameters I have now changed are undocumented, but 'revealed' in different forum postings...

A quick look at the logs on the active namenode reveals that it actually only asks the decommissioning node to replicate. No other nodes are contacted, thus it now only replicates blocks with no live replicas. It might be my parameter settings, but it could actually have asked any of the other 5 datanodes to replicate the blocks with one live replica... I'll try to add even more replication requests per heartbeat to see if it is able to make the other datanodes do any work as well.
[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned
[ https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321709#comment-14321709 ]

Frode Halvorsen commented on HDFS-7787:
---

Hello. This was some time ago, and it might be that I didn't have any decommissioning nodes when I observed that the namenode didn't prioritize the blocks with only one replica first.

When I look in the logs now, the namenode asks the decommissioning node to replicate every block to three other nodes, thus I believe it only gets replication requests for blocks with no live replicas. This leaves me with the struggle to speed up the process :)
[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned
[ https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321728#comment-14321728 ]

Frode Halvorsen commented on HDFS-7787:
---

I have now changed parameters again in order to speed up replication, and now I see that the decommissioning node is told to replicate both to two and to three other nodes. Actually, most of the requests are to replicate only two copies, so I suspect that the blocks it's asked to replicate do have live replicas in the cluster. Approx. 1/5 of the replication requests are for three nodes:

2015-02-14 23:00:16,008 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839116_24099285 to datanode(s) x.x.x.206:50010 x.x.x.207:50010 x.x.x.209:50010
2015-02-14 23:00:16,009 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839119_24099288 to datanode(s) x.x.x.206:50010 x.x.x.205:50010 x.x.x.209:50010
2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839113_24099282 to datanode(s) x.x.x.204:50010 x.x.x.205:50010
2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839114_24099283 to datanode(s) x.x.x.204:50010 x.x.x.209:50010
2015-02-14 23:00:16,011 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839166_24099335 to datanode(s) x.x.x.204:50010 x.x.x.207:50010
2015-02-14 23:00:16,012 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839162_24099331 to datanode(s) x.x.x.206:50010 x.x.x.205:50010 x.x.x.209:50010
2015-02-14 23:00:20,046 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839309_24099478 to datanode(s) x.x.x.206:50010 x.x.x.205:50010
2015-02-14 23:00:20,047 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839310_24099479 to datanode(s) x.x.x.204:50010 x.x.x.205:50010
2015-02-14 23:00:20,047 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839352_24099521 to datanode(s) x.x.x.206:50010 x.x.x.209:50010
2015-02-14 23:00:20,048 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839359_24099528 to datanode(s) x.x.x.206:50010 x.x.x.209:50010
2015-02-14 23:00:20,048 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839358_24099527 to datanode(s) x.x.x.204:50010 x.x.x.209:50010
2015-02-14 23:00:20,049 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839357_24099526 to datanode(s) x.x.x.206:50010 x.x.x.207:50010
2015-02-14 23:00:22,056 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839241_24099410 to datanode(s) x.x.x.206:50010 x.x.x.205:50010
2015-02-14 23:00:22,057 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839242_24099411 to datanode(s) x.x.x.204:50010 x.x.x.209:50010 x.x.x.205:50010

The node at 208 is decommissioning, and I would say this proves that the node is asked to replicate blocks that have live replicas as well as blocks with no live replicas.

I haven't looked at the code, but to me it's wrong to have the decommissioning node replicate blocks other than those without live replicas.
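The kind of tally described in this thread can be automated over a namenode log. Below is a minimal Python sketch that counts "ask ... to replicate" lines by number of target datanodes. The regular expression is derived only from the log lines quoted in this comment and may not match other Hadoop releases. The function name `count_by_targets` is made up for illustration. The reading of target counts follows the reporter's own interpretation for a replication factor of 3: three targets means no live replica, two targets means one live replica.

```python
import re
from collections import Counter

# Pattern for the namenode lines quoted in this thread; the format may differ by release.
ASK_RE = re.compile(r"BLOCK\* ask (\S+) to replicate (\S+) to datanode\(s\) (.+)$")


def count_by_targets(lines, source=None):
    """Count replication requests by number of targets, optionally for one source node."""
    counts = Counter()
    for line in lines:
        match = ASK_RE.search(line.strip())
        if not match:
            continue  # not a replication request line
        src, _block, targets = match.groups()
        if source is not None and src != source:
            continue
        counts[len(targets.split())] += 1
    return counts


# Three of the log lines quoted in the comment, used as sample input.
sample = [
    "2015-02-14 23:00:16,008 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate "
    "blk_1097839116_24099285 to datanode(s) x.x.x.206:50010 x.x.x.207:50010 x.x.x.209:50010",
    "2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate "
    "blk_1097839113_24099282 to datanode(s) x.x.x.204:50010 x.x.x.205:50010",
    "2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate "
    "blk_1097839114_24099283 to datanode(s) x.x.x.204:50010 x.x.x.209:50010",
]

counts = count_by_targets(sample, source="x.x.x.208:50010")
for n_targets in sorted(counts):
    print(f"{counts[n_targets]} request(s) with {n_targets} target datanode(s)")
```

For a real log, pass an open file object as `lines`; filtering on `source` restricts the tally to requests sent to the decommissioning node.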