[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned

2017-03-03 Thread Daniel Ochoa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893930#comment-15893930
 ] 

Daniel Ochoa commented on HDFS-7787:


I'm having the same issue (the decommission process is taking too long); blocks 
with no live replicas other than on decommissioning nodes should have the 
highest priority.

The code is now here:
https://github.com/apache/hadoop/blob/b61fb267b92b2736920b4bd0c673d31e7632ebb9/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java
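For context, the queue assignment in LowRedundancyBlocks (formerly 
UnderReplicatedBlocks) comes down to a priority function over replica counts. 
Below is a minimal Python sketch of that idea; it is simplified and NOT the 
exact Hadoop logic, and the constants and thresholds here are illustrative 
assumptions, not copies of the real ones:

```python
# Simplified sketch of the priority bucketing in LowRedundancyBlocks /
# UnderReplicatedBlocks. Illustrative only: constants and thresholds are
# assumptions, not the real Hadoop values.

QUEUE_HIGHEST_PRIORITY = 0           # at risk: one live replica, or none live
QUEUE_VERY_LOW_REDUNDANCY = 1        # far below the replication target
QUEUE_LOW_REDUNDANCY = 2             # mildly under-replicated
QUEUE_REPLICAS_BADLY_DISTRIBUTED = 3
QUEUE_WITH_CORRUPT_BLOCKS = 4        # no usable copies at all

def get_priority(live_replicas, decommissioning_replicas, expected_replicas):
    """Pick the redundancy queue for a block, given its replica counts."""
    if live_replicas == 0:
        # Only copies are on decommissioning nodes: the block must be
        # copied off before the node can go away, or it is lost.
        if decommissioning_replicas > 0:
            return QUEUE_HIGHEST_PRIORITY
        return QUEUE_WITH_CORRUPT_BLOCKS
    if live_replicas == 1:
        return QUEUE_HIGHEST_PRIORITY
    if live_replicas * 3 < expected_replicas:
        return QUEUE_VERY_LOW_REDUNDANCY
    if live_replicas < expected_replicas:
        return QUEUE_LOW_REDUNDANCY
    return QUEUE_REPLICAS_BADLY_DISTRIBUTED
```

Note that both "no live replicas but a decommissioning copy" and "exactly one 
live replica" land in the same QUEUE_HIGHEST_PRIORITY bucket, which is exactly 
the situation this ticket proposes to split apart.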

> Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority 
> to blocks on nodes being decomissioned
> --
>
> Key: HDFS-7787
> URL: https://issues.apache.org/jira/browse/HDFS-7787
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
> Environment: 2 namenodes HA, 6 datanodes in two racks
>Reporter: Frode Halvorsen
>  Labels: balance, hdfs, replication-performance
>
> Each file is set to 3 replicas, split across different racks.
> After a simulated crash of one rack (shutdown of all nodes, deletion of the 
> data directory, and restart of the nodes) and decommissioning of one of the 
> nodes in the other rack, replication does not follow the 'normal' rules...
> My cluster has approx. 25 million files, and the node I am now trying to 
> decommission has 9 million under-replicated blocks and 3.5 million blocks 
> with 'no live replicas'. After a restart of the node, it starts to replicate 
> both types of blocks, but after a while it only replicates under-replicated 
> blocks that have other live copies. I would think the 'normal' way to do 
> this would be to make sure that all blocks for which this node keeps the 
> only copy are the first to be replicated/balanced?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned

2015-02-15 Thread Frode Halvorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321968#comment-14321968
 ] 

Frode Halvorsen commented on HDFS-7787:
---

Sorry, my grep was wrong and included a lot of replications from earlier 
times, but it was still on the same decommissioning node.

The correct stats for the 10 minutes between 13:00 and 13:10 today are:
a total of 3161 started threads. None of those were for blocks with two live 
replicas, but 2430 were for blocks with one live replica and only 731 were for 
blocks without live replicas.
That means only about 1/4 of the replicated blocks were of the 'highest 
priority'. And of course this made my day worse; I now have to wait a month 
before I can take down the node...




[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned

2015-02-15 Thread Frode Halvorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321963#comment-14321963
 ] 

Frode Halvorsen commented on HDFS-7787:
---

I just did a log analysis of the decommissioning node and looked at what it 
actually started to replicate during a ten-minute period. I filtered on the 
'Starting thread to transfer' log lines and counted them, split by replication 
to one, two, or three nodes (blocks with 2, 1, and 0 live replicas). 
It started 5036 threads during the 10 minutes I looked at:
53 blocks to one node (2 live replicas in the cluster)
3127 blocks to two nodes (blocks with one live replica)
1856 blocks to three nodes (blocks with no live replicas)


Of course this is a problem for me, as I can't take the node down completely 
until all blocks with no live replicas have been transferred. There are still 
3.3 million of them, and at this rate I won't be able to kill the node for 
another week and a half :(
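A count like the one above can also be taken from the namenode side by 
bucketing the BlockStateChange "ask ... to replicate" lines by how many target 
datanodes each request names. This is a hedged sketch: the line format is 
assumed from the log snippet posted later in this thread, and the mapping of 
target count to live-replica count assumes a replication factor of 3.

```python
import re
from collections import Counter

# 3 targets ~ no live replicas, 2 targets ~ one live replica,
# 1 target ~ two live replicas (assuming replication factor 3).
LINE_RE = re.compile(r"ask \S+ to replicate (blk_\S+) to datanode\(s\) (.+)")

def count_targets(log_lines):
    """Return a Counter mapping target-node count -> number of requests."""
    counts = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            targets = m.group(2).split()
            counts[len(targets)] += 1
    return counts
```

Usage would be something like `count_targets(open("namenode.log"))`, after 
which `counts[3]` is the number of highest-priority (no-live-replica) 
replication requests in the window.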




[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned

2015-02-15 Thread Frode Halvorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322120#comment-14322120
 ] 

Frode Halvorsen commented on HDFS-7787:
---

And later it seems to get better :) Now (after another round of parameter 
tuning for faster replication) it replicates 43,000 blocks/hour, and every 
block is one that has zero live replicas in the cluster :) It actually seems 
the namenode needs time to calculate which blocks have higher priority. Now I 
only need three more days before I can take down the datanode :)

As it turns out, it might just be the parameters that made me believe it had a 
bad prioritizing algorithm :) Too bad that a lot of the parameters I have now 
changed are undocumented, but 'revealed' in various forum postings...

A quick look at the logs on the active namenode reveals that it actually only 
asks the decommissioning node to replicate. No other nodes are contacted, so 
right now it only replicates blocks with no live replicas. It might be my 
parameter settings, but it could have asked any of the other 5 datanodes to 
replicate the blocks with one live replica... I'll try to allow even more 
replication requests per heartbeat to see if that makes the other datanodes do 
any work as well.



[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned

2015-02-14 Thread Frode Halvorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321709#comment-14321709
 ] 

Frode Halvorsen commented on HDFS-7787:
---

Hello.

That was some time ago, and it might be that I didn't have any decommissioning 
nodes when I observed that the namenode didn't prioritize the blocks with only 
one replica first. When I look in the logs now, the namenode asks the 
decommissioning node to replicate every block to three other nodes, so I 
believe it only gets replication requests for blocks with no live replicas.

That leaves me with the struggle to speed up the process :)
 



[jira] [Commented] (HDFS-7787) Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority to blocks on nodes being decomissioned

2015-02-14 Thread Frode Halvorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321728#comment-14321728
 ] 

Frode Halvorsen commented on HDFS-7787:
---

I have now changed parameters again in order to speed up replication, and now 
I see that the decommissioning node is told to replicate both to two and to 
three other nodes. Most of the requests are to replicate to only two nodes, so 
I suspect the blocks it is asked to replicate do have live replicas in the 
cluster.
Approx. 1/5 of the replication requests are to three nodes:
2015-02-14 23:00:16,008 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839116_24099285 to datanode(s) x.x.x.206:50010 x.x.x.207:50010 x.x.x.209:50010
2015-02-14 23:00:16,009 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839119_24099288 to datanode(s) x.x.x.206:50010 x.x.x.205:50010 x.x.x.209:50010
2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839113_24099282 to datanode(s) x.x.x.204:50010 x.x.x.205:50010
2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839114_24099283 to datanode(s) x.x.x.204:50010 x.x.x.209:50010
2015-02-14 23:00:16,011 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839166_24099335 to datanode(s) x.x.x.204:50010 x.x.x.207:50010
2015-02-14 23:00:16,012 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839162_24099331 to datanode(s) x.x.x.206:50010 x.x.x.205:50010 x.x.x.209:50010
2015-02-14 23:00:20,046 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839309_24099478 to datanode(s) x.x.x.206:50010 x.x.x.205:50010
2015-02-14 23:00:20,047 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839310_24099479 to datanode(s) x.x.x.204:50010 x.x.x.205:50010
2015-02-14 23:00:20,047 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839352_24099521 to datanode(s) x.x.x.206:50010 x.x.x.209:50010
2015-02-14 23:00:20,048 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839359_24099528 to datanode(s) x.x.x.206:50010 x.x.x.209:50010
2015-02-14 23:00:20,048 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839358_24099527 to datanode(s) x.x.x.204:50010 x.x.x.209:50010
2015-02-14 23:00:20,049 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839357_24099526 to datanode(s) x.x.x.206:50010 x.x.x.207:50010
2015-02-14 23:00:22,056 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839241_24099410 to datanode(s) x.x.x.206:50010 x.x.x.205:50010
2015-02-14 23:00:22,057 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839242_24099411 to datanode(s) x.x.x.204:50010 x.x.x.209:50010 x.x.x.205:50010


The node at .208 is decommissioning, and I would say this shows that the node 
is asked to replicate blocks that have live replicas as well as blocks with no 
live replicas. I haven't looked at the code, but in my view it is wrong to 
have the decommissioning node replicate any blocks other than those without 
live replicas.
