[jira] [Updated] (HDFS-12077) Implement a remaining space based balancer policy
[ https://issues.apache.org/jira/browse/HDFS-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuyiyang updated HDFS-12077:
-----------------------------
    Attachment: Remaining space based balancer.pdf

> Implement a remaining space based balancer policy
> -------------------------------------------------
>
>                 Key: HDFS-12077
>                 URL: https://issues.apache.org/jira/browse/HDFS-12077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: liuyiyang
>         Attachments: Remaining space based balancer.pdf
>
> Our cluster has DataNodes with 2T disk storage. As storage utilization of
> the cluster grows, we need to add new DataNodes to increase the capacity of
> our cluster. To keep the utilization of every DataNode in a relatively
> balanced state, we usually run the HDFS balancer tool every time we add new
> DataNodes.
> We have been facing an issue with heterogeneous disk capacity when using the
> HDFS balancer tool. In a production cluster, we often have to add new
> DataNodes with larger disk capacity than the previous DNs. Since the
> original balancer is implemented to balance the utilization of every
> DataNode, it brings every DN's utilization within a given threshold of the
> cluster's average utilization.
> For example, in a cluster with two DataNodes DN1 and DN2, where DN1 has ten
> 2T disks and DN2 has ten 10T disks, the original balancer may consider the
> cluster balanced in the following state:
> ||DataNode||Total Capacity||Used||Remaining||Utilization||
> |DN1|20T|18T|2T|90%|
> |DN2|100T|90T|10T|90%|
> Each DN has reached 90% utilization. In such a case, DN1's capability to
> store new blocks is far less than DN2's. When DN1 is full, all new blocks
> will be written to DN2 and more MR tasks will be scheduled on DN2. As a
> result, DN2 is overloaded and we cannot make full use of each DN's I/O
> capacity.
> In such a case, we wish the balancer could run based on the remaining space
> of every DN. After balancing, every DN's remaining space could be balanced
> like the following state:
> ||DataNode||Total Capacity||Used||Remaining||Utilization||
> |DN1|20T|14T|6T|70%|
> |DN2|100T|94T|6T|94%|
> In a cluster where the DNs' remaining space is balanced, every DN will be
> used when writing new blocks to the cluster; likewise, every DN's I/O
> capacity can be utilized when running MR jobs.
> Please let me know what you guys think. I will attach a patch if necessary.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
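[Editorial note for context] The two tables above can be reproduced with a small sketch contrasting the two balancing goals. This is not the actual Balancer code; the function names and tuple layout are illustrative assumptions, and only the arithmetic matches the example.

```python
# Sketch of the two balancing goals on the DN1/DN2 example above.
# Each node is a (capacity_tb, used_tb) pair; results are target used bytes.

def utilization_targets(nodes):
    """Original policy: equalize used/capacity across DataNodes."""
    total_cap = sum(cap for cap, _ in nodes)
    total_used = sum(used for _, used in nodes)
    avg_util = total_used / total_cap
    return [cap * avg_util for cap, _ in nodes]

def remaining_space_targets(nodes):
    """Proposed policy: equalize remaining space across DataNodes."""
    total_cap = sum(cap for cap, _ in nodes)
    total_used = sum(used for _, used in nodes)
    per_dn_remaining = (total_cap - total_used) / len(nodes)
    return [cap - per_dn_remaining for cap, _ in nodes]

# DN1 = (20T, 18T), DN2 = (100T, 90T) as in the first table
nodes = [(20, 18), (100, 90)]
print(utilization_targets(nodes))      # [18.0, 90.0] -> both DNs at 90% used
print(remaining_space_targets(nodes))  # [14.0, 94.0] -> both DNs keep 6T free
```

Under the remaining-space policy, the small DN1 keeps as much headroom for new blocks as the large DN2, which is exactly the second table's state.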
[jira] [Commented] (HDFS-12077) Implement a remaining space based balancer policy
[ https://issues.apache.org/jira/browse/HDFS-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138055#comment-16138055 ]

liuyiyang commented on HDFS-12077:
----------------------------------

Hi Tsz Wo Nicholas Sze. Thanks very much for your advice; I totally agree with you. A remaining space based balancer is not applicable when the disk usage is low, so I think it's better to offer it as an alternative balancer choice for users. In my implementation, the remaining space based balancer only runs when the utilization of the cluster is greater than a given threshold. On the basis of the default balancer algorithm, I designed a remaining space based balancer algorithm and attached a doc, "Remaining space based balancer.pdf", that introduces it.

> Implement a remaining space based balancer policy
> -------------------------------------------------
>
>                 Key: HDFS-12077
>                 URL: https://issues.apache.org/jira/browse/HDFS-12077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: liuyiyang
>         Attachments: Remaining space based balancer.pdf
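[Editorial note for context] The gating described in the comment, where the remaining-space policy applies only above a cluster utilization threshold, could look like the following sketch. The function name and the 0.7 threshold are assumptions for illustration; the actual threshold would be user-configurable.

```python
# Sketch of gating the policy choice on average cluster utilization.

def choose_policy(total_capacity, total_used, threshold=0.7):
    """Return which balancing policy to apply for this balancer run."""
    avg_util = total_used / total_capacity
    if avg_util > threshold:
        # Heterogeneous capacities hurt most when the cluster is nearly full,
        # so switch to equalizing remaining space.
        return "remaining-space"
    # At low usage, remaining-space balancing is not applicable; keep the
    # default utilization-based policy.
    return "utilization"

print(choose_policy(120, 108))  # cluster at 90% -> "remaining-space"
print(choose_policy(120, 30))   # cluster at 25% -> "utilization"
```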
[jira] [Commented] (HDFS-12119) Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and fsck report
[ https://issues.apache.org/jira/browse/HDFS-12119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128841#comment-16128841 ]

liuyiyang commented on HDFS-12119:
----------------------------------

I attached a patch for this problem; can someone please review it? Thanks so much.

> Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and
> fsck report
> -------------------------------------------------------------------------
>
>                 Key: HDFS-12119
>                 URL: https://issues.apache.org/jira/browse/HDFS-12119
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: liuyiyang
>         Attachments: HDFS-12119.001.patch
>
> Sometimes the "Number of Under-Replicated Blocks" shown on the NameNode web
> UI is inconsistent with the "Under-replicated blocks" count shown in the
> fsck report.
> It's easy to reproduce such a case as follows:
> 1. In a cluster with DN1 (rack0), DN2 (rack1) and DN3 (rack2) that stores a
> lot of blocks, set the replication factor to 2;
> 2. Re-allocate racks as DN1 (rack0), DN2 (rack1) and DN3 (rack1);
> 3. Restart the HDFS daemons.
> Then you can see an inconsistency between "Number of Under-Replicated
> Blocks" on the web UI and "Under-replicated blocks" in the fsck result.
> I dug into the source code and found that "Number of Under-Replicated
> Blocks" on the web UI consists of blocks that have fewer than the target
> number of replicas plus blocks that have the right number of replicas but
> which the block manager considers badly distributed. In the fsck result,
> "Under-replicated blocks" counts only the blocks that have fewer than the
> target number of replicas, while "Mis-replicated blocks" counts the blocks
> that are badly distributed. So the under-replicated counts on the web UI and
> in the fsck result may disagree.
> Since under-replicated blocks carry a higher risk of data loss, when there
> are no blocks with fewer than the target number of replicas but a lot of
> blocks that have the right number of replicas yet are badly distributed, the
> "Number of Under-Replicated Blocks" on the web UI will equal the number of
> "Mis-replicated blocks", which is misleading for users.
> It would be clearer to make "Number of Under-Replicated Blocks" on the web
> UI consistent with "Under-replicated blocks" in the fsck result.
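[Editorial note for context] The mismatch described in the report reduces to the two counters being combined on the web UI but reported separately by fsck. The sketch below is illustrative only; the real accounting lives in the BlockManager, and these function names are assumptions.

```python
# Sketch of the two ways the same counters are surfaced.

def web_ui_under_replicated(low_redundancy, badly_distributed):
    # The web UI metric lumps both categories together.
    return low_redundancy + badly_distributed

def fsck_counts(low_redundancy, badly_distributed):
    # fsck reports the two categories separately.
    return {"Under-replicated blocks": low_redundancy,
            "Mis-replicated blocks": badly_distributed}

# After the rack change in the repro steps, no block is missing a replica,
# but many blocks have both replicas on the same rack:
print(web_ui_under_replicated(0, 500))  # 500 -> looks like 500 blocks at risk
print(fsck_counts(0, 500))              # under-replicated: 0, mis-replicated: 500
```

With 0 truly under-replicated blocks, the web UI still shows 500, exactly the misleading case the report calls out.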
[jira] [Updated] (HDFS-12119) Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and fsck report
[ https://issues.apache.org/jira/browse/HDFS-12119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuyiyang updated HDFS-12119:
-----------------------------
    Attachment: HDFS-12119.001.patch

> Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and
> fsck report
> -------------------------------------------------------------------------
>
>                 Key: HDFS-12119
>                 URL: https://issues.apache.org/jira/browse/HDFS-12119
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: liuyiyang
>         Attachments: HDFS-12119.001.patch
[jira] [Created] (HDFS-12128) Namenode failover may make balancer's efforts be in vain
liuyiyang created HDFS-12128:
-----------------------------

             Summary: Namenode failover may make balancer's efforts be in vain
                 Key: HDFS-12128
                 URL: https://issues.apache.org/jira/browse/HDFS-12128
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: balancer & mover
    Affects Versions: 2.6.0
            Reporter: liuyiyang

The problem can be reproduced as follows:
1. In an HA cluster with imbalanced datanode usage, we plan to run "start-balancer.sh" to balance the cluster;
2. Before starting the balancer, trigger a failover of the namenodes; this makes the active namenode mark all datanodes as stale;
3. Start the balancer to balance the datanode usage;
4. While the balancer runs, the under-utilized datanodes' usage increases, but the over-utilized datanodes' usage stays unchanged for a long time.

Since all datanodes are marked as stale, replica deletion on them is postponed. During balancing, the replicas on the source datanodes can't be deleted immediately, so the total usage of the cluster increases and won't decrease until the datanodes' stale state is cleared. When the datanodes send their next block report to the namenode (the default interval is 6h), the active namenode cancels their stale state.

I found that if replicas on the source datanodes can't be deleted immediately in the OP_REPLACE operation via the del_hint sent to the namenode, the namenode will instead schedule deletion of replicas on the datanodes with the least remaining space. Unfortunately, the datanodes with the least remaining space may be the balancer's target datanodes, which leads to imbalanced datanode usage again. If the balancer finishes before the next block report, all postponed over-replicated replicas will be deleted based on the datanodes' remaining space, which may make the balancer's efforts fruitless.
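[Editorial note for context] The excess-replica choice described in the report can be sketched as follows. This is a hypothetical illustration, not the actual BlockManager logic; the node names and the representation of remaining space are assumptions.

```python
# Sketch: when the balancer's del_hint cannot be honored (e.g. deletions are
# postponed on stale nodes), the fallback deletes from the node with the
# least remaining space, which may be the balancer's *target* node.

def pick_replica_to_delete(replica_nodes, del_hint=None, stale=frozenset()):
    """replica_nodes: dict of node name -> remaining space in TB."""
    if del_hint in replica_nodes and del_hint not in stale:
        return del_hint  # honor the balancer's hint: delete from the source
    # Fallback heuristic: least remaining space first.
    return min(replica_nodes, key=replica_nodes.get)

replicas = {"source_dn": 2.0, "target_dn": 1.5, "other_dn": 8.0}
print(pick_replica_to_delete(replicas, del_hint="source_dn"))  # source_dn
print(pick_replica_to_delete(replicas, del_hint="source_dn",
                             stale={"source_dn"}))             # target_dn
```

In the second call the hint is ignored because the source node is stale, and the freshly-filled target node (least remaining space) loses a replica instead, undoing the balancer's move.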
[jira] [Created] (HDFS-12119) Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and fsck report
liuyiyang created HDFS-12119:
-----------------------------

             Summary: Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and fsck report
                 Key: HDFS-12119
                 URL: https://issues.apache.org/jira/browse/HDFS-12119
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: liuyiyang
[jira] [Updated] (HDFS-12077) Implement a remaining space based balancer policy
[ https://issues.apache.org/jira/browse/HDFS-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuyiyang updated HDFS-12077:
-----------------------------
    Description: 
Our cluster has DataNodes with 2T disk storage. As storage utilization of the cluster grows, we need to add new DataNodes to increase the capacity of our cluster. To keep the utilization of every DataNode in a relatively balanced state, we usually run the HDFS balancer tool every time we add new DataNodes.
We have been facing an issue with heterogeneous disk capacity when using the HDFS balancer tool. In a production cluster, we often have to add new DataNodes with larger disk capacity than the previous DNs. Since the original balancer is implemented to balance the utilization of every DataNode, it brings every DN's utilization within a given threshold of the cluster's average utilization.
For example, in a cluster with two DataNodes DN1 and DN2, where DN1 has ten 2T disks and DN2 has ten 10T disks, the original balancer may consider the cluster balanced in the following state:
||DataNode||Total Capacity||Used||Remaining||Utilization||
|DN1|20T|18T|2T|90%|
|DN2|100T|90T|10T|90%|
Each DN has reached 90% utilization. In such a case, DN1's capability to store new blocks is far less than DN2's. When DN1 is full, all new blocks will be written to DN2 and more MR tasks will be scheduled on DN2. As a result, DN2 is overloaded and we cannot make full use of each DN's I/O capacity. In such a case, we wish the balancer could run based on the remaining space of every DN.
After balancing, every DN's remaining space could be balanced like the following state:
||DataNode||Total Capacity||Used||Remaining||Utilization||
|DN1|20T|14T|6T|70%|
|DN2|100T|94T|6T|94%|
In a cluster where the DNs' remaining space is balanced, every DN will be used when writing new blocks to the cluster; likewise, every DN's I/O capacity can be utilized when running MR jobs.
Please let me know what you guys think. I will attach a patch if necessary.
[jira] [Created] (HDFS-12077) Implement a remaining space based balancer policy
liuyiyang created HDFS-12077:
-----------------------------

             Summary: Implement a remaining space based balancer policy
                 Key: HDFS-12077
                 URL: https://issues.apache.org/jira/browse/HDFS-12077
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: balancer & mover
    Affects Versions: 2.6.0
            Reporter: liuyiyang