[jira] [Updated] (HDFS-12077) Implement a remaining space based balancer policy

2017-08-23 Thread liuyiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuyiyang updated HDFS-12077:
-
Attachment: Remaining space based balancer.pdf

> Implement a remaining space based balancer policy
> -
>
> Key: HDFS-12077
> URL: https://issues.apache.org/jira/browse/HDFS-12077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: liuyiyang
> Attachments: Remaining space based balancer.pdf
>
>
> Our cluster has DataNodes with 2T disks. As the storage utilization of the 
> cluster grows, we need to add new DataNodes to increase its capacity. To keep 
> the utilization of every DataNode relatively balanced, we usually run the 
> HDFS balancer tool every time we add new DataNodes.
> We have been facing an issue with heterogeneous disk capacities when using 
> the HDFS balancer tool. In a production cluster, we often have to add new 
> DataNodes with larger disk capacity than the existing DNs. Since the original 
> balancer is implemented to balance the utilization of every DataNode, it only 
> brings every DN's utilization to within a given threshold of the cluster's 
> average utilization.
> For example, in a cluster with two DataNodes DN1 and DN2, where DN1 has ten 
> 2T disks and DN2 has ten 10T disks, the original balancer may consider the 
> cluster balanced in the following state:
> ||DataNode||Total Capacity||Used||Remaining||Utilization||
> |DN1|20T|18T|2T|90%|
> |DN2|100T|90T|10T|90%|
> Each DN has reached 90% utilization, but DN1's capacity to store new blocks 
> is far less than DN2's. When DN1 is full, all new blocks will be written to 
> DN2 and more MR tasks will be scheduled on DN2. As a result, DN2 becomes 
> overloaded and we cannot make full use of each DN's I/O capacity. In such a 
> case, we wish the balancer could run based on the remaining space of every 
> DN. After balancing (here, moving 4T of blocks from DN1 to DN2), every DN's 
> remaining space would be equalized at 6T, as in the following state:
> ||DataNode||Total Capacity||Used||Remaining||Utilization||
> |DN1|20T|14T|6T|70%|
> |DN2|100T|94T|6T|94%|
> In a cluster where the DNs' remaining space is balanced, every DN will be 
> used when writing new blocks to the cluster; likewise, every DN's I/O 
> capacity can be utilized when running MR jobs.
> Please let me know what you think. I will attach a patch if necessary.
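
A minimal standalone sketch of the two policies on the DN1/DN2 example above; 
the class and method names here are illustrative stand-ins, not the actual 
org.apache.hadoop.hdfs Balancer code:

{code:java}
import java.util.Arrays;
import java.util.List;

// Standalone illustration of the two policies; all names are made up here
// and do not refer to the real Balancer classes.
public class BalancerPolicySketch {
  record DataNode(String name, long capacityTB, long usedTB) {
    double utilization() { return 100.0 * usedTB / capacityTB; }
    long remainingTB()   { return capacityTB - usedTB; }
  }

  public static void main(String[] args) {
    List<DataNode> dns = Arrays.asList(
        new DataNode("DN1", 20, 18),    // 90% utilization, 2T remaining
        new DataNode("DN2", 100, 90));  // 90% utilization, 10T remaining

    // Utilization-based policy (the existing balancer): balanced when every
    // DN's utilization is within a threshold of the cluster average.
    double threshold = 10.0; // percentage points
    long cap = dns.stream().mapToLong(DataNode::capacityTB).sum();
    long used = dns.stream().mapToLong(DataNode::usedTB).sum();
    double avgUtil = 100.0 * used / cap; // 90% here
    for (DataNode dn : dns) {
      boolean ok = Math.abs(dn.utilization() - avgUtil) <= threshold;
      System.out.println(dn.name() + " utilization policy: "
          + (ok ? "balanced" : "needs moves")); // both "balanced"
    }

    // Remaining-space-based policy (the proposal): balanced when every DN's
    // remaining space is close to the per-node average remaining space.
    double slackTB = 2.0; // an assumed slack, analogous to the threshold
    double avgRemaining =
        (double) dns.stream().mapToLong(DataNode::remainingTB).sum() / dns.size();
    for (DataNode dn : dns) {
      boolean ok = Math.abs(dn.remainingTB() - avgRemaining) <= slackTB;
      System.out.println(dn.name() + " remaining-space policy: "
          + (ok ? "balanced" : "needs moves")); // both "needs moves"
    }
  }
}
{code}

With the example numbers, the utilization policy reports both DNs balanced at 
90%, while the remaining-space policy keeps moving blocks until each DN sits 
near 6T free.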



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12077) Implement a remaining space based balancer policy

2017-08-23 Thread liuyiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138055#comment-16138055
 ] 

liuyiyang commented on HDFS-12077:
--

Hi Tsz Wo Nicholas Sze,
Thanks very much for your advice; I totally agree with you. A 
remaining-space-based balancer is not applicable when disk usage is low.
I think it's better to offer the remaining-space-based balancer as an 
alternative balancer choice for users.
In my implementation, the remaining-space-based balancer only runs when the 
utilization of the cluster is greater than a given threshold (sketched below). 
Building on the default balancer algorithm, I designed a remaining-space-based 
algorithm, and I attached a doc, "Remaining space based balancer.pdf", that 
introduces it.
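
A rough sketch of that gating condition (illustrative names only; the attached 
design doc is authoritative):

{code:java}
// Illustrative only -- the attached design may gate differently.
// Engage the remaining-space policy only once the cluster is fairly full;
// below the threshold, keep the default utilization-based policy.
static boolean useRemainingSpacePolicy(long usedBytes, long capacityBytes,
                                       double minClusterUtilPercent) {
  double clusterUtil = 100.0 * usedBytes / capacityBytes;
  return clusterUtil > minClusterUtilPercent;
}
{code}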

> Implement a remaining space based balancer policy
> -
>
> Key: HDFS-12077
> URL: https://issues.apache.org/jira/browse/HDFS-12077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: liuyiyang
>
> Our cluster has DataNodes with 2T disks. As the storage utilization of the 
> cluster grows, we need to add new DataNodes to increase its capacity. To keep 
> the utilization of every DataNode relatively balanced, we usually run the 
> HDFS balancer tool every time we add new DataNodes.
> We have been facing an issue with heterogeneous disk capacities when using 
> the HDFS balancer tool. In a production cluster, we often have to add new 
> DataNodes with larger disk capacity than the existing DNs. Since the original 
> balancer is implemented to balance the utilization of every DataNode, it only 
> brings every DN's utilization to within a given threshold of the cluster's 
> average utilization.
> For example, in a cluster with two DataNodes DN1 and DN2, where DN1 has ten 
> 2T disks and DN2 has ten 10T disks, the original balancer may consider the 
> cluster balanced in the following state:
> ||DataNode||Total Capacity||Used||Remaining||Utilization||
> |DN1|20T|18T|2T|90%|
> |DN2|100T|90T|10T|90%|
> Each DN has reached 90% utilization, but DN1's capacity to store new blocks 
> is far less than DN2's. When DN1 is full, all new blocks will be written to 
> DN2 and more MR tasks will be scheduled on DN2. As a result, DN2 becomes 
> overloaded and we cannot make full use of each DN's I/O capacity. In such a 
> case, we wish the balancer could run based on the remaining space of every 
> DN. After balancing (here, moving 4T of blocks from DN1 to DN2), every DN's 
> remaining space would be equalized at 6T, as in the following state:
> ||DataNode||Total Capacity||Used||Remaining||Utilization||
> |DN1|20T|14T|6T|70%|
> |DN2|100T|94T|6T|94%|
> In a cluster where the DNs' remaining space is balanced, every DN will be 
> used when writing new blocks to the cluster; likewise, every DN's I/O 
> capacity can be utilized when running MR jobs.
> Please let me know what you think. I will attach a patch if necessary.






[jira] [Commented] (HDFS-12119) Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and fsck report

2017-08-16 Thread liuyiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128841#comment-16128841
 ] 

liuyiyang commented on HDFS-12119:
--

I attached a patch for this problem; could someone please review it? Thanks 
very much.

> Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and 
> fsck report
> --
>
> Key: HDFS-12119
> URL: https://issues.apache.org/jira/browse/HDFS-12119
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: liuyiyang
> Attachments: HDFS-12119.001.patch
>
>
> Sometimes the "Number of Under-Replicated Blocks" shown on the NameNode web 
> UI is inconsistent with the "Under-replicated blocks" count shown in the 
> fsck report.
> It's easy to reproduce such a case as follows:
> 1. In a cluster with DN1 (rack0), DN2 (rack1), and DN3 (rack2) that stores a 
> lot of blocks, set the replication factor to 2;
> 2. Re-allocate the racks as DN1 (rack0), DN2 (rack1), and DN3 (rack1);
> 3. Restart the HDFS daemons.
> You will then find that the "Number of Under-Replicated Blocks" on the web 
> UI and "Under-replicated blocks" in the fsck result are inconsistent.
> I dug into the source code and found that the "Number of Under-Replicated 
> Blocks" on the web UI counts both blocks that have fewer than the target 
> number of replicas and blocks that have the right number of replicas but 
> that the block manager considers badly distributed. In the fsck result, 
> "Under-replicated blocks" counts only the blocks that have fewer than the 
> target number of replicas, while "Mis-replicated blocks" counts the badly 
> distributed ones. So the under-replicated figures on the web UI and in the 
> fsck result may disagree.
> Since under-replicated blocks carry a higher risk of going missing, when 
> there are no blocks with fewer than the target number of replicas but many 
> blocks with the right number of replicas that are badly distributed, the 
> "Number of Under-Replicated Blocks" on the web UI will equal the number of 
> "Mis-replicated blocks", which is misleading for users.
> It would be clearer to make the "Number of Under-Replicated Blocks" on the 
> web UI consistent with "Under-replicated blocks" in the fsck result.
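
The bookkeeping described above, as a self-contained sketch (Block and its 
fields are illustrative stand-ins, not the real FSNamesystem/NamenodeFsck 
code):

{code:java}
import java.util.List;

// Illustrative only: Block and its fields stand in for "fewer replicas
// than the target" and "right count, but badly distributed".
public class UnderReplicatedCounts {
  record Block(int replicas, int target, boolean badlyDistributed) {}

  public static void main(String[] args) {
    // The reproduced scenario: no block is short on replicas, but many are
    // badly distributed after the rack re-allocation.
    List<Block> blocks = List.of(
        new Block(2, 2, true),
        new Block(2, 2, true),
        new Block(2, 2, false));

    long fsckUnder = 0, fsckMis = 0, uiUnder = 0;
    for (Block b : blocks) {
      if (b.replicas() < b.target()) {
        fsckUnder++; uiUnder++;   // both counters include these
      } else if (b.badlyDistributed()) {
        fsckMis++; uiUnder++;     // only the web UI counter includes these
      }
    }
    System.out.println("web UI Under-Replicated Blocks: " + uiUnder);   // 2
    System.out.println("fsck Under-replicated blocks:   " + fsckUnder); // 0
    System.out.println("fsck Mis-replicated blocks:     " + fsckMis);   // 2
  }
}
{code}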






[jira] [Updated] (HDFS-12119) Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and fsck report

2017-08-16 Thread liuyiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuyiyang updated HDFS-12119:
-
Attachment: HDFS-12119.001.patch

> Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and 
> fsck report
> --
>
> Key: HDFS-12119
> URL: https://issues.apache.org/jira/browse/HDFS-12119
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: liuyiyang
> Attachments: HDFS-12119.001.patch
>
>
> Sometimes the "Number of Under-Replicated Blocks" shown on the NameNode web 
> UI is inconsistent with the "Under-replicated blocks" count shown in the 
> fsck report.
> It's easy to reproduce such a case as follows:
> 1. In a cluster with DN1 (rack0), DN2 (rack1), and DN3 (rack2) that stores a 
> lot of blocks, set the replication factor to 2;
> 2. Re-allocate the racks as DN1 (rack0), DN2 (rack1), and DN3 (rack1);
> 3. Restart the HDFS daemons.
> You will then find that the "Number of Under-Replicated Blocks" on the web 
> UI and "Under-replicated blocks" in the fsck result are inconsistent.
> I dug into the source code and found that the "Number of Under-Replicated 
> Blocks" on the web UI counts both blocks that have fewer than the target 
> number of replicas and blocks that have the right number of replicas but 
> that the block manager considers badly distributed. In the fsck result, 
> "Under-replicated blocks" counts only the blocks that have fewer than the 
> target number of replicas, while "Mis-replicated blocks" counts the badly 
> distributed ones. So the under-replicated figures on the web UI and in the 
> fsck result may disagree.
> Since under-replicated blocks carry a higher risk of going missing, when 
> there are no blocks with fewer than the target number of replicas but many 
> blocks with the right number of replicas that are badly distributed, the 
> "Number of Under-Replicated Blocks" on the web UI will equal the number of 
> "Mis-replicated blocks", which is misleading for users.
> It would be clearer to make the "Number of Under-Replicated Blocks" on the 
> web UI consistent with "Under-replicated blocks" in the fsck result.






[jira] [Created] (HDFS-12128) Namenode failover may make balancer's efforts be in vain

2017-07-12 Thread liuyiyang (JIRA)
liuyiyang created HDFS-12128:


 Summary: Namenode failover may make balancer's efforts be in vain
 Key: HDFS-12128
 URL: https://issues.apache.org/jira/browse/HDFS-12128
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 2.6.0
Reporter: liuyiyang


The problem can be reproduced as follows:
1. In an HA cluster with imbalanced DataNode usage, we plan to run 
"start-balancer.sh" to balance the cluster;
2. Before starting the balancer, trigger a failover of the NameNodes; this 
makes the active NameNode mark all DataNodes as stale;
3. Start the balancer to balance DataNode usage;
4. While the balancer runs, the under-utilized DataNodes' usage increases, but 
the over-utilized DataNodes' usage stays unchanged for a long time.

Since all DataNodes are marked as stale, replica deletions on them are 
postponed. During balancing, the replicas on the source DataNodes can't be 
deleted immediately, so the total usage of the cluster increases and won't 
decrease until the DataNodes' stale state is cleared.
The active NameNode clears the stale state when the DataNodes send their next 
block report (the default interval is 6h). I found that if the replicas on the 
source DataNodes can't be deleted immediately via the del_hint sent to the 
NameNode in the OP_REPLACE operation, the NameNode will instead schedule 
deletion of replicas on the DataNodes with the least remaining space. 
Unfortunately, the DataNodes with the least remaining space may be the targets 
of the balancing, which leads to imbalanced DataNode usage again.
If the balancer finishes before the next block report, all of the postponed 
over-replicated replicas will be deleted based on the DataNodes' remaining 
space, which may render the balancer's efforts fruitless.
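
A self-contained sketch of that deletion choice, compressing the timeline into 
one call (illustrative names only, not the actual BlockManager code):

{code:java}
import java.util.Comparator;
import java.util.List;

// Illustrative only: DataNode, stale(), and remainingTB() are stand-ins,
// not the real BlockManager types.
public class ExcessReplicaChoice {
  record DataNode(String name, boolean stale, long remainingTB) {}

  // Pick which replica of an over-replicated block to delete.
  static DataNode chooseReplicaToDelete(List<DataNode> holders,
                                        DataNode delHint) {
    if (delHint != null && !delHint.stale()) {
      return delHint; // the balancer's source node, deleted immediately
    }
    // del_hint unusable (postponed on a stale node): fall back to the node
    // with the least remaining space -- which may be a balancing target.
    return holders.stream()
        .filter(dn -> !dn.stale())
        .min(Comparator.comparingLong(DataNode::remainingTB))
        .orElse(null); // every holder stale: the deletion stays postponed
  }

  public static void main(String[] args) {
    DataNode source = new DataNode("source", true, 12); // stale after failover
    DataNode target = new DataNode("target", false, 2); // just received blocks
    DataNode other  = new DataNode("other", false, 8);
    // Prints the *target* node -- undoing the balancer's work, as reported.
    System.out.println(
        chooseReplicaToDelete(List.of(source, target, other), source));
  }
}
{code}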






[jira] [Created] (HDFS-12119) Inconsistent "Number of Under-Replicated Blocks" shown on HDFS web UI and fsck report

2017-07-11 Thread liuyiyang (JIRA)
liuyiyang created HDFS-12119:


 Summary: Inconsistent "Number of Under-Replicated Blocks" shown 
on HDFS web UI and fsck report
 Key: HDFS-12119
 URL: https://issues.apache.org/jira/browse/HDFS-12119
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: liuyiyang


Sometimes the "Number of Under-Replicated Blocks" shown on the NameNode web UI 
is inconsistent with the "Under-replicated blocks" count shown in the fsck 
report.

It's easy to reproduce such a case as follows:
1. In a cluster with DN1 (rack0), DN2 (rack1), and DN3 (rack2) that stores a 
lot of blocks, set the replication factor to 2;
2. Re-allocate the racks as DN1 (rack0), DN2 (rack1), and DN3 (rack1);
3. Restart the HDFS daemons.
You will then find that the "Number of Under-Replicated Blocks" on the web UI 
and "Under-replicated blocks" in the fsck result are inconsistent.

I dug into the source code and found that the "Number of Under-Replicated 
Blocks" on the web UI counts both blocks that have fewer than the target 
number of replicas and blocks that have the right number of replicas but that 
the block manager considers badly distributed. In the fsck result, 
"Under-replicated blocks" counts only the blocks that have fewer than the 
target number of replicas, while "Mis-replicated blocks" counts the badly 
distributed ones. So the under-replicated figures on the web UI and in the 
fsck result may disagree.

Since under-replicated blocks carry a higher risk of going missing, when there 
are no blocks with fewer than the target number of replicas but many blocks 
with the right number of replicas that are badly distributed, the "Number of 
Under-Replicated Blocks" on the web UI will equal the number of 
"Mis-replicated blocks", which is misleading for users.
It would be clearer to make the "Number of Under-Replicated Blocks" on the web 
UI consistent with "Under-replicated blocks" in the fsck result.






[jira] [Updated] (HDFS-12077) Implement a remaining space based balancer policy

2017-07-02 Thread liuyiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuyiyang updated HDFS-12077:
-
Description: 
Our cluster has DataNodes with 2T disks. As the storage utilization of the 
cluster grows, we need to add new DataNodes to increase its capacity. To keep 
the utilization of every DataNode relatively balanced, we usually run the HDFS 
balancer tool every time we add new DataNodes.
We have been facing an issue with heterogeneous disk capacities when using the 
HDFS balancer tool. In a production cluster, we often have to add new 
DataNodes with larger disk capacity than the existing DNs. Since the original 
balancer is implemented to balance the utilization of every DataNode, it only 
brings every DN's utilization to within a given threshold of the cluster's 
average utilization.
For example, in a cluster with two DataNodes DN1 and DN2, where DN1 has ten 2T 
disks and DN2 has ten 10T disks, the original balancer may consider the 
cluster balanced in the following state:
||DataNode||Total Capacity||Used||Remaining||Utilization||
|DN1|20T|18T|2T|90%|
|DN2|100T|90T|10T|90%|
Each DN has reached 90% utilization, but DN1's capacity to store new blocks is 
far less than DN2's. When DN1 is full, all new blocks will be written to DN2 
and more MR tasks will be scheduled on DN2. As a result, DN2 becomes 
overloaded and we cannot make full use of each DN's I/O capacity. In such a 
case, we wish the balancer could run based on the remaining space of every DN. 
After balancing (here, moving 4T of blocks from DN1 to DN2), every DN's 
remaining space would be equalized at 6T, as in the following state:
||DataNode||Total Capacity||Used||Remaining||Utilization||
|DN1|20T|14T|6T|70%|
|DN2|100T|94T|6T|94%|
In a cluster where the DNs' remaining space is balanced, every DN will be used 
when writing new blocks to the cluster; likewise, every DN's I/O capacity can 
be utilized when running MR jobs.



Please let me know what you think. I will attach a patch if necessary.




> Implement a remaining space based balancer policy
> -
>
> Key: HDFS-12077
> URL: https://issues.apache.org/jira/browse/HDFS-12077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: liuyiyang
>
> Our cluster has DataNodes with 2T disks. As the storage utilization of the 
> cluster grows, we need to add new DataNodes to increase its capacity. To 
> keep the utilization of every DataNode relatively balanced, we usually run 
> the HDFS balancer tool 

[jira] [Created] (HDFS-12077) Implement a remaining space based balancer policy

2017-07-02 Thread liuyiyang (JIRA)
liuyiyang created HDFS-12077:


 Summary: Implement a remaining space based balancer policy
 Key: HDFS-12077
 URL: https://issues.apache.org/jira/browse/HDFS-12077
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: balancer & mover
Affects Versions: 2.6.0
Reporter: liuyiyang


Our cluster has DataNodes with 2T disks. As the storage utilization of the 
cluster grows, we need to add new DataNodes to increase its capacity. To keep 
the utilization of every DataNode relatively balanced, we usually run the HDFS 
balancer tool every time we add new DataNodes.
We have been facing an issue with heterogeneous disk capacities when using the 
HDFS balancer tool. In a production cluster, we often have to add new 
DataNodes with larger disk capacity than the existing DNs. Since the original 
balancer is implemented to balance the utilization of every DataNode, it only 
brings every DN's utilization to within a given threshold of the cluster's 
average utilization.
For example, in a cluster with two DataNodes DN1 and DN2, where DN1 has ten 2T 
disks and DN2 has ten 10T disks, the original balancer may consider the 
cluster balanced in the following state:
||DataNode||Total Capacity||Used||Remaining||Utilization||
|DN1|20T|18T|2T|90%|
|DN2|100T|90T|10T|90%|
Each DN has reached 90% utilization, but DN1's capacity to store new blocks is 
far less than DN2's. When DN1 is full, all new blocks will be written to DN2 
and more MR tasks will be scheduled on DN2. As a result, DN2 becomes 
overloaded and we cannot make full use of each DN's I/O capacity. In such a 
case, we wish the balancer could run based on the remaining space of every DN. 
After balancing (here, moving 4T of blocks from DN1 to DN2), every DN's 
remaining space would be equalized at 6T, as in the following state:
||DataNode||Total Capacity||Used||Remaining||Utilization||
|DN1|20T|14T|6T|70%|
|DN2|100T|94T|6T|94%|
In a cluster where the DNs' remaining space is balanced, every DN will be used 
when writing new blocks to the cluster; likewise, every DN's I/O capacity can 
be utilized when running MR jobs.



Please let me know what you think. I will attach a patch if necessary.




