[jira] [Commented] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802579#comment-17802579
 ] 

Shilun Fan commented on HDFS-17061:
---

Bulk update: moved all 3.4.0 non-blocker issues; please move this one back if it 
is a blocker. Retargeted to 3.5.0.

> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover, erasure-coding, hdfs
> Reporter: WangYuanben
> Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN on which to place a data block or a parity block, the 
> number of data blocks and parity blocks already stored on that datanode is not 
> taken into consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when block groups A, B, C, D and E from five different 
> EC files are read with no missing blocks, datanodes such as DN1 and DN2 carry 
> a high traffic load, while datanodes such as DN3, DN4 and DN5 carry little or 
> even no traffic load. This is because a read with no missing blocks fetches 
> only the data blocks of each block group; parity blocks are read only when a 
> block has to be reconstructed.
>  !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 
> +If data blocks and parity blocks were distributed more evenly across DNs, the 
> traffic load in the cluster would be more balanced and the peak traffic load on 
> any single DN would be reduced+. Here "balance" means that the ratio of data 
> blocks to parity blocks on each DN matches the ratio defined by its EC policy. 
> In the ideal state, every DN carries a balanced traffic load, as figure 2 shows.
>  !figure2, balanced traffic load on DNs.png|width=600,height=333! 
> How can this imbalance be reduced? I think it depends on the EC policy and on 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> the ratio on each datanode should be close to 3:2.
> There are two possible solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.
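The balance criterion in the description can be expressed as one number per DN. The following is a minimal sketch, not part of HDFS: the helper names `target_data_fraction` and `data_fraction_deviation` and the example counts are hypothetical. It compares a DN's share of data blocks with k/(k+m) for an RS-k-m policy, which for RS-3-2-1024k is 3/5, i.e. the 3:2 ratio mentioned in the description.

```python
from fractions import Fraction


def target_data_fraction(k: int, m: int) -> Fraction:
    """Share of a DN's EC internal blocks that are data blocks under RS-k-m."""
    return Fraction(k, k + m)


def data_fraction_deviation(data: int, parity: int, k: int, m: int) -> Fraction:
    """Signed gap between a DN's observed data-block share and k/(k+m).

    Positive: the DN holds more data blocks than the policy ratio suggests,
    so it serves more of the reads that need no reconstruction.
    Negative: the DN holds mostly parity blocks and is read less often.
    """
    total = data + parity
    if total == 0:
        return Fraction(0)  # a DN with no EC blocks is treated as balanced
    return Fraction(data, total) - target_data_fraction(k, m)


# Hypothetical per-DN (data, parity) counts for RS-3-2-1024k (k=3, m=2).
counts = {"dn1": (5, 0), "dn2": (3, 2), "dn3": (0, 5)}
for dn, (d, p) in counts.items():
    print(dn, data_fraction_deviation(d, p, k=3, m=2))
```

Run as-is, it prints `dn1 2/5`, `dn2 0` and `dn3 -3/5`: dn2 matches the 3:2 target, while dn1 holds only data blocks and dn3 only parity blocks. Exact fractions are used so that a perfectly balanced DN reports exactly 0 rather than a rounding residue.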



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738581#comment-17738581
 ] 

Stephen O'Donnell commented on HDFS-17061:
--

I don't know of any tool, and the datanode certainly does not know whether a 
block is a data block or a parity block.

You might be able to do some analysis using fsck output, but I have never tried 
it for this kind of EC analysis.




[jira] [Commented] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-29 Thread WangYuanben (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738576#comment-17738576
 ] 

WangYuanben commented on HDFS-17061:


[~sodonnell] Thank you for the comment. I need some examples to validate this 
idea, but there currently seems to be no direct way to obtain the number of 
data blocks and parity blocks on each datanode. Therefore, I first need to 
develop functionality to retrieve these counts and then run some tests in a 
subtask, which I will create later.
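The counting step itself is small once each internal block's position in its block group is known, because in the HDFS striped layout indices 0 to k-1 of a block group hold data blocks and indices k to k+m-1 hold parity blocks. The sketch below assumes the placements are already available as hypothetical (datanode, index_in_group) pairs, for example collected by post-processing fsck output as suggested in the thread; that collection step and the function name `count_data_and_parity` are assumptions, not an existing HDFS tool.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_data_and_parity(
    placements: Iterable[Tuple[str, int]], k: int, m: int
) -> Dict[str, Tuple[int, int]]:
    """Count data and parity internal blocks per datanode for an RS-k-m policy.

    `placements` holds one hypothetical (datanode, index_in_group) pair per
    internal block. Indices 0..k-1 are data blocks, k..k+m-1 are parity blocks.
    """
    data: Counter = Counter()
    parity: Counter = Counter()
    for dn, idx in placements:
        if not 0 <= idx < k + m:
            raise ValueError(f"index {idx} is outside an RS-{k}-{m} block group")
        if idx < k:
            data[dn] += 1
        else:
            parity[dn] += 1
    # Counter returns 0 for missing keys, so DNs holding only one kind still appear.
    return {dn: (data[dn], parity[dn]) for dn in sorted(set(data) | set(parity))}


# Two hypothetical RS-3-2 block groups that happen to use the same index-to-DN layout.
groups = [["dn1", "dn2", "dn3", "dn4", "dn5"], ["dn1", "dn2", "dn3", "dn4", "dn5"]]
placements = [(dn, idx) for group in groups for idx, dn in enumerate(group)]
print(count_data_and_parity(placements, k=3, m=2))
```

For the two example groups it prints `{'dn1': (2, 0), 'dn2': (2, 0), 'dn3': (2, 0), 'dn4': (0, 2), 'dn5': (0, 2)}`, the kind of skew the issue description warns about. The resulting (data, parity) pairs can then be compared against the k:m ratio of the policy.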




[jira] [Commented] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737802#comment-17737802
 ] 

Stephen O'Donnell commented on HDFS-17061:
--

On a large cluster with many datanodes and approximately random datanode 
selection for the pipelines, would the cluster balance out naturally?

Have you seen a problem like this on a large cluster, where the parity and data 
blocks are not reasonably well balanced?
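The question of whether random selection balances out naturally can be explored with a toy Monte Carlo model. This is a sketch under strong simplifying assumptions: each block group picks k+m distinct DNs uniformly at random, with no racks, capacity or load taken into account, so it does not model the real HDFS EC placement policy. The function `simulate_random_placement` is hypothetical and returns each DN's share of data blocks.

```python
import random
from statistics import mean, pstdev
from typing import List


def simulate_random_placement(
    num_dns: int, num_groups: int, k: int, m: int, seed: int = 0
) -> List[float]:
    """Return each DN's data-block share under uniform random RS-k-m placement.

    Every block group picks k+m distinct DNs uniformly at random; the first k
    picks receive the data blocks and the remaining m receive parity blocks.
    This ignores racks, capacity and load, unlike the real placement policy.
    """
    if num_dns < k + m:
        raise ValueError("need at least k+m datanodes")
    rng = random.Random(seed)  # seeded so runs are reproducible
    data = [0] * num_dns
    parity = [0] * num_dns
    for _ in range(num_groups):
        for idx, dn in enumerate(rng.sample(range(num_dns), k + m)):
            if idx < k:
                data[dn] += 1
            else:
                parity[dn] += 1
    # Skip DNs that never received a block, since their share is undefined.
    return [d / (d + p) for d, p in zip(data, parity) if d + p > 0]


for num_dns, num_groups in [(5, 5), (100, 20000)]:
    shares = simulate_random_placement(num_dns, num_groups, k=3, m=2)
    print(num_dns, num_groups, round(mean(shares), 3), round(pstdev(shares), 3))
```

The first configuration (5 DNs, 5 block groups) is a tiny cluster in which per-DN shares can differ widely from 0.6; the second spreads many more groups over many more DNs, so the mean share stays close to k/(k+m) = 0.6 and the spread is typically much smaller. A toy model like this says nothing about skew introduced by real placement constraints, which only measurements on a real cluster could show.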
