[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-04-16 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.006.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch, HDFS-13279.005.patch, 
> HDFS-13279.006.patch
>
>
> In a Hadoop cluster, the number of nodes per rack can differ. For example, 
> with 50 datanodes in all and 15 datanodes per rack, only 5 nodes remain on 
> the last rack. In this situation, we find that storage usage on those last 
> 5 nodes is much higher than on the other nodes.
>  With the default block placement policy, the first replica of each block 
> is equally likely to be written to any datanode, but the 2nd/3rd replicas 
> are much more likely to be written to the last 5 nodes than to the other 
> nodes. 
>  Suppose we write 50 blocks to these 50 datanodes. The first replicas of the 
> 50 blocks are distributed equally, one per node. The 2nd replicas of the 
> blocks whose 1st replica is on rack1 (15 replicas) are spread equally over 
> the other 35 nodes, so each of those nodes receives 15/35 = 0.428 replicas. 
> The same holds for blocks on rack2 and rack3. As a result, each node on 
> rack4 (5 nodes) receives 1.29 2nd replicas in all, while each node on the 
> other racks receives 0.97.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|
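
The arithmetic in the table can be reproduced with a short script. This is only
a sketch of the simplified model used in the description (one first replica per
node, second replica placed uniformly at random on a node outside the first
replica's rack). It is not the actual BlockPlacementPolicyDefault logic, and the
function name expected_second_replicas is made up for illustration.

```python
from fractions import Fraction


def expected_second_replicas(rack_sizes):
    """Expected number of 2nd replicas received by one node of each rack.

    Assumed model (taken from the issue description, not from HDFS code):
    every node holds exactly one first replica, and the second replica of a
    block goes to a node chosen uniformly among all nodes on other racks.
    """
    total_nodes = sum(rack_sizes)
    per_node = []
    for target in range(len(rack_sizes)):
        share = Fraction(0)
        for source, source_size in enumerate(rack_sizes):
            if source == target:
                continue  # the 2nd replica never stays on the source rack
            # source_size replicas spread evenly over the nodes off that rack
            share += Fraction(source_size, total_nodes - source_size)
        per_node.append(share)
    return per_node


racks = [15, 15, 15, 5]
for size, share in zip(racks, expected_second_replicas(racks)):
    print(f"rack with {size:2d} nodes: {float(share):.2f} 2nd replicas per node")
```

For racks of 15, 15, 15 and 5 nodes this gives 61/63 (about 0.97) per node on
the large racks and 9/7 (about 1.29) per node on the small rack, matching the
Total row of the table.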



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-04-04 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.005.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch, HDFS-13279.005.patch






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-29 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.004.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-29 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: (was: HDFS-13279.004.patch)

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-29 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.004.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-22 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.003.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-21 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.002.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-21 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Status: Patch Available  (was: Open)

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.3
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-21 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.001.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Description: 
In a Hadoop cluster, the number of nodes per rack can differ. For example, 
with 50 datanodes in all and 15 datanodes per rack, only 5 nodes remain on 
the last rack. In this situation, we find that storage usage on those last 5 
nodes is much higher than on the other nodes.
 With the default block placement policy, the first replica of each block is 
equally likely to be written to any datanode, but the 2nd/3rd replicas are 
much more likely to be written to the last 5 nodes than to the other nodes. 
 Suppose we write 50 blocks to these 50 datanodes. The first replicas of the 
50 blocks are distributed equally, one per node. The 2nd replicas of the 
blocks whose 1st replica is on rack1 (15 replicas) are spread equally over 
the other 35 nodes, so each of those nodes receives 15/35 = 0.428 replicas. 
The same holds for blocks on rack2 and rack3. As a result, each node on 
rack4 (5 nodes) receives 1.29 2nd replicas in all, while each node on the 
other racks receives 0.97.
||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
|From rack1|-|15/35=0.43|0.43|0.43|
|From rack2|0.43|-|0.43|0.43|
|From rack3|0.43|0.43|-|0.43|
|From rack4|5/45=0.11|0.11|0.11|-|
|Total|0.97|0.97|0.97|1.29|

  was:
In a Hadoop cluster, number of nodes on a rack could be different. For example, 
we have 50 Datanodes in all and 15 datanodes per rack, it would remain 5 nodes 
on the last rack. In this situation, we find that storage usage on the last 5 
nodes would be much higher than other nodes.
With the default blockplacement policy, for each block, the first replication 
has the same probability to write to each datanode, but the probability for the 
2nd/3rd replication to write to the last 5 nodes would much higher than to 
other nodes. 
Consider we write 100 blocks to such 50 datanodes. The first rep of 100 block 
would distirbuted to 50 node equally. The 2rd rep of blocks which the 1st rep 
is on rack1(15 reps) would send equally to other 35 nodes and each nodes 
receive 0.428 rep. So does blocks on rack2 and rack3. As a result, node on 
rack4(5 nodes) would receive 1.29 replications in all, while other node would 
receive 0.97 reps.


||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
|From rack1|-|15/35=0.43|0.43|0.43|
|From rack2|0.43|-|0.43|0.43|
|From rack3|0.43|0.43|-|0.43|
|From rack4|5/45=0.11|0.11|0.11|-|
|Total|0.97|0.97|0.97|1.29|


> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Description: 
In a Hadoop cluster, number of nodes on a rack could be different. For example, 
we have 50 Datanodes in all and 15 datanodes per rack, it would remain 5 nodes 
on the last rack. In this situation, we find that storage usage on the last 5 
nodes would be much higher than other nodes.
With the default blockplacement policy, for each block, the first replication 
has the same probability to write to each datanode, but the probability for the 
2nd/3rd replication to write to the last 5 nodes would much higher than to 
other nodes. 
Consider we write 100 blocks to such 50 datanodes. The first rep of 100 block 
would distirbuted to 50 node equally. The 2rd rep of blocks which the 1st rep 
is on rack1(15 reps) would send equally to other 35 nodes and each nodes 
receive 0.428 rep. So does blocks on rack2 and rack3. As a result, node on 
rack4(5 nodes) would receive 1.29 replications in all, while other node would 
receive 0.97 reps.


||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
|From rack1|-|15/35=0.43|0.43|0.43|
|From rack2|0.43|-|0.43|0.43|
|From rack3|0.43|0.43|-|0.43|
|From rack4|5/45=0.11|0.11|0.11|-|
|Total|0.97|0.97|0.97|1.29|

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Affects Version/s: 2.8.3
   3.0.0

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Summary: Datanodes usage is imbalanced if number of nodes per rack is not 
equal  (was: Datanodes usage is imbalanced if node)

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tao Jie
>Priority: Major
>



