[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-11-02 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: HDFS-12725.06.patch

Thanks for the review Eddy! Improved the messages based on your suggestion.

The test failures don't look related; let's use the new run to cross-check.

> BlockPlacementPolicyRackFaultTolerant still fails with racks with very few 
> nodes
> 
>
> Key: HDFS-12725
> URL: https://issues.apache.org/jira/browse/HDFS-12725
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Affects Versions: 3.0.0
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Priority: Major
> Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12725.01.patch, HDFS-12725.02.patch, 
> HDFS-12725.03.patch, HDFS-12725.04.patch, HDFS-12725.05.patch, 
> HDFS-12725.06.patch
>
>
> HDFS-12567 tries to fix the scenario where EC blocks may not be allocated in 
> an extremely rack-imbalanced cluster.
> The fall-back step added by that fix could be improved to do a best-effort 
> placement. This is more likely to happen in testing than in real clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-11-01 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: HDFS-12725.05.patch

I was thinking about this patch, and IMO we should still WARN in the NN logs 
even if the block is placed, so the situation doesn't go unnoticed.

The patch will now emit a message like:
{noformat}
2017-11-01 10:49:27,081 [IPC Server handler 8 on 55407] WARN  
blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyRackFaultTolerant.java:chooseTargetInOrder(142)) - Only 
able to place 7 of total expected 9 (maxNodesPerRack=2, numOfReplicas=4) nodes 
evenly across racks, falling back to uneven placement.
{noformat}
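
The two-step behaviour behind that warning can be sketched roughly as follows. This is a simplified model, not the code in the patch: the class and method names are hypothetical, the per-rack cap is assumed to be the ceiling of expected nodes over racks, the rack sizes are made up so that the numbers resemble the log line above, and the message omits numOfReplicas because its exact meaning is not stated in this thread.

```java
import java.util.Arrays;

// Hypothetical, simplified model of "place evenly, then fall back to uneven".
class EvenThenUnevenPlacement {

  // Assumed cap for an even spread: ceiling of expected / numRacks.
  static int maxNodesPerRack(int expected, int numRacks) {
    return (expected - 1) / numRacks + 1;
  }

  // Round-robin over racks, adding at most `wanted` nodes and never
  // exceeding `cap` (or the rack's size) on any rack. Returns nodes added.
  static int place(int[] rackSizes, int[] placed, int wanted, int cap) {
    int done = 0;
    boolean progress = true;
    while (done < wanted && progress) {
      progress = false;
      for (int r = 0; r < rackSizes.length && done < wanted; r++) {
        if (placed[r] < Math.min(cap, rackSizes[r])) {
          placed[r]++;
          done++;
          progress = true;
        }
      }
    }
    return done;
  }

  // First try the even cap; if that falls short, warn and lift the cap.
  static int[] chooseTargets(int[] rackSizes, int expected) {
    int[] placed = new int[rackSizes.length];
    int cap = maxNodesPerRack(expected, rackSizes.length);
    int even = place(rackSizes, placed, expected, cap);
    if (even < expected) {
      System.out.printf("WARN Only able to place %d of total expected %d "
          + "(maxNodesPerRack=%d) nodes evenly across racks, "
          + "falling back to uneven placement.%n", even, expected, cap);
      place(rackSizes, placed, expected - even, Integer.MAX_VALUE);
    }
    return placed;
  }

  public static void main(String[] args) {
    // Made-up cluster: one large rack and five single-node racks.
    int[] rackSizes = {10, 1, 1, 1, 1, 1};
    System.out.println(Arrays.toString(chooseTargets(rackSizes, 9)));
  }
}
```

With these made-up rack sizes the even pass places 7 of 9 nodes under a cap of 2, and the fall-back puts the remaining 2 on the large rack.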




[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-31 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: HDFS-12725.04.patch

Patch 4 fixes the relevant checkstyle issues. A method with 7+ parameters is 
not my favorite, but it is consistent with other calls inside the bpp.




[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-30 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: HDFS-12725.03.patch

Thanks for the review [~jojochuang]. Patch 3 addresses the comments.

I initially wanted the loop in the test to make sure the over-by-1 always 
lands on the lower-numbered multi-node racks (e.g. if 3 nodes are left for 
rack1 and rack2, rack1 always has 2 nodes). But that's not necessary for 
testing, so I modified it to be the standard way.
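
For illustration only, the "over-by-1 on the lower-numbered racks" idea described above could look like the sketch below. It is not taken from the patch or its test; the class and method names are hypothetical, and it only shows the arithmetic of giving each rack the same base count and handing the remainder to the first racks.

```java
import java.util.Arrays;

// Hypothetical helper: spread `nodes` over `racks` so that counts differ by
// at most one and the extra nodes always go to the lower-numbered racks.
class SpreadNodes {

  static int[] spread(int nodes, int racks) {
    int[] perRack = new int[racks];
    int base = nodes / racks;   // every rack gets at least this many
    int extra = nodes % racks;  // the first `extra` racks get one more
    for (int i = 0; i < racks; i++) {
      perRack[i] = base + (i < extra ? 1 : 0);
    }
    return perRack;
  }

  public static void main(String[] args) {
    // The example from the comment: 3 nodes left for two racks.
    System.out.println(Arrays.toString(spread(3, 2)));
  }
}
```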




[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-30 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Labels: hdfs-ec-3.0-must-do  (was: )




[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-26 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: HDFS-12725.02.patch

Patch 2, thanks to some chats with Eddy. Fun bug. :)

Added a test to ensure {{BlockPlacementPolicyRackFaultTolerant}} still attempts 
to tolerate a 1-rack failure if the setup allows. It currently doesn't do that, 
so I improved the best-effort code.
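
One way to state the "tolerates a 1-rack failure" property is sketched below. This is not the test added in the patch: the class and method names are hypothetical, and the example assumes an RS-6-3 style layout (6 data units, 9 blocks in total), which this thread does not state explicitly. The check simply asks whether at least the number of data units survives the loss of any single rack.

```java
// Hypothetical check: does a per-rack block count survive losing any one rack?
class RackFailureCheck {

  // True if, after removing any single rack, at least `dataUnits` blocks
  // of the group remain on the other racks.
  static boolean survivesAnySingleRackLoss(int[] placedPerRack, int dataUnits) {
    int total = 0;
    for (int n : placedPerRack) {
      total += n;
    }
    for (int n : placedPerRack) {
      if (total - n < dataUnits) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Assumed 6 data units; 9 blocks spread over six racks in two ways.
    System.out.println(
        survivesAnySingleRackLoss(new int[] {4, 1, 1, 1, 1, 1}, 6));
    System.out.println(
        survivesAnySingleRackLoss(new int[] {3, 2, 1, 1, 1, 1}, 6));
  }
}
```

Under that assumption, a placement that stacks 4 blocks on one rack fails the check, while one that keeps every rack at 3 or fewer passes, which is the kind of distinction a best-effort fall-back would want to preserve when the cluster allows it.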




[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-26 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Status: Patch Available  (was: Open)




[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-26 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: (was: HDFS-12725.01.patch)




[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-26 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: HDFS-12725.01.patch




[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-26 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: HDFS-12725.01.patch

Patch 1 reproduces the error and fixes it. [~andrew.wang] and [~eddyxu], could 
you take a look?
