[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Attachment: HDFS-12725.06.patch

Thanks for the review, Eddy! I improved the messages based on your suggestion. The test failures don't look related; let's use the new run to cross-check.

> BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
>
> Key: HDFS-12725
> URL: https://issues.apache.org/jira/browse/HDFS-12725
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Affects Versions: 3.0.0
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Priority: Major
> Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12725.01.patch, HDFS-12725.02.patch, HDFS-12725.03.patch, HDFS-12725.04.patch, HDFS-12725.05.patch, HDFS-12725.06.patch
>
> HDFS-12567 tries to fix the scenario where EC blocks may not be allocated in an extremely rack-imbalanced cluster.
> The added fall-back step of the fix could be improved to do a best-effort placement. This is more likely to happen in testing than in real clusters.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
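The best-effort idea in the description can be illustrated with a minimal sketch, assuming a simplified model in which each rack is just a count of free nodes and the per-rack cap is given. The class `BestEffortPlacementSketch` and its `place` method are hypothetical and are not the HDFS `BlockPlacementPolicyRackFaultTolerant` code: phase 1 places blocks without exceeding `maxNodesPerRack` on any rack, and phase 2 (the fallback) ignores the cap and uses any node that is still free.

```java
import java.util.Arrays;

// Hypothetical sketch; not the HDFS BlockPlacementPolicyRackFaultTolerant code.
public class BestEffortPlacementSketch {

  /**
   * Places totalBlocks across racks, first with at most maxNodesPerRack
   * per rack, then unevenly on whatever nodes are still free.
   *
   * @param nodesPerRack    free nodes on each rack (index = rack)
   * @param totalBlocks     blocks in the group, e.g. 9 for RS-6-3
   * @param maxNodesPerRack per-rack cap used during the even phase
   * @return number of blocks placed on each rack
   */
  static int[] place(int[] nodesPerRack, int totalBlocks, int maxNodesPerRack) {
    int[] placed = new int[nodesPerRack.length];
    int remaining = totalBlocks;

    // Phase 1: even placement, never exceeding maxNodesPerRack on any rack.
    for (int r = 0; r < nodesPerRack.length && remaining > 0; r++) {
      int take = Math.min(Math.min(nodesPerRack[r], maxNodesPerRack), remaining);
      placed[r] = take;
      remaining -= take;
    }

    // Phase 2 (fallback): ignore the cap and use any node that is still free.
    for (int r = 0; r < nodesPerRack.length && remaining > 0; r++) {
      int take = Math.min(nodesPerRack[r] - placed[r], remaining);
      placed[r] += take;
      remaining -= take;
    }
    // If remaining > 0 here, the cluster simply has too few nodes in total.
    return placed;
  }

  public static void main(String[] args) {
    // Hypothetical cluster: 5 racks with 3, 3, 1, 1 and 1 free nodes.
    int[] placed = place(new int[] {3, 3, 1, 1, 1}, 9, 2);
    System.out.println(Arrays.toString(placed)); // [3, 3, 1, 1, 1]
  }
}
```

With this hypothetical layout and a 9-block group, phase 1 places 7 blocks and the fallback places the remaining 2 on the two larger racks, so no block is left unplaced even though the even spread is impossible.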
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Attachment: HDFS-12725.05.patch

I was thinking about this patch, and IMO we should still WARN in the NN logs even if the block is placed, so the situation doesn't go unnoticed. It will now emit a message like:
{noformat}
2017-11-01 10:49:27,081 [IPC Server handler 8 on 55407] WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyRackFaultTolerant.java:chooseTargetInOrder(142)) - Only able to place 7 of total expected 9 (maxNodesPerRack=2, numOfReplicas=4) nodes evenly across racks, falling back to uneven placement.
{noformat}
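To see how numbers like "7 of total expected 9" with "maxNodesPerRack=2" can arise, here is a small arithmetic sketch. It assumes `maxNodesPerRack` is the rounded-up even share of blocks per rack, ceil(total / racks), and uses a hypothetical 5-rack cluster with 3, 3, 1, 1 and 1 nodes; it does not model the `numOfReplicas` field of the message, and the helper names are illustrative only.

```java
// Hypothetical arithmetic sketch; the formula for maxNodesPerRack is an assumption.
public class EvenPlacementArithmeticSketch {

  /** Rounded-up even share of blocks per rack: ceil(totalBlocks / numRacks). */
  static int maxNodesPerRack(int totalBlocks, int numRacks) {
    return (totalBlocks - 1) / numRacks + 1;
  }

  /** How many blocks fit when no rack may hold more than maxPerRack of them. */
  static int evenCapacity(int[] nodesPerRack, int maxPerRack) {
    int capacity = 0;
    for (int nodes : nodesPerRack) {
      capacity += Math.min(nodes, maxPerRack);
    }
    return capacity;
  }

  public static void main(String[] args) {
    int[] nodesPerRack = {3, 3, 1, 1, 1}; // hypothetical 5-rack cluster, 9 nodes
    int total = 9;                        // e.g. one RS-6-3 block group
    int max = maxNodesPerRack(total, nodesPerRack.length);
    int even = Math.min(evenCapacity(nodesPerRack, max), total);
    // Prints: maxNodesPerRack=2, evenly placeable=7 of 9
    System.out.printf("maxNodesPerRack=%d, evenly placeable=%d of %d%n", max, even, total);
  }
}
```

In this layout the three single-node racks can each take only 1 block, so the even phase tops out at 2 + 2 + 1 + 1 + 1 = 7, which is exactly the kind of shortfall the new WARN is meant to surface.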
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Attachment: HDFS-12725.04.patch

Patch 4 fixes the relevant checkstyle warnings. A method with 7+ parameters is not my favorite, but it is consistent with other calls inside the block placement policy (bpp).
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Attachment: HDFS-12725.03.patch

Thanks for the review, [~jojochuang]. Patch 3 addresses the comments. I initially wanted the loop in the test to make sure the extra node (the over-by-1) always goes to the lower-numbered multi-node rack (e.g. if 3 nodes are left for rack1 and rack2, rack1 always gets 2 nodes). But that's not necessary for testing, so I changed it to the standard way.
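The deterministic split the comment says was initially wanted (the extra node always landing on the lower-numbered rack) can be sketched as below. This is a hypothetical helper, not the test code in the patch; per the comment, patch 3 dropped this in favor of the standard way.

```java
import java.util.Arrays;

// Hypothetical helper; not the HDFS-12725 test code.
public class RackNodeSplitSketch {

  /**
   * Splits leftNodes across the given number of racks as evenly as possible;
   * when the split is uneven, the lower-numbered racks get one extra node.
   */
  static int[] split(int leftNodes, int racks) {
    int[] perRack = new int[racks];
    for (int i = 0; i < racks; i++) {
      perRack[i] = leftNodes / racks + (i < leftNodes % racks ? 1 : 0);
    }
    return perRack;
  }

  public static void main(String[] args) {
    // 3 leftover nodes over rack1 and rack2: rack1 gets 2, rack2 gets 1.
    System.out.println(Arrays.toString(split(3, 2))); // [2, 1]
  }
}
```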
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Labels: hdfs-ec-3.0-must-do (was: )
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Attachment: HDFS-12725.02.patch

Patch 2, thanks to some chats with Eddy. Fun bug. :) Added a test to ensure {{BlockPlacementPolicyRackFaultTolerant}} still attempts to tolerate a 1-rack failure if the setup allows. It currently doesn't do that, so I improved the best-effort code.
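Whether a given placement tolerates a 1-rack failure can be checked with a simple rule: a Reed-Solomon group with m parity units can be reconstructed after losing any m of its units, so losing one whole rack is survivable exactly when no rack holds more than m units of the group. A minimal sketch, assuming per-rack unit counts as input; the class and method names are hypothetical and not part of HDFS.

```java
// Hypothetical helper; not part of HDFS.
public class RackFailureToleranceSketch {

  /**
   * Returns true if losing any single rack loses at most parityUnits units,
   * i.e. the block group can still be reconstructed after one rack failure.
   */
  static boolean toleratesOneRackFailure(int[] unitsPerRack, int parityUnits) {
    for (int units : unitsPerRack) {
      if (units > parityUnits) {
        return false; // losing this rack would lose too many units
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // RS-6-3 (3 parity units) on hypothetical 5-rack layouts.
    System.out.println(toleratesOneRackFailure(new int[] {3, 3, 1, 1, 1}, 3)); // true
    System.out.println(toleratesOneRackFailure(new int[] {4, 2, 1, 1, 1}, 3)); // false
  }
}
```

The check only looks at the worst rack, which is why an uneven fallback placement can still be rack-fault tolerant as long as it keeps every rack at or below the parity count.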
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Attachment: (was: HDFS-12725.01.patch)
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Attachment: HDFS-12725.01.patch
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725:
Attachment: HDFS-12725.01.patch

Patch 1 reproduces the error and fixes it. [~andrew.wang] and [~eddyxu], could you take a look?