[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-11576: --- Fix Version/s: 3.0.0 > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Fix For: 3.0.0, 2.9.1 > > Attachments: HDFS-11576-branch-2.00.patch, > HDFS-11576-branch-2.01.patch, HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.015.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-11576: - Resolution: Fixed Fix Version/s: (was: 3.0.0) 2.9.1 Status: Resolved (was: Patch Available) Backported to branch-2.9 > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Fix For: 2.9.1 > > Attachments: HDFS-11576-branch-2.00.patch, > HDFS-11576-branch-2.01.patch, HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.015.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-11576: - Attachment: HDFS-11576-branch-2.01.patch Fixed some checkstyle warnings from test-patch in my environment. > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-11576-branch-2.00.patch, > HDFS-11576-branch-2.01.patch, HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.015.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-11576: - Status: Patch Available (was: Reopened) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-11576-branch-2.00.patch, HDFS-11576.001.patch, > HDFS-11576.002.patch, HDFS-11576.003.patch, HDFS-11576.004.patch, > HDFS-11576.005.patch, HDFS-11576.006.patch, HDFS-11576.007.patch, > HDFS-11576.008.patch, HDFS-11576.009.patch, HDFS-11576.010.patch, > HDFS-11576.011.patch, HDFS-11576.012.patch, HDFS-11576.013.patch, > HDFS-11576.014.patch, HDFS-11576.015.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-11576: - Attachment: HDFS-11576-branch-2.00.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-11576-branch-2.00.patch, HDFS-11576.001.patch, > HDFS-11576.002.patch, HDFS-11576.003.patch, HDFS-11576.004.patch, > HDFS-11576.005.patch, HDFS-11576.006.patch, HDFS-11576.007.patch, > HDFS-11576.008.patch, HDFS-11576.009.patch, HDFS-11576.010.patch, > HDFS-11576.011.patch, HDFS-11576.012.patch, HDFS-11576.013.patch, > HDFS-11576.014.patch, HDFS-11576.015.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-11576: - Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) The failing tests are also broken on trunk. I committed this through 3.0.0. > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.015.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-11576: - Status: Patch Available (was: Reopened) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.015.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.015.patch Patch 015 for fixing TestPipelinesFailover > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.015.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-11576: - Fix Version/s: (was: 3.0.0) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-11576: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) +1 I committed this. Thanks, Lukas. There are some trivial conflicts backporting this to branch-2. Will post a patch before continuing the backport > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.014.patch Patch 014 to fix checkstyle > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.013.patch Fix find-bugs > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.012.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.012.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.011.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.010.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.009.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.009.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Patch Available (was: In Progress) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-11576: --- Status: In Progress (was: Patch Available) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Patch Available (was: Open) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.008.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: (was: HDFS-11576.008) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.008 > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-11576: --- Labels: (was: release-blocker) Removing from release blockers. Wish we could fix it, but it should be fine to live with 3 sec deadline for block recovery, as we currently do. The deadline means that replication is restated after 3 sec, and eventually will be completed. > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-11576: --- Status: Open (was: Patch Available) Canceling patch. [~lukmajercak], can you address [~shv]'s review comments above? > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Labels: release-blocker > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-11576: --- Labels: release-blocker (was: ) Target Version/s: 2.7.4 > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Labels: release-blocker > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.007.patch Fixed logging in UnderRecoveryBlocks > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.006.patch Changed BlockRecoveryAttempt.isInProgress variable name to BlockRecoveryAttempt.started > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.006.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.005.patch Added a flag to track whether recovery command has been issued to a datanode + changed the logs around. > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.004.patch Thanks for the review [~elgoiri]. Attached a new patch, fixing TestAppendSnapshotTruncate and some codestyle issues. > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Patch Available (was: Open) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Open (was: Patch Available) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.003.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.003.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Patch Available (was: Open) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.002.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Open (was: Patch Available) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, > HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.001.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Open (was: Patch Available) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Patch Available (was: Open) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.001.patch, HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Status: Patch Available (was: Open) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: (was: HDFS-11576.repro.patch) > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.repro.patch > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-11576: -- Attachment: HDFS-11576.repro.patch Patch for reproducing the issue > Block recovery will fail indefinitely if recovery time > heartbeat interval > --- > > Key: HDFS-11576 > URL: https://issues.apache.org/jira/browse/HDFS-11576 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs, namenode >Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-11576.repro.patch > > > Block recovery will fail indefinitely if the time to recover a block is > always longer than the heartbeat interval. Scenario: > 1. DN sends heartbeat > 2. NN sends a recovery command to DN, recoveryID=X > 3. DN starts recovery > 4. DN sends another heartbeat > 5. NN sends a recovery command to DN, recoveryID=X+1 > 6. DN calls commitBlockSyncronization after succeeding with first recovery to > NN, which fails because X < X+1 > ... -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org