[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-07 Thread Andrew Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-11576:
---
Fix Version/s: 3.0.0

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Fix For: 3.0.0, 2.9.1
>
> Attachments: HDFS-11576-branch-2.00.patch, 
> HDFS-11576-branch-2.01.patch, HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.015.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN finishes the first recovery and calls commitBlockSynchronization on the 
> NN, which fails because X < X+1
> ... 
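
The scenario above can be sketched as a small tick-based simulation. This is a toy model under stated assumptions, not Hadoop code: ToyNameNode, issueRecoveryCommand, and simulate are hypothetical names, the NN is assumed to issue a fresh recovery ID on every heartbeat, and a commit is assumed to be rejected whenever its ID is older than the latest one issued.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RecoveryRaceSketch {

    /** Hypothetical stand-in for the NN side of the scenario. */
    static final class ToyNameNode {
        private long currentRecoveryId = 0;

        /** Assumption: every heartbeat triggers a new recovery command with a larger ID. */
        long issueRecoveryCommand() {
            return ++currentRecoveryId;
        }

        /** Assumption: a commit carrying an ID older than the latest issued one is rejected. */
        boolean commitBlockSynchronization(long recoveryId) {
            return recoveryId >= currentRecoveryId;
        }
    }

    /**
     * Returns true if any recovery attempt commits within maxTicks.
     * recoveryTicks is how long one recovery takes; heartbeatTicks is the heartbeat interval.
     */
    static boolean simulate(int recoveryTicks, int heartbeatTicks, int maxTicks) {
        ToyNameNode nn = new ToyNameNode();
        // Each entry is {recoveryId, finishTick}; entries finish in FIFO order.
        Deque<long[]> inFlight = new ArrayDeque<>();
        for (int tick = 0; tick <= maxTicks; tick++) {
            // Step 6: recoveries that finish now try to commit with the ID they were given.
            while (!inFlight.isEmpty() && inFlight.peekFirst()[1] == tick) {
                long id = inFlight.pollFirst()[0];
                if (nn.commitBlockSynchronization(id)) {
                    return true;
                }
            }
            // Steps 1-5: a heartbeat yields a new recovery command, and the DN starts recovering.
            if (tick % heartbeatTicks == 0) {
                long id = nn.issueRecoveryCommand();
                inFlight.addLast(new long[] {id, tick + recoveryTicks});
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("recovery=5, heartbeat=3: " + simulate(5, 3, 100));
        System.out.println("recovery=2, heartbeat=3: " + simulate(2, 3, 100));
    }
}
```

In this model, a recovery that outlasts the heartbeat interval always finds that a newer ID has been issued by the time it commits, so no attempt ever succeeds. A recovery that fits within one interval commits on the first try.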



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-07 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-11576:
-
   Resolution: Fixed
Fix Version/s: (was: 3.0.0)
   2.9.1
   Status: Resolved  (was: Patch Available)

Backported to branch-2.9

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Fix For: 2.9.1
>
> Attachments: HDFS-11576-branch-2.00.patch, 
> HDFS-11576-branch-2.01.patch, HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.015.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-03 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-11576:
-
Attachment: HDFS-11576-branch-2.01.patch

Fixed some checkstyle warnings from test-patch in my environment.

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HDFS-11576-branch-2.00.patch, 
> HDFS-11576-branch-2.01.patch, HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.015.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-02 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-11576:
-
Status: Patch Available  (was: Reopened)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HDFS-11576-branch-2.00.patch, HDFS-11576.001.patch, 
> HDFS-11576.002.patch, HDFS-11576.003.patch, HDFS-11576.004.patch, 
> HDFS-11576.005.patch, HDFS-11576.006.patch, HDFS-11576.007.patch, 
> HDFS-11576.008.patch, HDFS-11576.009.patch, HDFS-11576.010.patch, 
> HDFS-11576.011.patch, HDFS-11576.012.patch, HDFS-11576.013.patch, 
> HDFS-11576.014.patch, HDFS-11576.015.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-02 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-11576:
-
Attachment: HDFS-11576-branch-2.00.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HDFS-11576-branch-2.00.patch, HDFS-11576.001.patch, 
> HDFS-11576.002.patch, HDFS-11576.003.patch, HDFS-11576.004.patch, 
> HDFS-11576.005.patch, HDFS-11576.006.patch, HDFS-11576.007.patch, 
> HDFS-11576.008.patch, HDFS-11576.009.patch, HDFS-11576.010.patch, 
> HDFS-11576.011.patch, HDFS-11576.012.patch, HDFS-11576.013.patch, 
> HDFS-11576.014.patch, HDFS-11576.015.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-02 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-11576:
-
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

The failing tests are also broken on trunk.

I committed this through 3.0.0.

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.015.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-01 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-11576:
-
Status: Patch Available  (was: Reopened)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.015.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-01 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.015.patch

Patch 015 for fixing TestPipelinesFailover

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.015.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-01 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-11576:
-
Fix Version/s: (was: 3.0.0)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-12-01 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-11576:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

+1 I committed this. Thanks, Lukas.

There are some trivial conflicts when backporting this to branch-2. I will post 
a patch before continuing the backport.

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-11-30 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.014.patch

Patch 014 to fix checkstyle

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch, 
> HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-11-30 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.013.patch

Patch 013 to fix FindBugs warnings

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-11-29 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.012.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.012.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-09-12 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.011.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, 
> HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-08-11 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.010.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-08-10 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.009.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.009.patch, HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-08-10 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Patch Available  (was: In Progress)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-08-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-11576:
---
Status: In Progress  (was: Patch Available)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.repro.patch
>
>

[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-08-02 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Patch Available  (was: Open)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-08-01 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.008.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-08-01 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: (was: HDFS-11576.008)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, 
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-08-01 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.008

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008, 
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-07-27 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-11576:
---
Labels:   (was: release-blocker)

Removing from release blockers. Wish we could fix it, but it should be fine to
live with the 3-second deadline for block recovery, as we currently do. The
deadline means that replication is restarted after 3 seconds and will
eventually be completed.

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-07-20 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-11576:
---
Status: Open  (was: Patch Available)

Canceling patch. [~lukmajercak], can you address [~shv]'s review comments above?

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Labels: release-blocker
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-05-04 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-11576:
---
  Labels: release-blocker  (was: )
Target Version/s: 2.7.4

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Labels: release-blocker
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-14 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.007.patch

Fixed logging in UnderRecoveryBlocks

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-14 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.006.patch

Renamed the BlockRecoveryAttempt.isInProgress variable to
BlockRecoveryAttempt.started.

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.006.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-14 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.005.patch

Added a flag to track whether a recovery command has already been issued to a
datanode, and reworked the log messages.
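
One way to picture such a flag, as a sketch only: the NameNode remembers that a recovery command is already outstanding for a block and does not hand out a new recovery ID on later heartbeats. The RecoveryTracker class and its method names below are hypothetical and are not taken from the patch. A real implementation would presumably also need a way to expire a stalled attempt, which this sketch omits.

```python
class RecoveryTracker:
    """Hypothetical NameNode-side record for one block under recovery."""

    def __init__(self):
        self.recovery_id = 0
        self.started = False  # True while a recovery command is outstanding

    def on_heartbeat(self):
        """Return a recovery ID to send, or None if an attempt is already out."""
        if self.started:
            return None
        self.recovery_id += 1
        self.started = True
        return self.recovery_id

    def commit(self, recovery_id):
        """Accept a commit unless it carries an ID older than the latest one."""
        if recovery_id < self.recovery_id:
            return False
        self.started = False
        return True
```

With this toy tracker, a second heartbeat that arrives while the first attempt is still running returns None instead of a new ID, so the first attempt's commit is still accepted when it finishes.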

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, 
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-05 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.004.patch

Thanks for the review [~elgoiri]. Attached a new patch that fixes
TestAppendSnapshotTruncate and some code style issues.

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-05 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Patch Available  (was: Open)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-05 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Open  (was: Patch Available)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-05 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.003.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.003.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-04 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Patch Available  (was: Open)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-04 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.002.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-04-04 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Open  (was: Patch Available)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, 
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-03-27 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.001.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-03-27 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Open  (was: Patch Available)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-03-27 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Patch Available  (was: Open)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-03-24 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Status: Patch Available  (was: Open)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.7.3, 2.7.2, 2.7.1
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery
> succeeds, which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-03-24 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: (was: HDFS-11576.repro.patch)

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on NN after the first recovery succeeds, 
> which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-03-24 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.repro.patch

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on NN after the first recovery succeeds, 
> which fails because X < X+1
> ... 




[jira] [Updated] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval

2017-03-24 Thread Lukas Majercak (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-11576:
--
Attachment: HDFS-11576.repro.patch

Patch for reproducing the issue

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs, namenode
>Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is 
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on NN after the first recovery succeeds, 
> which fails because X < X+1
> ... 



