[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2018-05-18 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Target Version/s:   (was: 2.10.0)

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Priority: Major
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, 
> HDFS-8344.09.patch, HDFS-8344.10.patch, TestHadoop.java
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2017-10-03 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Target Version/s: 2.10.0  (was: 2.9.0)

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, 
> HDFS-8344.09.patch, HDFS-8344.10.patch, TestHadoop.java
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2017-09-29 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated HDFS-8344:
--

Is this still on target for 2.9.0 ? If not, can we we push this out to the next 
major release ?

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, 
> HDFS-8344.09.patch, HDFS-8344.10.patch, TestHadoop.java
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-8344:
-
Target Version/s: 2.9.0  (was: 2.8.0)

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, 
> HDFS-8344.09.patch, HDFS-8344.10.patch, TestHadoop.java
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2016-02-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-8344:
--
Fix Version/s: (was: 2.8.0)

Removing fix-version given the patch was reverted and waiting final-commit.

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, 
> HDFS-8344.09.patch, HDFS-8344.10.patch, TestHadoop.java
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2016-01-04 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.10.patch

Here's a patch for trunk

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, 
> HDFS-8344.09.patch, HDFS-8344.10.patch, TestHadoop.java
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2016-01-04 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Status: Patch Available  (was: Open)

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, 
> HDFS-8344.09.patch, HDFS-8344.10.patch, TestHadoop.java
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-10-18 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: TestHadoop.java

The TestHadoop.java file I referenced in the description.

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, 
> HDFS-8344.09.patch, TestHadoop.java
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-10-03 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Status: Open  (was: Patch Available)

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster
> # Before you start it helps if you set. This is not necessary, but simply 
> reduces how long you have to wait
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (could be less than 1 block, but it hflushed 
> so some of the data has landed on the datanodes) (I'm copying the client code 
> I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter"
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.09.patch

Here's a patch with timeout and number of retries

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 2.8.0

 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
 HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-07-22 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.08.patch

Hi Masatake! An on/off switch would require reading from {{Configuration}} too 
(which Haohui is objecting to). Thanks for your suggestion to add an additional 
warning. I've done so in this latest patch. Could you please review it?

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 2.8.0

 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
 HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-07-22 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Status: Patch Available  (was: Reopened)

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 2.8.0

 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
 HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-07-20 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
   Resolution: Fixed
Fix Version/s: 2.8.0
 Release Note: Allow a configuration to specify the maximum number of 
recovery attempts for blocks under construction.
   Status: Resolved  (was: Patch Available)

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 2.8.0

 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
 HDFS-8344.06.patch, HDFS-8344.07.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-07-20 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.07.patch

Thanks a lot for the careful review Allen! Here's another with the fixes.

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
 HDFS-8344.06.patch, HDFS-8344.07.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-07-16 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.06.patch

Here's a patch where the number of maximum attempts is configurable. 
[~iwasakims] Could you please review it?
Please also note that I wasn't able to find another instance where Block or its 
subclasses used something from Configuration. To avoid creating a Configuration 
object on the creation of every block I am getting that value from a static 
field in BlockManager.

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-07-06 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.05.patch

Here's an upmerged patch that applies cleanly on trunk. Could someone please 
review it?

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-05-21 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.04.patch

Fixing checkstyle -1. The test failure is not because of this patch. I wasn't 
able to make the test fail. Using this simple change to kick the commit build 
again

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-05-20 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Status: Patch Available  (was: Open)

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-05-20 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.03.patch

Hi Kihwal! Here's a patch which improves the test to ensure that data that was 
once hsynced is read back correctly. Could you please review it?

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-05-15 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Status: Open  (was: Patch Available)

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-05-11 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.02.patch

Thanks for your review [~kihwal] . I'm sorry [that 
change|https://issues.apache.org/jira/browse/HDFS-7342?focusedCommentId=14520401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14520401]
 accidentally leaked into the patch. I was using that change to test easily. 
Here's another patch without that change and some more documentation.

bq. Finalizing blocks without checking the actual replica state will very 
likely cause corruption later. 
That is correct. However its unavoidable. e.g. If the client was writing with 
replication=1, and it dies, the lease would not be recovered for (by default) 
another 1 hour. If during this one hour, the datanode also failed, then we have 
genuinely lost the data (that we acked to the client before it crashed) and the 
file should be marked corrupt. I fail to see any point in keeping the lease 
open forever hoping that at some point we will be able to bring the block back. 
I'm completely open on how long we want to wait as long as its not forever. 
With this patch we wait until we have tried to set each possible replica to be 
the primary for 5 times. Would you prefer some other timeout? 


 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-05-07 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.01.patch

Here's a patch with a unit test and a fix. Could people please review it?

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-05-07 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Status: Patch Available  (was: Open)

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-8344.01.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 reproducible easily on a pseudo-distributed single node cluster
 # Before you start it helps if you set. This is not necessary, but simply 
 reduces how long you have to wait
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (could be less than 1 block, but it hflushed 
 so some of the data has landed on the datanodes) (I'm copying the client code 
 I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 the $(hadoop jar 
 TestHadoop.jar) process after it has printed Wrote to the bufferedWriter
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)