[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119764#comment-14119764
 ] 

Hudson commented on HDFS-4257:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #669 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/669/])
HDFS-4257. The ReplaceDatanodeOnFailure policies could have a forgiving option. 
 Contributed by szetszwo. (cmccabe: rev 
727331becc3902cb4e60ee04741e79703238e782)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/ReplaceDatanodeOnFailure.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplaceDatanodeOnFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch, h4257_20140831.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?
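 For illustration, a minimal sketch of how a client might select a policy and opt
 into the forgiving behavior discussed in this issue. The enable and policy keys are
 the existing dfs.client.block.write.replace-datanode-on-failure.* settings; the
 best-effort key is the one introduced by this patch, and the file path is
 hypothetical.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplaceDatanodePolicyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Keep datanode replacement enabled and use the DEFAULT policy.
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
    // The forgiving option: try to replace a failed datanode, but keep
    // writing with the remaining datanodes if the replacement also fails.
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", true);

    FileSystem fs = FileSystem.get(conf);
    try {
      // Writes on this stream survive a failed datanode replacement
      // instead of throwing an exception back to the client.
      FSDataOutputStream out = fs.create(new Path("/tmp/example"));  // hypothetical path
      out.writeBytes("hello\n");
      out.close();
    } finally {
      fs.close();
    }
  }
}
{code}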



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119876#comment-14119876
 ] 

Hudson commented on HDFS-4257:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1860 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1860/])
HDFS-4257. The ReplaceDatanodeOnFailure policies could have a forgiving option. 
 Contributed by szetszwo. (cmccabe: rev 
727331becc3902cb4e60ee04741e79703238e782)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/ReplaceDatanodeOnFailure.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplaceDatanodeOnFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch, h4257_20140831.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120048#comment-14120048
 ] 

Hudson commented on HDFS-4257:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1885 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1885/])
HDFS-4257. The ReplaceDatanodeOnFailure policies could have a forgiving option. 
 Contributed by szetszwo. (cmccabe: rev 
727331becc3902cb4e60ee04741e79703238e782)
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplaceDatanodeOnFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/ReplaceDatanodeOnFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch, h4257_20140831.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-09-02 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118393#comment-14118393
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Hi [~szetszwo], thanks for the rev, it looks good! A few very minor comments:

1. Wonder if we can add a log right after calling 
{{this.dtpReplaceDatanodeOnFailure = ReplaceDatanodeOnFailure.get(conf);}} to 
indicate which policy is used? My concern is that a user may change the policy 
between sessions; it'd be nice to have a record in the log so we can tell 
which policy was used (see the sketch at the end of this comment).

2. About the {{satisfy(...)}} method in the Condition interface: {{DEFAULT}} has 
the final qualifier on all parameters, but the others don't. It'd be nice to be 
consistent; adding final everywhere gives both the benefit of final and code 
consistency.

3. The comments and parameter names for {{static final Condition DEFAULT = new 
Condition() {}} mix r, n with replication, nExistings. Can we use replication 
and nExistings consistently, to match other places in the same file?

Thanks a lot.
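
For item 1 above, a minimal sketch of the kind of log line being suggested, placed 
right after the call in the DFSClient constructor (the log wording and level are 
assumptions, not a committed change):
{code}
this.dtpReplaceDatanodeOnFailure = ReplaceDatanodeOnFailure.get(conf);
// Record which replace-datanode-on-failure policy this client uses, so a
// policy change between sessions is visible in the logs (hypothetical message).
LOG.info("Using ReplaceDatanodeOnFailure policy: " + dtpReplaceDatanodeOnFailure);
{code}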


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch, h4257_20140831.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-09-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118827#comment-14118827
 ] 

Colin Patrick McCabe commented on HDFS-4257:


Yongjun, I'm going to file a follow-up JIRA to address your comments.

+1, will commit shortly.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch, h4257_20140831.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-09-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118895#comment-14118895
 ] 

Colin Patrick McCabe commented on HDFS-4257:


Yongjun, check out HDFS-6985 where I addressed your comments.  Thanks, all.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch, h4257_20140831.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-09-02 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118949#comment-14118949
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Thanks Colin, for reviewing and following up.

Hi [~szetszwo], thanks for fixing the problem here. Colin created HDFS-6985 to 
address the comments I made earlier. Would you please take a look and see whether 
it looks good to you when you have time?  Thanks.



 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch, h4257_20140831.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116573#comment-14116573
 ] 

Tsz Wo Nicholas Sze commented on HDFS-4257:
---

 Can you add more JavaDoc for this class? For example, we should document what 
 the parameters are.

Since it is a private interface and the code is simple, I will skip adding 
javadoc.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116586#comment-14116586
 ] 

Hadoop QA commented on HDFS-4257:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665583/h4257_20140831.patch
  against trunk revision 258c7d0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7863//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7863//console

This message is automatically generated.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch, h4257_20140831.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-28 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114150#comment-14114150
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Hi Nicholas, I wonder if we could log a message showing which policy is used at 
the initialization stage and whenever the policy is changed, so that when we 
check the log, we can see which policy is in effect? Thanks.


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112878#comment-14112878
 ] 

Colin Patrick McCabe commented on HDFS-4257:


It's been more than a week since the last patch.  Nicholas, are you still 
working on this?

Do you have any objection to me giving this JIRA to Yongjun?  I think the 
changes needed to get this into shape to commit are minimal.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-27 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112927#comment-14112927
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Hi Nicholas, thanks for your earlier work on this. You have done all the work 
except for addressing some cosmetic comments from us. If you have time to finish 
the rest, that would be great; if not, I can post a revision, certainly without 
changing the assignee of this jira. Thanks.


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-27 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113069#comment-14113069
 ] 

Tsz Wo Nicholas Sze commented on HDFS-4257:
---

Thanks for the review comments.  Will upload a new patch soon.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-21 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106213#comment-14106213
 ] 

Gregory Chanan commented on HDFS-4257:
--

Looking forward to this; it would definitely help us in Solr, similar to the 
description given for Flume in HDFS-5131.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-19 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102348#comment-14102348
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Hi Nicholas,

Thanks for the updated patch. I went through it and it looks good to me.  One 
small comment: my understanding is that when best effort is enabled and there is 
only one replica currently being written, there is the potential of data loss if 
that replica's DN also goes down (see Colin's comments above). If it makes sense 
to you, can we add a warning to the hdfs-default.xml description to indicate the 
data loss possibility?

For the separate thread Colin proposed to repair the pipeline (I discussed this 
with Colin): it may help alleviate the situation, but for a slow writer, if the 
block does not reach the finalized state for a long time, the possibility of 
data loss is still there. So we need to think more about how to do better 
(Colin, please correct me if I'm wrong).

Thanks.


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-19 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102689#comment-14102689
 ] 

Colin Patrick McCabe commented on HDFS-4257:


Thanks for working on this again, Nicholas.

{code}
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>false</value>
  <description>
    Best effort means that the client will try to replace the failed datanode
    (provided that the policy is satisfied), however, it will continue the
    write operation in case that the datanode replacement also fails.

    Suppose the datanode replacement fails.
    false: An exception should be thrown so that the write will fail.
    true : The write should be resumed with the remaining datanodes.
  </description>
</property>
{code}

This description doesn't mention write pipeline recovery.  We should make it 
clear here that this setting applies to pipeline recovery.  I agree with 
Yongjun that we should also probably mention that best effort means that the 
client may resume writing with a lower number of datanodes than configured, 
which may lead to data loss.

{code}
+if (DFSClient.LOG.isTraceEnabled()) {
+  DFSClient.LOG.trace("Failed to replace datanode", ioe);
+}
{code}

Failure to replace a datanode is a very serious issue.  This ought to be 
{{LOG.error}} or {{LOG.warn}}, not trace.  Also, should we mention that we are 
continuing only because best effort is configured?
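
For illustration only, a sketch of what a warn-level message along those lines 
might look like (the wording is an assumption, not a committed change):
{code}
DFSClient.LOG.warn("Failed to replace a failed datanode in the write pipeline; "
    + "continuing with the remaining datanodes because best-effort is enabled.", ioe);
{code}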

{code}
+/** Is the condition satisfied? */
+public boolean satisfy(final short replication,
+    final DatanodeInfo[] existings, final int n, final boolean isAppend,
+    final boolean isHflushed);
{code}
Can you add more JavaDoc for this class?  For example, we should document what 
the parameters are.
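
As an illustration of the requested documentation, a sketch with hypothetical 
parameter descriptions inferred from how the values are used (not committed javadoc):
{code}
/**
 * Is the condition satisfied?
 *
 * @param replication the replication factor requested for the file
 * @param existings   the datanodes remaining in the current pipeline
 * @param n           the number of remaining datanodes, i.e. existings.length
 * @param isAppend    whether the output stream was opened for append
 * @param isHflushed  whether hflush/hsync has been called on the stream
 * @return true if a replacement datanode should be added to the pipeline
 */
public boolean satisfy(final short replication,
    final DatanodeInfo[] existings, final int n, final boolean isAppend,
    final boolean isHflushed);
{code}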

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102874#comment-14102874
 ] 

Hadoop QA commented on HDFS-4257:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662703/h4257_20140819.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7681//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7681//console

This message is automatically generated.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch, h4257_20140819.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101278#comment-14101278
 ] 

Colin Patrick McCabe commented on HDFS-4257:


I created a new issue for doing pipeline recovery in the background.  In the 
meantime, I think the approach Nicholas outlined here looks good.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101281#comment-14101281
 ] 

Colin Patrick McCabe commented on HDFS-4257:


Nicholas, the patch no longer applies.  Can you rebase it?  Or if you're busy, 
Yongjun has expressed some interest in working on this, using the approach in 
your patch.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-18 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101729#comment-14101729
 ] 

Tsz Wo Nicholas Sze commented on HDFS-4257:
---

[~yzhangal], thanks.  I will post an updated patch. 

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101742#comment-14101742
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Many thanks [~szetszwo]!

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-16 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099698#comment-14099698
 ] 

Tsz Wo Nicholas Sze commented on HDFS-4257:
---

What do you mean by heating up?

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099858#comment-14099858
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Hi Nicholas, sorry I did not make it clear earlier. We recently diagnosed a 
problem and identified that solving this JIRA would help, so we hope to get a 
resolution as soon as possible. Thanks a lot for the earlier work you have 
done here!
 

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098779#comment-14098779
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Hi [~szetszwo], this issue is heating up now. I wonder if you will have time 
to work on it soon? If not, I wonder if I can pick up from where you are? 
Thanks a lot.


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-07-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072325#comment-14072325
 ] 

Colin Patrick McCabe commented on HDFS-4257:


Nicholas, thank you for looking at this.  I can tell there have been a lot of 
JIRAs about this problem (HDFS-3091, HDFS-3179, HDFS-5131, and HDFS-4600 are 
all somewhat related).

The basic problem that seems to happen a lot is:
1. Client loses network connectivity
2. The client tries to write.  But because it can't see anyone else in the 
network, it can only write to 1 replica at most.
3. The pipeline recovery code throws a hard error because it can't get 3 
replicas.
4. Client gets a write error and tries to close the file.  That just gives 
another error.  The client goes into a bad state.  Sometimes the client 
continues trying to close the file and continues getting an exception (although 
this behavior was changed recently).  Due to HDFS-4504, the file never gets 
cleaned up on the NameNode if the client is long-lived.

HBase and Flume are both long-lived clients that have the problem with 
HDFS-4504.  HBase avoids this particular problem by not using the HDFS pipeline 
recovery code, and instead doing its own recovery by checking the current number 
of replicas.  So it never gets to step #3 because the pipeline recovery is 
turned off.  For Flume, though, this is a major problem.

The approach in this patch seems to be that instead of throwing a hard error in 
step #3, the DFSClient should simply accept only having 1 replica.  This will 
certainly fix the problem for Flume.  But imagine the following scenario:
1. Client loses network connectivity
2. The client tries to write.  But because it can't see anyone else in the 
network, it can only write to 1 replica at most.
3. The pipeline recovery code accepts only using 1 local replica
4. The client gets network connectivity back
5. A long time passes
6. The hard disks on the client node go down.

In this scenario, we lose the data after step #6.  The problem is that while 
the latest replica is under construction, we won't try to replicate it to other 
nodes, even though the network is back.

If we had a background thread that tried to repair the pipeline in step #5, we 
could avoid this problem.  Another possibility is that instead of throwing an 
error or continuing in step #3, we could simply wait for a configurable period 
(after logging a message).
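
To illustrate the trade-off described above, here is a minimal sketch of a 
long-lived writer (a Flume-like use case) that opts into the forgiving behavior. 
It assumes a best-effort boolean key alongside the existing enable/policy keys, 
which is the shape the eventual patch appears to take; treat the exact key name 
as an assumption to verify against your release. The comment at hflush() notes 
the caveat from the scenario above: while the block stays under construction 
with a single replica, it is not re-replicated even after connectivity returns.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ForgivingWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Keep datanode replacement enabled with the usual heuristics...
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
    // ...but, if no replacement datanode can be added, keep writing with the
    // datanodes that remain instead of failing the stream (assumed key name).
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", true);

    try (FileSystem fs = FileSystem.get(conf);
         FSDataOutputStream out = fs.create(new Path("/flume/events.log"), (short) 3)) {
      for (int i = 0; i < 1000; i++) {
        out.writeBytes("event " + i + "\n");
        // hflush pushes data to whatever pipeline is left, possibly a single DN.
        // Trade-off: until close(), an under-construction block with one replica
        // is not re-replicated even after the network recovers.
        out.hflush();
      }
    }
  }
}
{code}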

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-07-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072342#comment-14072342
 ] 

Hadoop QA commented on HDFS-4257:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637004/h4257_20140326.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7446//console

This message is automatically generated.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947595#comment-13947595
 ] 

Hadoop QA commented on HDFS-4257:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636854/h4257_20140325.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/6508//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6508//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6508//console

This message is automatically generated.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947788#comment-13947788
 ] 

Hadoop QA commented on HDFS-4257:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636877/h4257_20140325b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6512//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6512//console

This message is automatically generated.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13948730#comment-13948730
 ] 

Hadoop QA commented on HDFS-4257:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637004/h4257_20140326.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6520//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6520//console

This message is automatically generated.

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2012-12-01 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508143#comment-13508143
 ] 

Harsh J commented on HDFS-4257:
---

Thanks for the comment Nicholas!

I feel having a policy option 'TOLERATE' (or similar) would be much cleaner 
than a direct string toggle, if it is possible to implement it this way. 
Thoughts?

Would we also make it the default (to go back to the older behavior), or 
continue with 'DEFAULT'?
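
For illustration of the design question only, here is a hypothetical sketch of 
what folding the forgiving behavior into the policy enum might look like, as 
opposed to adding a separate toggle next to it. The enum name, the TOLERATE 
value and the helper method are all made up for this sketch and are not the 
real ReplaceDatanodeOnFailure API; the change eventually committed appears to 
have gone with a separate best-effort setting instead.

{code:java}
/**
 * Hypothetical sketch only -- not the real ReplaceDatanodeOnFailure API.
 * A fifth value would fold the forgiving behavior into the existing
 * policy setting instead of adding a boolean toggle next to it.
 */
enum ReplaceDatanodePolicy {
  DISABLE,  // feature rejected outright
  NEVER,    // never replace a failed datanode
  DEFAULT,  // replace when the usual conditions are met, otherwise fail
  ALWAYS,   // always replace, otherwise fail
  TOLERATE; // try to replace, but keep writing even if only 1 DN remains

  /** Whether failing to add a replacement datanode should abort the write. */
  boolean failOnReplacementFailure() {
    return this == DEFAULT || this == ALWAYS;
  }
}
{code}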

 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor

 Similar question has previously come over HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time..
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and none others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira