[jira] [Commented] (HDFS-4140) fuse-dfs handles open(O_TRUNC) poorly

2012-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532152#comment-13532152
 ] 

Hadoop QA commented on HDFS-4140:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12560910/HDFS-4140.008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestPersistBlocks
  
org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3664//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3664//console

This message is automatically generated.

 fuse-dfs handles open(O_TRUNC) poorly
 -

 Key: HDFS-4140
 URL: https://issues.apache.org/jira/browse/HDFS-4140
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fuse-dfs
Affects Versions: 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Colin Patrick McCabe
 Attachments: HDFS-4140.003.patch, HDFS-4140.004.patch, 
 HDFS-4140.005.patch, HDFS-4140.006.patch, HDFS-4140.007.patch, 
 HDFS-4140.008.patch


 fuse-dfs handles open(O_TRUNC) poorly.
 It is converted to multiple fuse operations.  Those multiple fuse operations 
 often fail (for example, calling fuse_truncate_impl() while a file is also 
 open for write results in a "multiple writers!" exception.)
 One easy way to see the problem is to run the following sequence of shell 
 commands:
 {noformat}
 ubuntu@ubu-cdh-0:~$ echo foo > /export/hdfs/tmp/a/t1.txt
 ubuntu@ubu-cdh-0:~$ ls -l /export/hdfs/tmp/a
 total 0
 -rw-r--r-- 1 ubuntu hadoop 4 Nov  1 15:21 t1.txt
 ubuntu@ubu-cdh-0:~$ hdfs dfs -ls /tmp/a
 Found 1 items
 -rw-r--r--   3 ubuntu hadoop  4 2012-11-01 15:21 /tmp/a/t1.txt
 ubuntu@ubu-cdh-0:~$ echo bar > /export/hdfs/tmp/a/t1.txt
 ubuntu@ubu-cdh-0:~$ ls -l /export/hdfs/tmp/a
 total 0
 -rw-r--r-- 1 ubuntu hadoop 0 Nov  1 15:22 t1.txt
 ubuntu@ubu-cdh-0:~$ hdfs dfs -ls /tmp/a
 Found 1 items
 -rw-r--r--   3 ubuntu hadoop  0 2012-11-01 15:22 /tmp/a/t1.txt
 {noformat}
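
 By contrast, a single-call overwrite on the HDFS side has no multiple-writers 
 window at all. A minimal sketch of what open(O_TRUNC) should map to (assuming 
 a reachable cluster and default client config; the path reuses the demo above):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class OverwriteSketch {
   public static void main(String[] args) throws Exception {
     FileSystem fs = FileSystem.get(new Configuration());
     // create(path, overwrite=true) truncates and opens for write in one
     // FileSystem call; splitting it into fuse_truncate_impl() plus a separate
     // open-for-write is what produces the "multiple writers!" failure.
     FSDataOutputStream out = fs.create(new Path("/tmp/a/t1.txt"), true);
     out.writeBytes("bar\n");
     out.close();
   }
 }
 {code}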

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3429) DataNode reads checksums even if client does not need them

2012-12-14 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532161#comment-13532161
 ] 

liang xie commented on HDFS-3429:
-

Still no obvious difference was found in another 100% read scenario that is 
not IO-bound.

I did strace -p <DN pid> -f -tt -T -e trace=file -o bbb during a run of 
several minutes (without the patch), then:
grep current/finalized bbb|wc -l
16905
grep meta bbb|wc -l
9858
grep meta bbb|grep open|wc -l
3286
grep meta bbb|grep stat|wc -l
6572
grep meta bbb|grep \<.*\> -o|sort -n |uniq -c|wc -l
303
Most of those meta files are several hundred kilobytes in size; furthermore, 
our OS has a default read_ahead_kb of 128, so it also seems to make sense that 
the benefit was not obvious. Any idea, [~tlipcon]?

But I am +1 for this patch, since it can reduce some unnecessary IO & system 
calls.

 DataNode reads checksums even if client does not need them
 --

 Key: HDFS-3429
 URL: https://issues.apache.org/jira/browse/HDFS-3429
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, performance
Affects Versions: 2.0.0-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3429-0.20.2.patch, hdfs-3429-0.20.2.patch, 
 hdfs-3429.txt, hdfs-3429.txt, hdfs-3429.txt


 Currently, even if the client does not want to verify checksums, the datanode 
 reads them anyway and sends them over the wire. This means that performance 
 improvements like HBase's application-level checksums don't have much benefit 
 when reading through the datanode, since the DN is still causing seeks into 
 the checksum file.
 (Credit goes to Dhruba for discovering this - filing on his behalf)
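
 To make the client side of this concrete, here is a minimal sketch of a 
 reader that opts out of checksum verification (standard FileSystem API; the 
 path is hypothetical):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class NoChecksumRead {
   public static void main(String[] args) throws Exception {
     FileSystem fs = FileSystem.get(new Configuration());
     // The client asks to skip checksum verification; without this patch the
     // DataNode still reads the .meta checksum file and ships it anyway.
     fs.setVerifyChecksum(false);
     FSDataInputStream in = fs.open(new Path("/hbase/data/somefile"));
     byte[] buf = new byte[64 * 1024];
     int n = in.read(buf);  // the application verifies integrity itself
     in.close();
     System.out.println("read " + n + " bytes");
   }
 }
 {code}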

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (HDFS-4311) repair test org.apache.hadoop.fs.http.server.TestHttpFSWithKerberos

2012-12-14 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky moved HADOOP-9143 to HDFS-4311:
--

Key: HDFS-4311  (was: HADOOP-9143)
Project: Hadoop HDFS  (was: Hadoop Common)

 repair test org.apache.hadoop.fs.http.server.TestHttpFSWithKerberos
 ---

 Key: HDFS-4311
 URL: https://issues.apache.org/jira/browse/HDFS-4311
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky

 Some of the test cases in this test class are failing because they are 
 affected by static state changed by the previous test cases. Namely this is 
 the static field org.apache.hadoop.security.UserGroupInformation.loginUser .
 The suggested patch solves this problem.
 Besides, the following improvements are done:
 1) parametrized the user principal and keytab values via system properties;
 2) shutdown of the Jetty server and the minicluster between the test cases is 
 added to make the test methods independent of each other (a minimal sketch 
 follows below).
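
 A minimal sketch of the isolation idea in (2) (JUnit 4; illustrative only, 
 not the exact patch):
 {code}
 import org.junit.After;
 import org.apache.hadoop.hdfs.MiniDFSCluster;

 public abstract class IsolatedHttpFSTestBase {
   protected MiniDFSCluster cluster;  // hypothetical shared fixture

   @After
   public void tearDownCluster() {
     // Shut the minicluster (and, in the real test, the Jetty server) down
     // after every test case so no leftover state leaks into the next one.
     if (cluster != null) {
       cluster.shutdown();
       cluster = null;
     }
   }
 }
 {code}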

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4311) repair test org.apache.hadoop.fs.http.server.TestHttpFSWithKerberos

2012-12-14 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HDFS-4311:
-

Attachment: HDFS-4311.patch

 repair test org.apache.hadoop.fs.http.server.TestHttpFSWithKerberos
 ---

 Key: HDFS-4311
 URL: https://issues.apache.org/jira/browse/HDFS-4311
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Attachments: HDFS-4311.patch


 Some of the test cases in this test class are failing because they are 
 affected by static state changed by the previous test cases. Namely this is 
 the static field org.apache.hadoop.security.UserGroupInformation.loginUser .
 The suggested patch solves this problem.
 Besides, the following improvements are done:
 1) parametrized the user principal and keytab values via system properties;
 2) shutdown of the Jetty server and the minicluster between the test cases is 
 added to make the test methods independent of each other.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4311) repair test org.apache.hadoop.fs.http.server.TestHttpFSWithKerberos

2012-12-14 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HDFS-4311:
-

Affects Version/s: 2.0.3-alpha
   3.0.0
   Status: Patch Available  (was: Open)

 repair test org.apache.hadoop.fs.http.server.TestHttpFSWithKerberos
 ---

 Key: HDFS-4311
 URL: https://issues.apache.org/jira/browse/HDFS-4311
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Attachments: HDFS-4311.patch


 Some of the test cases in this test class are failing because they are 
 affected by static state changed by the previous test cases. Namely this is 
 the static field org.apache.hadoop.security.UserGroupInformation.loginUser .
 The suggested patch solves this problem.
 Besides, the following improvements are done:
 1) parametrized the user principal and keytab values via system properties;
 2) shutdown of the Jetty server and the minicluster between the test cases is 
 added to make the test methods independent of each other.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4311) repair test org.apache.hadoop.fs.http.server.TestHttpFSWithKerberos

2012-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532195#comment-13532195
 ] 

Hadoop QA commented on HDFS-4311:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12560935/HDFS-4311.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-httpfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3665//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3665//console

This message is automatically generated.

 repair test org.apache.hadoop.fs.http.server.TestHttpFSWithKerberos
 ---

 Key: HDFS-4311
 URL: https://issues.apache.org/jira/browse/HDFS-4311
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Attachments: HDFS-4311.patch


 Some of the test cases in this test class are failing because they are 
 affected by static state changed by the previous test cases. Namely this is 
 the static field org.apache.hadoop.security.UserGroupInformation.loginUser .
 The suggested patch solves this problem.
 Besides, the following improvements are done:
 1) parametrized the user principal and keytab values via system properties;
 2) shutdown of the Jetty server and the minicluster between the test cases is 
 added to make the test methods independent of each other.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4307) SocketCache should use monotonic time

2012-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532229#comment-13532229
 ] 

Hudson commented on HDFS-4307:
--

Integrated in Hadoop-Yarn-trunk #65 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/65/])
HDFS-4307. SocketCache should use monotonic time. Contributed by Colin 
Patrick McCabe. (Revision 1421572)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1421572
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/SocketCache.java


 SocketCache should use monotonic time
 -

 Key: HDFS-4307
 URL: https://issues.apache.org/jira/browse/HDFS-4307
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HDFS-4307.001.patch, HDFS-4307.002.patch


 {{SocketCache}} should use monotonic time, not wall-clock time.  Otherwise, 
 if the time is adjusted by ntpd or a system administrator, sockets could be 
 either abruptly expired or left in the cache indefinitely.
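
 A minimal sketch of the distinction, using the monotonic clock helper in 
 Hadoop's util package:
 {code}
 import org.apache.hadoop.util.Time;

 class CacheEntrySketch {
   private static final long EXPIRY_MS = 10000;          // hypothetical timeout
   private final long createdAt = Time.monotonicNow();   // monotonic timestamp

   boolean expired() {
     // System.currentTimeMillis() jumps when ntpd or an admin steps the clock,
     // so wall-clock arithmetic can expire an entry abruptly or never;
     // monotonicNow() only ever moves forward at a steady rate.
     return Time.monotonicNow() - createdAt > EXPIRY_MS;
   }
 }
 {code}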

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4310) fix test org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode

2012-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532232#comment-13532232
 ] 

Hudson commented on HDFS-4310:
--

Integrated in Hadoop-Yarn-trunk #65 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/65/])
HDFS-4310. fix test 
org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode Contributed by 
Ivan A. Veselovsky. (Revision 1421560)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1421560
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStartSecureDataNode.java


 fix test org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode
 ---

 Key: HDFS-4310
 URL: https://issues.apache.org/jira/browse/HDFS-4310
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Fix For: 3.0.0

 Attachments: HDFS-4310.patch


 the test org/apache/hadoop/hdfs/server/datanode/TestStartSecureDataNode 
 catches exceptions and does not re-throw them. Because of that, it passes 
 even when it actually fails.
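
 In miniature, the anti-pattern and the fix (JUnit 4; illustrative only):
 {code}
 import org.junit.Test;

 public class SwallowedExceptionSketch {
   @Test
   public void testBroken() {
     try {
       riskyOperation();
     } catch (Exception e) {
       e.printStackTrace();  // swallowed: JUnit sees no failure, so the test "passes"
     }
   }

   @Test
   public void testFixed() throws Exception {
     riskyOperation();       // propagate, so a failure actually fails the test
   }

   private void riskyOperation() throws Exception { /* the real assertions/IO */ }
 }
 {code}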

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4307) SocketCache should use monotonic time

2012-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532286#comment-13532286
 ] 

Hudson commented on HDFS-4307:
--

Integrated in Hadoop-Hdfs-trunk #1254 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1254/])
HDFS-4307. SocketCache should use monotonic time. Contributed by Colin 
Patrick McCabe. (Revision 1421572)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1421572
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/SocketCache.java


 SocketCache should use monotonic time
 -

 Key: HDFS-4307
 URL: https://issues.apache.org/jira/browse/HDFS-4307
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HDFS-4307.001.patch, HDFS-4307.002.patch


 {{SocketCache}} should use monotonic time, not wall-clock time.  Otherwise, 
 if the time is adjusted by ntpd or a system administrator, sockets could be 
 either abruptly expired or left in the cache indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4310) fix test org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode

2012-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532289#comment-13532289
 ] 

Hudson commented on HDFS-4310:
--

Integrated in Hadoop-Hdfs-trunk #1254 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1254/])
HDFS-4310. fix test 
org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode Contributed by 
Ivan A. Veselovsky. (Revision 1421560)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1421560
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStartSecureDataNode.java


 fix test org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode
 ---

 Key: HDFS-4310
 URL: https://issues.apache.org/jira/browse/HDFS-4310
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Fix For: 3.0.0

 Attachments: HDFS-4310.patch


 the test org/apache/hadoop/hdfs/server/datanode/TestStartSecureDataNode 
 catches exceptions and does not re-throw them. Because of that, it passes 
 even when it actually fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-14 Thread ChenFolin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532291#comment-13532291
 ] 

ChenFolin commented on HDFS-4309:
-

Hi Aaron T. Myers,
When I execute dev-support/test-patch.sh <patch>, it causes many errors, such 
as:
org.apache.hadoop.record.RecordComparator is deprecated.
and the code is:
{code}
@Deprecated
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class RecordComparator extends WritableComparator {
{code}

So the dev-support/test-patch.sh <patch> run failed. What should I do now?

==
==
Determining number of patched javac warnings.
==
==


mvn clean test -DskipTests -DHadoopPatchProcess -Pnative -Ptest-patch > 
/tmp/patchJavacWarnings.txt 2>&1




{color:red}-1 overall{color}.  

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.




==
==
Finished build.
==
==

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

 If multiple threads concurrently execute the following method, fs = 
 createFileSystem(uri, conf) can be called more than once, creating multiple 
 DFSClient instances that each start a LeaseChecker daemon thread at the same 
 time; the shutdown hook may not be able to close all of them when the 
 process exits, resulting in a memory leak.
 {code}
 private FileSystem getInternal(URI uri, Configuration conf, Key key) throws 
 IOException {
   FileSystem fs = null;
   synchronized (this) {
     fs = map.get(key);
   }
   if (fs != null) {
     return fs;
   }
   //  this is 
   fs = createFileSystem(uri, conf);
   synchronized (this) {  // refetch the lock again
     FileSystem oldfs = map.get(key);
     if (oldfs != null) { // a file system is created while lock is 
 releasing
       fs.close(); // close the new file system
       return oldfs;  // return the old file system
     }
     // now insert the new file system into the map
     if (map.isEmpty() && !clientFinalizer.isAlive()) {
       Runtime.getRuntime().addShutdownHook(clientFinalizer);
     }
     fs.key = key;
     map.put(key, fs);
     if (conf.getBoolean("fs.automatic.close", true)) {
       toAutoClose.add(key);
     }
     return fs;
   }
 }
 {code}
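
 A minimal sketch of a driver that exercises the race (hypothetical URI; per 
 the report above, the losing threads' DFSClients have already started 
 LeaseChecker daemon threads that may never be cleaned up):
 {code}
 import java.net.URI;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;

 public class CacheRaceDriver {
   public static void main(String[] args) {
     final Configuration conf = new Configuration();
     final URI uri = URI.create("hdfs://namenode:8020/");  // hypothetical
     for (int i = 0; i < 50; i++) {
       new Thread(new Runnable() {
         public void run() {
           try {
             // Many threads can miss the cache at once, so several
             // createFileSystem() calls proceed before the second
             // synchronized check decides a single winner.
             FileSystem.get(uri, conf);
           } catch (Exception e) {
             e.printStackTrace();
           }
         }
       }).start();
     }
   }
 }
 {code}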

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4312) fix test TestSecureNameNode and improve test TestSecureNameNode

2012-12-14 Thread Ivan A. Veselovsky (JIRA)
Ivan A. Veselovsky created HDFS-4312:


 Summary: fix test TestSecureNameNode and improve test 
TestSecureNameNode
 Key: HDFS-4312
 URL: https://issues.apache.org/jira/browse/HDFS-4312
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky


TestSecureNameNode does not work on Java6 without 
dfs.web.authentication.kerberos.principal config property set.

Also, the following was improved:
1) keytab files are checked for existence and readability to provide fast-fail 
on config error (see the sketch after this list);
2) added comment to TestSecureNameNode describing the required sys props.
3) string literals replaced with config constants.
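
A minimal sketch of the fast-fail check in (1) (helper and property name are 
hypothetical):
{code}
import java.io.File;

public final class KeytabCheck {
  /** Fail fast with a clear message instead of an obscure Kerberos login error. */
  public static void requireReadableKeytab(String sysProp) {
    String path = System.getProperty(sysProp);
    if (path == null || path.isEmpty()) {
      throw new IllegalStateException("System property " + sysProp + " is not set");
    }
    File keytab = new File(path);
    if (!keytab.exists() || !keytab.canRead()) {
      throw new IllegalStateException("Keytab " + path + " does not exist or is not readable");
    }
  }
}
{code}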

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4312) fix test TestSecureNameNode and improve test TestSecureNameNodeWithExternalKdc

2012-12-14 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HDFS-4312:
-

Summary: fix test TestSecureNameNode and improve test 
TestSecureNameNodeWithExternalKdc  (was: fix test TestSecureNameNode and 
improve test TestSecureNameNode)

 fix test TestSecureNameNode and improve test TestSecureNameNodeWithExternalKdc
 --

 Key: HDFS-4312
 URL: https://issues.apache.org/jira/browse/HDFS-4312
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky

 TestSecureNameNode does not work on Java6 without 
 dfs.web.authentication.kerberos.principal config property set.
 Also, the following was improved:
 1) keytab files are checked for existence and readability to provide 
 fast-fail on config error.
 2) added comment to TestSecureNameNode describing the required sys props.
 3) string literals replaced with config constants.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4312) fix test TestSecureNameNode and improve test TestSecureNameNodeWithExternalKdc

2012-12-14 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HDFS-4312:
-

Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

 fix test TestSecureNameNode and improve test TestSecureNameNodeWithExternalKdc
 --

 Key: HDFS-4312
 URL: https://issues.apache.org/jira/browse/HDFS-4312
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Attachments: HDFS-4312.patch


 TestSecureNameNode does not work on Java6 without 
 dfs.web.authentication.kerberos.principal config property set.
 Also, the following was improved:
 1) keytab files are checked for existence and readability to provide 
 fast-fail on config error.
 2) added comment to TestSecureNameNode describing the required sys props.
 3) string literals replaced with config constants.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4312) fix test TestSecureNameNode and improve test TestSecureNameNodeWithExternalKdc

2012-12-14 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HDFS-4312:
-

Attachment: HDFS-4312.patch

 fix test TestSecureNameNode and improve test TestSecureNameNodeWithExternalKdc
 --

 Key: HDFS-4312
 URL: https://issues.apache.org/jira/browse/HDFS-4312
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Attachments: HDFS-4312.patch


 TestSecureNameNode does not work on Java6 without 
 dfs.web.authentication.kerberos.principal config property set.
 Also, the following was improved:
 1) keytab files are checked for existence and readability to provide 
 fast-fail on config error.
 2) added comment to TestSecureNameNode describing the required sys props.
 3) string literals replaced with config constants.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4307) SocketCache should use monotonic time

2012-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532313#comment-13532313
 ] 

Hudson commented on HDFS-4307:
--

Integrated in Hadoop-Mapreduce-trunk #1285 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1285/])
HDFS-4307. SocketCache should use monotonic time. Contributed by Colin 
Patrick McCabe. (Revision 1421572)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1421572
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/SocketCache.java


 SocketCache should use monotonic time
 -

 Key: HDFS-4307
 URL: https://issues.apache.org/jira/browse/HDFS-4307
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HDFS-4307.001.patch, HDFS-4307.002.patch


 {{SocketCache}} should use monotonic time, not wall-clock time.  Otherwise, 
 if the time is adjusted by ntpd or a system administrator, sockets could be 
 either abruptly expired or left in the cache indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4310) fix test org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode

2012-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532316#comment-13532316
 ] 

Hudson commented on HDFS-4310:
--

Integrated in Hadoop-Mapreduce-trunk #1285 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1285/])
HDFS-4310. fix test 
org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode Contributed by 
Ivan A. Veselovsky. (Revision 1421560)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1421560
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStartSecureDataNode.java


 fix test org.apache.hadoop.hdfs.server.datanode.TestStartSecureDataNode
 ---

 Key: HDFS-4310
 URL: https://issues.apache.org/jira/browse/HDFS-4310
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Fix For: 3.0.0

 Attachments: HDFS-4310.patch


 the test org/apache/hadoop/hdfs/server/datanode/TestStartSecureDataNode 
 catches exceptions and does not re-throw them. Because of that, it passes 
 even when it actually fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4312) fix test TestSecureNameNode and improve test TestSecureNameNodeWithExternalKdc

2012-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532388#comment-13532388
 ] 

Hadoop QA commented on HDFS-4312:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12560969/HDFS-4312.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3666//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3666//console

This message is automatically generated.

 fix test TestSecureNameNode and improve test TestSecureNameNodeWithExternalKdc
 --

 Key: HDFS-4312
 URL: https://issues.apache.org/jira/browse/HDFS-4312
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
 Attachments: HDFS-4312.patch


 TestSecureNameNode does not work on Java6 without 
 dfs.web.authentication.kerberos.principal config property set.
 Also, the following was improved:
 1) keytab files are checked for existence and readability to provide 
 fast-fail on config error.
 2) added comment to TestSecureNameNode describing the required sys props.
 3) string literals replaced with config constants.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread Jeremy Carroll (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532457#comment-13532457
 ] 

Jeremy Carroll commented on HDFS-3912:
--

FYI: This patch is missing the branch-2 patch. After applying HDFS-3703 for 
branch-2, it's missing the DFS_NAMENODE_CHECK_STALE_DATANODE_DEFAULT settings, 
etc..

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 1.2.0, 2.0.3-alpha

 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch, 
 HDFS-3912-branch-1.patch, HDFS-3912.branch-1.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread Jeremy Carroll (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532482#comment-13532482
 ] 

Jeremy Carroll commented on HDFS-3912:
--

Basically this patch requires HDFS-3601 (Version 3.0). So there is no Branch 
2.0 patch on the ticket.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 1.2.0, 2.0.3-alpha

 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch, 
 HDFS-3912-branch-1.patch, HDFS-3912.branch-1.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532488#comment-13532488
 ] 

nkeywal commented on HDFS-3912:
---

Are you sure? It's committed in branch-1?

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 1.2.0, 2.0.3-alpha

 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch, 
 HDFS-3912-branch-1.patch, HDFS-3912.branch-1.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532493#comment-13532493
 ] 

Harsh J commented on HDFS-3912:
---

bq. FYI: This patch is missing the branch-2 patch. After applying HDFS-3703 for 
branch-2, it's missing the DFS_NAMENODE_CHECK_STALE_DATANODE_DEFAULT settings, 
etc..

The diff may be dependent on the JIRA you mention, but perhaps not the patch 
itself. We merged the trunk commit directly into branch-2, as 
viewable/downloadable here: view at 
http://svn.apache.org/viewvc?view=revision&revision=1397219 and download at 
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java?revision=1397219&view=co

If you use git locally, you can also add a remote and cherry-pick it out I 
guess.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 1.2.0, 2.0.3-alpha

 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch, 
 HDFS-3912-branch-1.patch, HDFS-3912.branch-1.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532495#comment-13532495
 ] 

Harsh J commented on HDFS-3912:
---

bq. Are you sure? It's committed in branch-1?

Yes, branch-1 has this as a backport commit, whose separate patch is attached 
as well.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 1.2.0, 2.0.3-alpha

 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch, 
 HDFS-3912-branch-1.patch, HDFS-3912.branch-1.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4313) MiniDFSCluster throws NPE if umask is more permissive than 022

2012-12-14 Thread Luke Lu (JIRA)
Luke Lu created HDFS-4313:
-

 Summary: MiniDFSCluster throws NPE if umask is more permissive 
than 022
 Key: HDFS-4313
 URL: https://issues.apache.org/jira/browse/HDFS-4313
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.1
Reporter: Luke Lu
Priority: Minor


MiniDFSCluster startup throws NPE if the umask is more permissive (e.g. 002) than 022.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3465) 2NN doesn't start with fs.defaultFS set to a viewfs URI unless service RPC address is also set

2012-12-14 Thread Joseph Kniest (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532686#comment-13532686
 ] 

Joseph Kniest commented on HDFS-3465:
-

Hi, I am new to HDFS dev and I would like to take this issue as my first. It 
may take a while because it's my first issue and because of my schedule but I 
will do my best to be as prompt as possible. Thanks!

 2NN doesn't start with fs.defaultFS set to a viewfs URI unless service RPC 
 address is also set
 --

 Key: HDFS-3465
 URL: https://issues.apache.org/jira/browse/HDFS-3465
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: federation, namenode
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
  Labels: newbie

 Looks like the 2NN first tries servicerpc-address then falls back on 
 fs.defaultFS, which won't work in the case of federation since fs.defaultFS 
 doesn't refer to an RPC address. Instead, the 2NN should first check 
 servicerpc-address, then rpc-address, then fall back on fs.defaultFS.
 {noformat}
 Exception in thread "main" java.lang.IllegalArgumentException: Invalid
 URI for NameNode address (check fs.defaultFS): viewfs:/// has no
 authority.
at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:315)
at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:303)
at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:296)
at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:214)
at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:178)
at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:582)
 {noformat}
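
 A minimal sketch of the suggested lookup order (the config key strings are 
 the standard ones; the surrounding method is illustrative, not the actual 
 patch):
 {code}
 import java.net.InetSocketAddress;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.net.NetUtils;

 public class ServiceAddressSketch {
   /** servicerpc-address first, then rpc-address, then fs.defaultFS last. */
   static InetSocketAddress getServiceAddress(Configuration conf) {
     String addr = conf.get("dfs.namenode.servicerpc-address");
     if (addr == null || addr.isEmpty()) {
       addr = conf.get("dfs.namenode.rpc-address");
     }
     if (addr == null || addr.isEmpty()) {
       // This is where federation breaks today: fs.defaultFS may be a
       // viewfs:// URI with no authority, hence the exception above.
       addr = conf.get("fs.defaultFS");
     }
     return NetUtils.createSocketAddr(addr);
   }
 }
 {code}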

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3429) DataNode reads checksums even if client does not need them

2012-12-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532724#comment-13532724
 ] 

Todd Lipcon commented on HDFS-3429:
---

Hi Liang. I'm not sure if 0.94.2 has the right code to take advantage of this 
new feature quite yet -- given you see a bunch of the .meta files being read, 
it seems like it doesn't. So that would explain why you don't see a 
performance difference.

 DataNode reads checksums even if client does not need them
 --

 Key: HDFS-3429
 URL: https://issues.apache.org/jira/browse/HDFS-3429
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, performance
Affects Versions: 2.0.0-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3429-0.20.2.patch, hdfs-3429-0.20.2.patch, 
 hdfs-3429.txt, hdfs-3429.txt, hdfs-3429.txt


 Currently, even if the client does not want to verify checksums, the datanode 
 reads them anyway and sends them over the wire. This means that performance 
 improvements like HBase's application-level checksums don't have much benefit 
 when reading through the datanode, since the DN is still causing seeks into 
 the checksum file.
 (Credit goes to Dhruba for discovering this - filing on his behalf)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4253) block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance

2012-12-14 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532779#comment-13532779
 ] 

Andy Isaacson commented on HDFS-4253:
-

bq. The bug comes later, where you always return 1 if neither Node is on the 
local rack. This is wrong; it violates anticommutation (see link).

But that's not what the code does.  If neither Node is on the local rack, then 
{{aIsLocalRack == bIsLocalRack}} and we use the shuffle for a total ordering, 
right here:
{code}
if (aIsLocalRack == bIsLocalRack) {
  int ai = shuffle.get(a), bi = shuffle.get(b);
  if (ai < bi) {
    return -1;
  } else if (ai > bi) {
    return 1;
  } else {
    return 0;
  }
{code}
The final {{else}} is only reached when {{bIsLocalRack && !aIsLocalRack}}. So 
I'm pretty sure this implementation does satisfy anticommutation.

 block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance
 -

 Key: HDFS-4253
 URL: https://issues.apache.org/jira/browse/HDFS-4253
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs4253-1.txt, hdfs4253.txt


 When many nodes (>10) read from the same block simultaneously, we get 
 asymmetric distribution of read load.  This can result in slow block reads 
 when one replica is serving most of the readers and the other replicas are 
 idle.  The busy DN bottlenecks on its network link.
 This is especially visible with large block sizes and high replica counts (I 
 reproduced the problem with {{-Ddfs.block.size=4294967296}} and replication 
 5), but the same behavior happens on a small scale with normal-sized blocks 
 and replication=3.
 The root of the problem is in {{NetworkTopology#pseudoSortByDistance}} which 
 explicitly does not try to spread traffic among replicas in a given rack -- 
 it only randomizes usage for off-rack replicas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4253) block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance

2012-12-14 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4253:


Attachment: hdfs4253-2.txt

Avoid extra a.equals(b) by checking {{aIsLocal && bIsLocal}} instead.  On 
average for a given sort this will result in fewer calls to {{.equals()}}.

 block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance
 -

 Key: HDFS-4253
 URL: https://issues.apache.org/jira/browse/HDFS-4253
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs4253-1.txt, hdfs4253-2.txt, hdfs4253.txt


 When many nodes (>10) read from the same block simultaneously, we get 
 asymmetric distribution of read load.  This can result in slow block reads 
 when one replica is serving most of the readers and the other replicas are 
 idle.  The busy DN bottlenecks on its network link.
 This is especially visible with large block sizes and high replica counts (I 
 reproduced the problem with {{-Ddfs.block.size=4294967296}} and replication 
 5), but the same behavior happens on a small scale with normal-sized blocks 
 and replication=3.
 The root of the problem is in {{NetworkTopology#pseudoSortByDistance}} which 
 explicitly does not try to spread traffic among replicas in a given rack -- 
 it only randomizes usage for off-rack replicas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4314) failure to set sticky bit regression on branch-trunk-win

2012-12-14 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-4314:
---

 Summary: failure to set sticky bit regression on branch-trunk-win
 Key: HDFS-4314
 URL: https://issues.apache.org/jira/browse/HDFS-4314
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: trunk-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: trunk-win


The problem is visible by running {{TestDFSShell#testFilePermissions}}.  The 
test fails on trying to set sticky bit.  The problem is that branch-trunk-win 
accidentally merged in a branch-1 change in 
{{RawLocalFileSystem#setPermission}} to call {{FileUtil#setPermission}}, which 
sets permissions using Java {{File}} API.  There is no way to set sticky bit 
through this API.  We need to switch back to the trunk implementation of 
{{RawLocalFileSystem#setPermission}}, which uses either native code or a shell 
call to external chmod.
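
For reference, a sketch of the limitation and the shell fallback (hypothetical 
helper; the real trunk code goes through native code or Hadoop's shell 
utilities):
{code}
import java.io.File;
import java.io.IOException;

public class StickyBitSketch {
  static void chmodWithSticky(File f, String mode) throws IOException, InterruptedException {
    // java.io.File only exposes setReadable/setWritable/setExecutable; there
    // is no way to express the sticky bit (e.g. mode 1777) through that API,
    // which is why the merged branch-1 code path fails TestDFSShell.
    Process p = new ProcessBuilder("chmod", mode, f.getAbsolutePath()).start();
    if (p.waitFor() != 0) {
      throw new IOException("chmod " + mode + " failed on " + f);
    }
  }
}
{code}

Usage would be along the lines of {{chmodWithSticky(dir, "1777")}}.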


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4315) DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access

2012-12-14 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4315:


 Summary: DNs with multiple BPs can have BPOfferServices fail to 
start due to unsynchronized map access
 Key: HDFS-4315
 URL: https://issues.apache.org/jira/browse/HDFS-4315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In some nightly test runs we've seen pretty frequent failures of 
TestWebHdfsWithMultipleNameNodes. I've traced the root cause to an 
unsynchronized map access in the DataStorage class.

More details in the first comment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4315) DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access

2012-12-14 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532857#comment-13532857
 ] 

Aaron T. Myers commented on HDFS-4315:
--

In all of the failing test runs that I saw, the client would end up failing 
with an error like the following:

{noformat}
2012-12-14 16:30:36,818 WARN  hdfs.DFSClient (DFSOutputStream.java:run(562)) - 
DataStreamer Exception
java.io.IOException: Failed to add a datanode.  User may turn off this feature 
by setting dfs.client.block.write.replace-datanode-on-failure.policy in 
configuration, where the current policy is DEFAULT.  (Nodes: 
current=[127.0.0.1:52552, 127.0.0.1:43557], original=[127.0.0.1:43557, 
127.0.0.1:52552])
  at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:792)
  at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:852)
  at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:958)
 
  at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:469)
{noformat}

This suggests that either an entire DN or one of the BPOfferServices of one of 
the DNs was not starting correctly, or had not started by the time the client 
was trying to access it. Unfortunately, TestWebHdfsWithMultipleNameNodes 
disables the DN logger, so it wasn't obvious what was causing that problem. 
Upon changing the test to not disable the logger and looping the test, I would 
occasionally see an error like the following:

{noformat}
java.lang.NullPointerException
  at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:850)
  at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:819)
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:308)
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
  at java.lang.Thread.run(Thread.java:662)
{noformat}

This error would cause one of the BPOfferServices in one of the DNs to not come 
up. The reason for this is that concurrent, unsynchronized puts to the HashMap 
DataStorage#bpStorageMap result in undefined behavior, including 
previously-included entries no longer appearing to be in the map.
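
A minimal sketch of the kind of fix this implies (field and value types 
assumed from the description; the actual patch may differ):
{code}
import java.util.HashMap;
import java.util.Map;

class DataStorageSketch {
  // Unsynchronized concurrent HashMap.put() is undefined behavior; guarding
  // every access with one lock keeps entries from silently vanishing.
  private final Map<String, Object> bpStorageMap = new HashMap<String, Object>();

  void addBlockPoolStorage(String bpid, Object bpStorage) {
    synchronized (bpStorageMap) {
      bpStorageMap.put(bpid, bpStorage);
    }
  }

  Object getBlockPoolStorage(String bpid) {
    synchronized (bpStorageMap) {
      return bpStorageMap.get(bpid);
    }
  }
}
{code}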

 DNs with multiple BPs can have BPOfferServices fail to start due to 
 unsynchronized map access
 -

 Key: HDFS-4315
 URL: https://issues.apache.org/jira/browse/HDFS-4315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers

 In some nightly test runs we've seen pretty frequent failures of 
 TestWebHdfsWithMultipleNameNodes. I've traced the root cause to an 
 unsynchronized map access in the DataStorage class.
 More details in the first comment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4315) DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access

2012-12-14 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-4315:
-

Attachment: HDFS-4315.patch

Here's a patch which addresses the issue. I've been looping the test for an 
hour now with no failures, whereas before it used to fail pretty reliably 
within 10 minutes. I'll keep it looping over the weekend and see how it goes.

This patch also takes the liberty of re-enabling the DN log in 
TestWebHdfsWithMultipleNameNodes, so that we can better see the root cause of 
later failures.

 DNs with multiple BPs can have BPOfferServices fail to start due to 
 unsynchronized map access
 -

 Key: HDFS-4315
 URL: https://issues.apache.org/jira/browse/HDFS-4315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-4315.patch


 In some nightly test runs we've seen pretty frequent failures of 
 TestWebHdfsWithMultipleNameNodes. I've traced the root cause to an 
 unsynchronized map access in the DataStorage class.
 More details in the first comment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4315) DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access

2012-12-14 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-4315:
-

Status: Patch Available  (was: Open)

 DNs with multiple BPs can have BPOfferServices fail to start due to 
 unsynchronized map access
 -

 Key: HDFS-4315
 URL: https://issues.apache.org/jira/browse/HDFS-4315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-4315.patch


 In some nightly test runs we've seen pretty frequent failures of 
 TestWebHdfsWithMultipleNameNodes. I've traced the root cause to an 
 unsynchronized map access in the DataStorage class.
 More details in the first comment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4315) DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access

2012-12-14 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532860#comment-13532860
 ] 

Eli Collins commented on HDFS-4315:
---

Nice find!

+1 pending jenkins


 DNs with multiple BPs can have BPOfferServices fail to start due to 
 unsynchronized map access
 -

 Key: HDFS-4315
 URL: https://issues.apache.org/jira/browse/HDFS-4315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-4315.patch


 In some nightly test runs we've seen pretty frequent failures of 
 TestWebHdfsWithMultipleNameNodes. I've traced the root cause to an 
 unsynchronized map access in the DataStorage class.
 More details in the first comment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4253) block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance

2012-12-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532864#comment-13532864
 ] 

Colin Patrick McCabe commented on HDFS-4253:


Thanks for clarifying that.  I still think there's a problem, though: I don't 
see any reason why shuffle(a) could not be equal to shuffle(b) for two 
completely unrelated DatanodeIDs a and b.  This could be fixed by checking 
something that's supposed to be unique, like the name field, in the case where 
the two weights agree.  It also seems better to just use {{hashCode}} rather 
than maintaining your own set of random ints associated with objects.
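
To make the concern concrete, here is a hedged sketch (names are hypothetical; this is not the attached patch) of a comparator that orders nodes by per-invocation random weights but falls back to a unique field when two weights collide:

{code:java}
import java.util.Comparator;
import java.util.Map;

// Illustration of the collision concern: if two unrelated nodes draw the
// same random weight, break the tie on a unique field (e.g. the name) so
// the ordering stays total and well-defined.
class RandomWeightComparator implements Comparator<String> {
  private final Map<String, Integer> weights; // node name -> random weight

  RandomWeightComparator(Map<String, Integer> weights) {
    this.weights = weights;
  }

  @Override
  public int compare(String a, String b) {
    int byWeight = weights.get(a).compareTo(weights.get(b));
    return (byWeight != 0) ? byWeight : a.compareTo(b); // unique tie-break
  }
}
{code}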

 block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance
 -

 Key: HDFS-4253
 URL: https://issues.apache.org/jira/browse/HDFS-4253
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs4253-1.txt, hdfs4253-2.txt, hdfs4253.txt


 When many nodes (&gt;10) read from the same block simultaneously, we get 
 asymmetric distribution of read load.  This can result in slow block reads 
 when one replica is serving most of the readers and the other replicas are 
 idle.  The busy DN bottlenecks on its network link.
 This is especially visible with large block sizes and high replica counts (I 
 reproduced the problem with {{-Ddfs.block.size=4294967296}} and replication 
 5), but the same behavior happens on a small scale with normal-sized blocks 
 and replication=3.
 The root of the problem is in {{NetworkTopology#pseudoSortByDistance}} which 
 explicitly does not try to spread traffic among replicas in a given rack -- 
 it only randomizes usage for off-rack replicas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3429) DataNode reads checksums even if client does not need them

2012-12-14 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532891#comment-13532891
 ] 

liang xie commented on HDFS-3429:
-

Oh, [~tlipcon], you missed my words: without patch.

The strace showed the statistics without the patch.
After applying the patch, I no longer saw so many meta files being opened.

 DataNode reads checksums even if client does not need them
 --

 Key: HDFS-3429
 URL: https://issues.apache.org/jira/browse/HDFS-3429
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, performance
Affects Versions: 2.0.0-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3429-0.20.2.patch, hdfs-3429-0.20.2.patch, 
 hdfs-3429.txt, hdfs-3429.txt, hdfs-3429.txt


 Currently, even if the client does not want to verify checksums, the datanode 
 reads them anyway and sends them over the wire. This means that performance 
 improvements like HBase's application-level checksums don't have much benefit 
 when reading through the datanode, since the DN is still causing seeks into 
 the checksum file.
 (Credit goes to Dhruba for discovering this - filing on his behalf)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3429) DataNode reads checksums even if client does not need them

2012-12-14 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532895#comment-13532895
 ] 

liang xie commented on HDFS-3429:
-

And the HBase-specific issue is HBASE-5074, fixed in 0.94.0.

 DataNode reads checksums even if client does not need them
 --

 Key: HDFS-3429
 URL: https://issues.apache.org/jira/browse/HDFS-3429
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, performance
Affects Versions: 2.0.0-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3429-0.20.2.patch, hdfs-3429-0.20.2.patch, 
 hdfs-3429.txt, hdfs-3429.txt, hdfs-3429.txt


 Currently, even if the client does not want to verify checksums, the datanode 
 reads them anyway and sends them over the wire. This means that performance 
 improvements like HBase's application-level checksums don't have much benefit 
 when reading through the datanode, since the DN is still causing seeks into 
 the checksum file.
 (Credit goes to Dhruba for discovering this - filing on his behalf)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4253) block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance

2012-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532896#comment-13532896
 ] 

Hadoop QA commented on HDFS-4253:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561062/hdfs4253-2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3667//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3667//console

This message is automatically generated.

 block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance
 -

 Key: HDFS-4253
 URL: https://issues.apache.org/jira/browse/HDFS-4253
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs4253-1.txt, hdfs4253-2.txt, hdfs4253.txt


 When many nodes (&gt;10) read from the same block simultaneously, we get 
 asymmetric distribution of read load.  This can result in slow block reads 
 when one replica is serving most of the readers and the other replicas are 
 idle.  The busy DN bottlenecks on its network link.
 This is especially visible with large block sizes and high replica counts (I 
 reproduced the problem with {{-Ddfs.block.size=4294967296}} and replication 
 5), but the same behavior happens on a small scale with normal-sized blocks 
 and replication=3.
 The root of the problem is in {{NetworkTopology#pseudoSortByDistance}} which 
 explicitly does not try to spread traffic among replicas in a given rack -- 
 it only randomizes usage for off-rack replicas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4253) block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance

2012-12-14 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532901#comment-13532901
 ] 

Andy Isaacson commented on HDFS-4253:
-

bq. I don't see any reason why shuffle(a) could not be equal to shuffle(b), for 
two completely unrelated DatanodeIDs a and b.

That's true, equality is possible.  It's very unlikely given that we're 
choosing N items (where N is the replication count of a block, so nearly always 
3, sometimes 10, possibly as absurdly high as 50) from the range of 
{{Random#nextInt}}, which is about 2**32.  The algorithm does something 
reasonable in the case that the shuffle has a collision (it puts the items in 
some order, either stable or not, and either result is fine for the rest of the 
algorithm). It would be possible to remove the possibility of collisions, but I 
don't know how to do that quickly with minimal code.  So the current 
implementation seemed to strike a nice balance between the desired behavior, 
efficient and easily understandable code, and low algorithmic complexity.

bq. It also seems better to just use hashCode, rather than creating your own 
random set of random ints associated with objects.

It's important that we get a different answer each time 
{{pseudoSortByDistance}} is invoked; that randomization is what spreads the 
read load out across the replicas. So using a stable value like hashCode would 
defeat that goal of this change.  (Possibly it might be true that hashCode 
ordering would be different in different {{DFSClient}} instances on different 
nodes, but I see no guarantee of that, and even if it's true, depending on such 
a subtle implementation detail would be dangerous. And it still doesn't resolve 
the issue that a single DFSClient should pick different replicas from a given 
distance class for various reads of a given block.)
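
As a purely illustrative rendering of that argument: a per-invocation shuffle of the equally-distant replicas looks roughly like the sketch below, whereas ordering by a stable key such as {{hashCode}} would produce the same order on every call and keep all readers pinned to one replica:

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: re-randomize the equally-distant replicas on every
// call so successive reads of the same block fan out across replicas.
class ReplicaSpreader {
  private final Random rand = new Random();

  void randomizeEquallyDistant(List<String> replicas, int from, int to) {
    // Shuffle only the slice at the same network distance; each invocation
    // yields a fresh order, avoiding a stable hot-spot.
    Collections.shuffle(replicas.subList(from, to), rand);
  }
}
{code}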

 block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance
 -

 Key: HDFS-4253
 URL: https://issues.apache.org/jira/browse/HDFS-4253
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs4253-1.txt, hdfs4253-2.txt, hdfs4253.txt


 When many nodes (&gt;10) read from the same block simultaneously, we get 
 asymmetric distribution of read load.  This can result in slow block reads 
 when one replica is serving most of the readers and the other replicas are 
 idle.  The busy DN bottlenecks on its network link.
 This is especially visible with large block sizes and high replica counts (I 
 reproduced the problem with {{-Ddfs.block.size=4294967296}} and replication 
 5), but the same behavior happens on a small scale with normal-sized blocks 
 and replication=3.
 The root of the problem is in {{NetworkTopology#pseudoSortByDistance}} which 
 explicitly does not try to spread traffic among replicas in a given rack -- 
 it only randomizes usage for off-rack replicas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2012-12-14 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-347:
--

Attachment: HDFS-347.027.patch

This doesn't address all the points on the review board (I'm still working on 
another rev that does).  However, it does include the path security 
validation, the addition of {{dfs.client.domain.socket.data.traffic}}, some 
refactoring of BlockReaderFactory, the addition of DomainSocketFactory, and 
the renaming of {{getBindPath}} to {{getBoundPath}}.
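
For anyone trying the patch, the new key is presumably set like any other client-side property; a hedged example follows (the value shown here is an assumption, so check the patch for the actual default and semantics):

{code:xml}
<!-- Hypothetical hdfs-site.xml snippet; verify the real default and exact
     semantics against the attached patch. -->
<property>
  <name>dfs.client.domain.socket.data.traffic</name>
  <value>false</value>
</property>
{code}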

 DFS read performance suboptimal when client co-located on nodes with data
 -

 Key: HDFS-347
 URL: https://issues.apache.org/jira/browse/HDFS-347
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs-client, performance
Reporter: George Porter
Assignee: Colin Patrick McCabe
 Attachments: all.tsv, BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
 HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-016_cleaned.patch, 
 HDFS-347.016.patch, HDFS-347.017.clean.patch, HDFS-347.017.patch, 
 HDFS-347.018.clean.patch, HDFS-347.018.patch2, HDFS-347.019.patch, 
 HDFS-347.020.patch, HDFS-347.021.patch, HDFS-347.022.patch, 
 HDFS-347.024.patch, HDFS-347.025.patch, HDFS-347.026.patch, 
 HDFS-347.027.patch, HDFS-347-branch-20-append.txt, hdfs-347.png, 
 hdfs-347.txt, local-reads-doc


 One of the major strategies Hadoop uses to get scalable data processing is to 
 move the code to the data.  However, putting the DFS client on the same 
 physical node as the data blocks it acts on doesn't improve read performance 
 as much as expected.
 After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
 is due to the HDFS streaming protocol causing many more read I/O operations 
 (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
 disk block from the DataNode process (running in a separate JVM) running on 
 the same machine.  The DataNode will satisfy the single disk block request by 
 sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
 this is done in the sendChunk() method, relying on Java's transferTo() 
 method.  Depending on the host O/S and JVM implementation, transferTo() is 
 implemented as either a sendfilev() syscall or a pair of mmap() and write().  
 In either case, each chunk is read from the disk by issuing a separate I/O 
 operation for each chunk.  The result is that the single request for a 64-MB 
 block ends up hitting the disk as over a thousand smaller requests for 64-KB 
 each.
 Since the DFSClient runs in a different JVM and process than the DataNode, 
 shuttling data from the disk to the DFSClient also results in context 
 switches each time network packets get sent (in this case, the 64-kb chunk 
 turns into a large number of 1500 byte packet send operations).  Thus we see 
 a large number of context switches for each block send operation.
 I'd like to get some feedback on the best way to address this, but I think 
 providing a mechanism for a DFSClient to directly open data blocks that 
 happen to be on the same machine.  It could do this by examining the set of 
 LocatedBlocks returned by the NameNode, marking those that should be resident 
 on the local host.  Since the DataNode and DFSClient (probably) share the 
 same hadoop configuration, the DFSClient should be able to find the files 
 holding the block data, and it could directly open them and send data back to 
 the client.  This would avoid the context switches imposed by the network 
 layer, and would allow for much larger read buffers than 64KB, which should 
 reduce the number of iops imposed by each read block operation.
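
A minimal sketch of the chunked {{transferTo()}} pattern described above (illustrative only, not the actual BlockSender code):

{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Illustrative only: a block file sent in 64 KB slices via transferTo().
// Each slice can turn into a separate disk I/O, which is the iop
// amplification the description is talking about.
class ChunkedSender {
  static void send(FileInputStream blockFile, WritableByteChannel out)
      throws IOException {
    final long CHUNK = 64 * 1024;
    FileChannel ch = blockFile.getChannel();
    long pos = 0;
    long size = ch.size();
    while (pos < size) {
      pos += ch.transferTo(pos, Math.min(CHUNK, size - pos), out);
    }
  }
}
{code}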

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4315) DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access

2012-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532928#comment-13532928
 ] 

Hadoop QA commented on HDFS-4315:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561073/HDFS-4315.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3668//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3668//console

This message is automatically generated.

 DNs with multiple BPs can have BPOfferServices fail to start due to 
 unsynchronized map access
 -

 Key: HDFS-4315
 URL: https://issues.apache.org/jira/browse/HDFS-4315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-4315.patch


 In some nightly test runs we've seen pretty frequent failures of 
 TestWebHdfsWithMultipleNameNodes. I've traced the root cause to an 
 unsynchronized map access in the DataStorage class.
 More details in the first comment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2012-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532938#comment-13532938
 ] 

Hadoop QA commented on HDFS-347:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561092/HDFS-347.027.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 2013 javac 
compiler warnings (more than the trunk's current 2012 warnings).

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3669//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3669//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3669//console

This message is automatically generated.

 DFS read performance suboptimal when client co-located on nodes with data
 -

 Key: HDFS-347
 URL: https://issues.apache.org/jira/browse/HDFS-347
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs-client, performance
Reporter: George Porter
Assignee: Colin Patrick McCabe
 Attachments: all.tsv, BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
 HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-016_cleaned.patch, 
 HDFS-347.016.patch, HDFS-347.017.clean.patch, HDFS-347.017.patch, 
 HDFS-347.018.clean.patch, HDFS-347.018.patch2, HDFS-347.019.patch, 
 HDFS-347.020.patch, HDFS-347.021.patch, HDFS-347.022.patch, 
 HDFS-347.024.patch, HDFS-347.025.patch, HDFS-347.026.patch, 
 HDFS-347.027.patch, HDFS-347-branch-20-append.txt, hdfs-347.png, 
 hdfs-347.txt, local-reads-doc


 One of the major strategies Hadoop uses to get scalable data processing is to 
 move the code to the data.  However, putting the DFS client on the same 
 physical node as the data blocks it acts on doesn't improve read performance 
 as much as expected.
 After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
 is due to the HDFS streaming protocol causing many more read I/O operations 
 (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
 disk block from the DataNode process (running in a separate JVM) running on 
 the same machine.  The DataNode will satisfy the single disk block request by 
 sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
 this is done in the sendChunk() method, relying on Java's transferTo() 
 method.  Depending on the host O/S and JVM implementation, transferTo() is 
 implemented as either a sendfilev() syscall or a pair of mmap() and write().  
 In either case, each chunk is read from the disk by issuing a separate I/O 
 operation for each chunk.  The result is that the single request for a 64-MB 
 block ends up hitting the disk as over a thousand smaller requests for 64-KB 
 each.
 Since the DFSClient runs in a different JVM and process than the DataNode, 
 shuttling data from the disk to the DFSClient also results in context 
 switches each time network packets get sent (in this case, the 64-kb chunk 
 turns into a large number of 1500 byte packet send operations).  Thus we see 
 a large number of context switches for each block send operation.
 I'd like to get some feedback on the best way to address this, but I think 
 providing a mechanism for a DFSClient to directly open data blocks that 
 happen to be on the same machine.  It could do this by examining the set of 
 LocatedBlocks returned by the NameNode, marking those that should be resident 
 on the local host.  Since the DataNode and DFSClient (probably) share the 
 same hadoop configuration, the DFSClient should be able to find the files 
 holding the block data, and it could directly open them and send data back to 
 the client.  This would avoid the context switches imposed by the network 
 layer, and would allow for much larger read buffers than 64KB, which should 
 reduce the number of iops imposed by each read block operation.

[jira] [Assigned] (HDFS-4227) Document dfs.namenode.resource.*

2012-12-14 Thread Daisuke Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daisuke Kobayashi reassigned HDFS-4227:
---

Assignee: Daisuke Kobayashi

 Document dfs.namenode.resource.*  
 --

 Key: HDFS-4227
 URL: https://issues.apache.org/jira/browse/HDFS-4227
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Eli Collins
Assignee: Daisuke Kobayashi
  Labels: newbie

 Let's document {{dfs.namenode.resource.*}} in hdfs-default.xml and a section 
 in the HDFS docs that covers local directories.
 {{dfs.namenode.resource.check.interval}} - the interval in ms at which the 
 NameNode resource checker runs (default is 5000)
 {{dfs.namenode.resource.du.reserved}} - the amount of space to 
 reserve/require for a NN storage directory (default is 100 MB)
 {{dfs.namenode.resource.checked.volumes}} - a list of local directories for 
 the NN resource checker to check in addition to the local edits directories 
 (default is empty).
 {{dfs.namenode.resource.checked.volumes.minimum}} - the minimum number of 
 redundant NN storage volumes required (default is 1). If no redundant 
 resources are available, we don't enter safe mode as long as there are 
 sufficient required resources.
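
As a starting point for that documentation, here is a hedged hdfs-default.xml style sketch assembled from the defaults listed above; the descriptions are paraphrased, and the du.reserved value assumes the property is expressed in bytes, so everything should be verified against the code:

{code:xml}
<!-- Draft entries derived from this issue's description; verify against
     the actual NameNode resource checker behavior before committing. -->
<property>
  <name>dfs.namenode.resource.check.interval</name>
  <value>5000</value>
  <description>Interval in milliseconds at which the NameNode resource
  checker runs.</description>
</property>
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <value>104857600</value> <!-- 100 MB, assuming a byte-valued property -->
  <description>Amount of space to reserve/require for a NameNode storage
  directory.</description>
</property>
<property>
  <name>dfs.namenode.resource.checked.volumes</name>
  <value></value>
  <description>List of extra local directories for the NameNode resource
  checker to check, in addition to the local edits directories.</description>
</property>
<property>
  <name>dfs.namenode.resource.checked.volumes.minimum</name>
  <value>1</value>
  <description>Minimum number of redundant NameNode storage volumes
  required.</description>
</property>
{code}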

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4227) Document dfs.namenode.resource.*

2012-12-14 Thread Daisuke Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daisuke Kobayashi updated HDFS-4227:


Attachment: HDFS-4227.patch

New patch attached.  Could you review it?

 Document dfs.namenode.resource.*  
 --

 Key: HDFS-4227
 URL: https://issues.apache.org/jira/browse/HDFS-4227
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Eli Collins
Assignee: Daisuke Kobayashi
  Labels: newbie
 Attachments: HDFS-4227.patch


 Let's document {{dfs.namenode.resource.*}} in hdfs-default.xml and a section 
 in the HDFS docs that covers local directories.
 {{dfs.namenode.resource.check.interval}} - the interval in ms at which the 
 NameNode resource checker runs (default is 5000)
 {{dfs.namenode.resource.du.reserved}} - the amount of space to 
 reserve/require for a NN storage directory (default is 100 MB)
 {{dfs.namenode.resource.checked.volumes}} - a list of local directories for 
 the NN resource checker to check in addition to the local edits directories 
 (default is empty).
 {{dfs.namenode.resource.checked.volumes.minimum}} - the minimum number of 
 redundant NN storage volumes required (default is 1). If no redundant 
 resources are available, we don't enter safe mode as long as there are 
 sufficient required resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira