[jira] [Updated] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value
[ https://issues.apache.org/jira/browse/HDFS-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-5970: - Component/s: (was: hdfs-client) (was: datanode) namenode callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5970 URL: https://issues.apache.org/jira/browse/HDFS-5970 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Yongjun Zhang Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope), which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
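The null-return hazard described above can be modeled standalone. This is a sketch, not NetworkTopology's actual implementation: the class name, the List-based candidate set, and the seeded Random are invented for illustration; only the contract (an empty scope yields null, so callers such as BlockPlacementPolicyDefault must guard before dereferencing) mirrors the report.

```java
import java.util.List;
import java.util.Random;

// Minimal standalone model of the hazard: a chooseRandom that can return
// null when the candidate set is empty, and a caller that guards for it.
public class ChooseRandomSketch {
    static final Random RAND = new Random(42);

    // Returns null when no candidate exists -- the case callers must expect.
    public static String chooseRandom(List<String> candidates) {
        if (candidates.isEmpty()) {
            return null;
        }
        return candidates.get(RAND.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        String chosen = chooseRandom(List.of());
        // Guard instead of dereferencing: null means "no node available in scope".
        System.out.println(chosen == null ? "no node available" : chosen);
    }
}
```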
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905092#comment-13905092 ] Hudson commented on HDFS-5780: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/]) HDFS-5780. TestRBWBlockInvalidation times out intermittently. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569368) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch I recently found that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905090#comment-13905090 ] Hudson commented on HDFS-5953: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/]) Update change description for HDFS-5953 (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569579) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBlockReaderFactory fails if libhadoop.so has not been built --- Key: HDFS-5953 URL: https://issues.apache.org/jira/browse/HDFS-5953 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Assignee: Akira AJISAKA Fix For: 2.4.0 Attachments: HDFS-5953.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ : {code} java.lang.RuntimeException: Although a UNIX domain socket path is configured as /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:315) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) at org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) {code} This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
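A common guard for tests with this kind of dependency is to skip, rather than fail, when the native library is absent. Below is a minimal standalone sketch of that pattern; `nativeLoaded()` is a stand-in (driven by a made-up system property) for a real check such as Hadoop's `NativeCodeLoader.isNativeCodeLoaded()`.

```java
// Sketch of the skip-if-native-missing pattern: the test body runs only when
// the native library is reported as loaded; otherwise it is skipped cleanly.
public class NativeGuardSketch {
    // Stand-in for NativeCodeLoader.isNativeCodeLoaded(); the property name
    // "sketch.native.loaded" is invented for this illustration.
    public static boolean nativeLoaded() {
        return "true".equals(System.getProperty("sketch.native.loaded", "false"));
    }

    public static void main(String[] args) {
        if (!nativeLoaded()) {
            System.out.println("skipping: libhadoop not available");
            return; // skip, do not fail with a RuntimeException
        }
        System.out.println("running short-circuit test");
    }
}
```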
[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905096#comment-13905096 ] Hudson commented on HDFS-5893: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/]) HDFS-5893. HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates. Contributed by Haohui Mai. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569477) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHttpsFileSystem.java HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates Key: HDFS-5893 URL: https://issues.apache.org/jira/browse/HDFS-5893 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-5893.000.patch When {{HftpFileSystem}} tries to get the data, it creates a {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default URLConnectionFactory. It does not import the SSL certificates from ssl-client.xml. Therefore {{HsftpFileSystem}} fails. To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
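The shape of the fix can be sketched with stand-in types (`ConnectionFactory` below is not Hadoop's `URLConnectionFactory`; all names are invented): the inner opener should reuse the factory injected into the enclosing file system instead of falling back to a default that never loaded the ssl-client.xml certificates.

```java
// Sketch of "share one configured factory" vs. "fall back to a default".
public class SharedFactorySketch {
    public interface ConnectionFactory { String describe(); }

    // The default factory: built without the SSL setup from ssl-client.xml.
    public static final ConnectionFactory DEFAULT =
        () -> "default (no ssl-client.xml certs)";

    public static class FileSystemLike {
        final ConnectionFactory factory;
        public FileSystemLike(ConnectionFactory f) { this.factory = f; }

        // Bug shape: the opener ignores the enclosing file system's factory.
        public String openWithDefault() { return DEFAULT.describe(); }

        // Fix shape: the opener reuses the same factory as the file system.
        public String openWithShared() { return factory.describe(); }
    }

    public static void main(String[] args) {
        FileSystemLike fs = new FileSystemLike(() -> "ssl-aware factory");
        System.out.println(fs.openWithDefault());
        System.out.println(fs.openWithShared());
    }
}
```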
[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905093#comment-13905093 ] Hudson commented on HDFS-5803: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/]) HDFS-5803. TestBalancer.testBalancer0 fails. Contributed by Chen He. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569391) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java TestBalancer.testBalancer0 fails Key: HDFS-5803 URL: https://issues.apache.org/jira/browse/HDFS-5803 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5803.patch The test testBalancer0 fails on branch 2. Below is the stack trace {noformat} java.util.concurrent.TimeoutException: Cluster failed to reached expected values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 280, expected: 300), in more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle paths like /a/b/ correctly, causing SecondaryNameNode to fail checkpointing
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905361#comment-13905361 ] zhaoyunjiong commented on HDFS-5944: Multiple trailing / is impossible. LeaseManager:findLeaseWithPrefixPath didn't handle paths like /a/b/ correctly, causing SecondaryNameNode to fail checkpointing - Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt In our cluster, we encountered an error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write, and continued to refresh its lease. Client B deleted /XXX/20140206/04_30/. Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write. Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log. Then SecondaryNameNode tried to do a checkpoint and failed, because the lease held by Client A was not deleted when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here, when prefix is /XXX/20140206/04_30/ and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'. The fix is simple; I'll upload a patch later.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
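The boundary check quoted in the comment above can be reproduced standalone. With prefix /XXX/20140206/04_30/, `charAt(prefix.length())` lands on '_' rather than the separator, so the entry is missed. Normalizing a trailing separator off the prefix first is one possible fix, sketched here; the attached patch may differ.

```java
// Standalone reconstruction of the prefix check described in HDFS-5944.
public class PrefixMatchSketch {
    static final char SEPARATOR = '/';

    // The original check: works for "/a/b" but misses entries when the
    // prefix arrives with a trailing separator ("/a/b/").
    public static boolean buggyMatch(String p, String prefix) {
        if (!p.startsWith(prefix)) return false;
        int srclen = prefix.length();
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR;
    }

    // One possible fix: strip a trailing separator so the boundary check
    // sees the real parent path before comparing.
    public static boolean fixedMatch(String p, String prefix) {
        if (prefix.length() > 1 && prefix.charAt(prefix.length() - 1) == SEPARATOR) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        return buggyMatch(p, prefix);
    }

    public static void main(String[] args) {
        String p = "/XXX/20140206/04_30/_SUCCESS.slc.log";
        System.out.println(buggyMatch(p, "/XXX/20140206/04_30/")); // lease missed
        System.out.println(fixedMatch(p, "/XXX/20140206/04_30/")); // lease found
    }
}
```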
[jira] [Commented] (HDFS-5962) Mtime is not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905407#comment-13905407 ] Hadoop QA commented on HDFS-5962: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629731/HDFS-5962.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6172//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6172//console This message is automatically generated.
Mtime is not persisted for symbolic links - Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 when saving to fsimage, even though it is recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905443#comment-13905443 ] Hudson commented on HDFS-5893: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/]) HDFS-5893. HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates. Contributed by Haohui Mai. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569477) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHttpsFileSystem.java HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates Key: HDFS-5893 URL: https://issues.apache.org/jira/browse/HDFS-5893 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-5893.000.patch When {{HftpFileSystem}} tries to get the data, it creates a {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default URLConnectionFactory. It does not import the SSL certificates from ssl-client.xml. Therefore {{HsftpFileSystem}} fails. To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905440#comment-13905440 ] Hudson commented on HDFS-5803: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/]) HDFS-5803. TestBalancer.testBalancer0 fails. Contributed by Chen He. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569391) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java TestBalancer.testBalancer0 fails Key: HDFS-5803 URL: https://issues.apache.org/jira/browse/HDFS-5803 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5803.patch The test testBalancer0 fails on branch 2. Below is the stack trace {noformat} java.util.concurrent.TimeoutException: Cluster failed to reached expected values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 280, expected: 300), in more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905439#comment-13905439 ] Hudson commented on HDFS-5780: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/]) HDFS-5780. TestRBWBlockInvalidation times out intermittently. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569368) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch I recently found that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905437#comment-13905437 ] Hudson commented on HDFS-5953: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/]) Update change description for HDFS-5953 (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569579) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBlockReaderFactory fails if libhadoop.so has not been built --- Key: HDFS-5953 URL: https://issues.apache.org/jira/browse/HDFS-5953 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Assignee: Akira AJISAKA Fix For: 2.4.0 Attachments: HDFS-5953.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ : {code} java.lang.RuntimeException: Although a UNIX domain socket path is configured as /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:315) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) at org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) {code} This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905501#comment-13905501 ] Tsz Wo (Nicholas), SZE commented on HDFS-5966: -- Patch looks good. A minor suggestion: add saveMD5File(File dataFile, String digestString) to MD5FileUtils; then both renameMD5File and the original saveMD5File can use it. Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5966.000.patch This jira does the following: 1. When doing rollback for rolling upgrade, we should call FSEditLog#initJournalsForWrite when initializing editLog (just like upgrade in an HA setup). 2. After the rollback, we also need to rename the md5 file and change its reference file name. 3. Add a new unit test to cover rollback with HA+QJM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
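The reviewer's suggestion above can be sketched with strings standing in for file writes (this is not MD5FileUtils' real implementation; method bodies return the line that would be written): one shared saveMD5File(dataFile, digestString) overload that both the original byte[]-digest entry point and the rename path delegate to.

```java
// Sketch of the refactor: one helper owns the ".md5" line format, and both
// the save and rename code paths reuse it.
public class Md5RefactorSketch {
    // Shared helper: the single place that knows the digest-line format.
    public static String saveMD5File(String dataFileName, String digestString) {
        return digestString + " *" + dataFileName;
    }

    // Original entry point now delegates to the shared helper.
    public static String saveMD5File(String dataFileName, byte[] digest) {
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return saveMD5File(dataFileName, hex.toString());
    }

    // Rename path (needed after a rolling-upgrade rollback) reuses it too,
    // rewriting the reference file name without duplicating the format.
    public static String renameMD5File(String newDataFileName, String digestString) {
        return saveMD5File(newDataFileName, digestString);
    }

    public static void main(String[] args) {
        System.out.println(saveMD5File("fsimage", new byte[] { 0x0a, 0x1b }));
        System.out.println(renameMD5File("fsimage_rollback", "0a1b"));
    }
}
```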
[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905517#comment-13905517 ] Hudson commented on HDFS-5893: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/]) HDFS-5893. HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates. Contributed by Haohui Mai. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569477) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHttpsFileSystem.java HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates Key: HDFS-5893 URL: https://issues.apache.org/jira/browse/HDFS-5893 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-5893.000.patch When {{HftpFileSystem}} tries to get the data, it creates a {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default URLConnectionFactory. It does not import the SSL certificates from ssl-client.xml. Therefore {{HsftpFileSystem}} fails. To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905511#comment-13905511 ] Hudson commented on HDFS-5953: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/]) Update change description for HDFS-5953 (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569579) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBlockReaderFactory fails if libhadoop.so has not been built --- Key: HDFS-5953 URL: https://issues.apache.org/jira/browse/HDFS-5953 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Assignee: Akira AJISAKA Fix For: 2.4.0 Attachments: HDFS-5953.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ : {code} java.lang.RuntimeException: Although a UNIX domain socket path is configured as /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:315) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) at org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) {code} This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905514#comment-13905514 ] Hudson commented on HDFS-5803: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/]) HDFS-5803. TestBalancer.testBalancer0 fails. Contributed by Chen He. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569391) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java TestBalancer.testBalancer0 fails Key: HDFS-5803 URL: https://issues.apache.org/jira/browse/HDFS-5803 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5803.patch The test testBalancer0 fails on branch 2. Below is the stack trace {noformat} java.util.concurrent.TimeoutException: Cluster failed to reached expected values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 280, expected: 300), in more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905513#comment-13905513 ] Hudson commented on HDFS-5780: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/]) HDFS-5780. TestRBWBlockInvalidation times out intermittently. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569368) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch I recently found that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5962: - Summary: Mtime and atime are not persisted for symbolic links (was: Mtime is not persisted for symbolic links) Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 when saving to fsimage, even though it is recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5962: - Description: In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. (was: In {{FSImageSerialization}}, the mtime symbolic links is hardcoded to be 0 when saving to fsimage, even though it is recorded in memory and shown in the listing until restarting namenode.) Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905547#comment-13905547 ] Kihwal Lee commented on HDFS-5962: -- It should be easy to add a test case for this. Start a mini cluster, create a symlink, do saveNamespace and restart the namenode. Compare the time stamp before and after. Directory (mtime) and file inode (mtime and atime) can be covered in the same test. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
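The save/restart round-trip the suggested test would exercise can be modeled without a cluster. The sketch below is not FSImageSerialization; the two-long layout is invented. It only shows the defect's shape: a serializer that hardcodes 0 loses the in-memory mtime/atime across a save and reload, while one that writes the recorded values preserves them.

```java
import java.nio.ByteBuffer;

// Round-trip model: "save" a symlink's times to a byte image, "reload" them.
public class SymlinkTimesSketch {
    // Bug shape: times are hardcoded to 0 when saving, as in HDFS-5962.
    public static byte[] saveBuggy(long mtime, long atime) {
        return ByteBuffer.allocate(16).putLong(0L).putLong(0L).array();
    }

    // Fix shape: persist the values actually recorded in memory.
    public static byte[] saveFixed(long mtime, long atime) {
        return ByteBuffer.allocate(16).putLong(mtime).putLong(atime).array();
    }

    // "Restart": read the times back from the saved image.
    public static long[] load(byte[] image) {
        ByteBuffer in = ByteBuffer.wrap(image);
        return new long[] { in.getLong(), in.getLong() };
    }

    public static void main(String[] args) {
        long mtime = 1_392_000_000_000L, atime = mtime + 500;
        System.out.println(load(saveBuggy(mtime, atime))[0]); // mtime lost on reload
        System.out.println(load(saveFixed(mtime, atime))[0]); // mtime survives
    }
}
```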
[jira] [Updated] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5961: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Assignee: Kihwal Lee Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Jing. I've committed this to trunk, branch-2 and branch-2.4. OIV cannot load fsimages containing a symbolic link --- Key: HDFS-5961 URL: https://issues.apache.org/jira/browse/HDFS-5961 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5961.patch In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink INodes. So after incorrectly reading in the first symbolic link, the next INode can't be read. HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905564#comment-13905564 ] Hudson commented on HDFS-5961: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5188 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5188/]) HDFS-5961. OIV cannot load fsimages containing a symbolic link. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569789) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java OIV cannot load fsimages containing a symbolic link --- Key: HDFS-5961 URL: https://issues.apache.org/jira/browse/HDFS-5961 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5961.patch In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink INodes. So after incorrectly reading in the first symbolic link, the next INode can't be read. HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905606#comment-13905606 ] Yongjun Zhang commented on HDFS-5939: - Thanks Haohui. Indeed, the contract of Random.nextInt() expects numOfDatanodes to be greater than 0; otherwise it throws IllegalArgumentException("n must be positive"). That's what I listed in the original bug report, and we hadn't seen this exception thrown from NetworkTopology.chooseRandom(String scope, String excludedScope) until HDFS-5939. Investigation of this bug shows that numOfDatanodes is 0 because no datanode is running in this case. Prior to my fix, there were three ways the method NetworkTopology.chooseRandom(String scope, String excludedScope) could finish: 1. return a valid Node; 2. return null (at the beginning of the method); 3. throw the above exception when calling Random.nextInt() (at the end of the method). It seems no caller of this method checked for case 2. If it happened, the caller would hit a null pointer exception (again, there is no report saying this ever happened). HDFS-5939 is case 3, where the caller is NamenodeWebHdfs.redirectURI(..). My submitted fix makes the chooseRandom method return null before calling Random.nextInt() when numOfDatanodes is 0, and throws NoDatanodeException from the caller side. Basically, my fix replaces the IllegalArgumentException with a NoDatanodeException for this case, with an explicit message to help the user. With my submitted fix, if numOfDatanodes == 0 happens for other callers of the chooseRandom method in a real case, my fix won't hide the problem: it will result in a null pointer exception instead of the IllegalArgumentException. That is now covered by HDFS-5970. I hope there is a field report of HDFS-5970 before we fix it, so we can understand why it happened. 
Another alternative to my fix is to change the exception spec of the NetworkTopology.chooseRandom interface and let it throw NoDatanodeException instead of IllegalArgumentException. I didn't do this in my submitted fix for two reasons: - the caller has a better chance to provide a more helpful message. - the impact of changing the interface is wider. Would you please let me know what you think? Thanks. WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access HDFS via WebHDFS while the datanode is dead, the user will see the exception below without any clue that it's caused by the dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {RemoteException:{exception:IllegalArgumentException,javaClassName:java.lang.IllegalArgumentException,message:n must be positive}} We need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
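The fix described above can be sketched in isolation. This is a hedged model, not the actual NetworkTopology or NamenodeWebHdfs code: the empty-pool case returns null instead of tripping Random.nextInt(0), and the caller converts null into an explicit error (NoDatanodeException is modeled here as a plain IOException subclass):

```java
import java.io.IOException;
import java.util.List;
import java.util.Random;

// Hedged sketch of the fix's shape. Random.nextInt(0) throws
// IllegalArgumentException ("n must be positive"), so choose() returns
// null when no node is available, and the caller raises a clearer error.
class ChooseRandomSketch {
    static final Random RAND = new Random();

    static class NoDatanodeException extends IOException {
        NoDatanodeException(String msg) { super(msg); }
    }

    /** Returns a random node, or null when no node is available. */
    static String choose(List<String> nodes) {
        if (nodes.isEmpty()) {
            return null;                       // avoids RAND.nextInt(0)
        }
        return nodes.get(RAND.nextInt(nodes.size()));
    }

    /** Caller-side handling: turn null into an explicit, helpful error. */
    static String chooseOrFail(List<String> nodes) throws NoDatanodeException {
        String node = choose(nodes);
        if (node == null) {
            throw new NoDatanodeException("No datanode available to redirect the request to");
        }
        return node;
    }
}
```

Callers that skip the null check (the HDFS-5970 concern) would hit a NullPointerException instead, which is why every call site needs the guard.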
[jira] [Commented] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value
[ https://issues.apache.org/jira/browse/HDFS-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905627#comment-13905627 ] Yongjun Zhang commented on HDFS-5970: - Thanks Junping. callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5970 URL: https://issues.apache.org/jira/browse/HDFS-5970 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Yongjun Zhang Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope), which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated HDFS-5898: -- Attachment: HDFS-5898-with-documentation.patch Not sure why the previous build broke. [~atm], were you able to take a look at this patch? Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898.patch, HDFS-5898.patch, HDFS-5898.patch According to the discussion in HDFS-5804: 1. The NFS gateway should be able to get its own TGTs, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-5274: --- Attachment: HDFS-5274-7.patch I am attaching the patch rebased and updated based on review comments. bq. Any reason we take config on construction and in init for SpanReceiverHost? I removed conf from the constructor arguments. bq. SpanReceiverHost is on only when trace is enabled, right? If so, say so in class comment. SpanReceiverHost is always on, though it does nothing if no SpanReceiver is configured. I added a line to the class comment. bq. Has to be a shutdown hook? ShutdownHookManager.get().addShutdownHook ? This is fine unless we envision someone having to override it which I suppose should never happen for an optionally enabled, rare, trace function? Overriding SpanReceiverHost is not necessary, though there could be someone who implements a SpanReceiver. I think it is useful to wait for receivers to process all the tracing data in a crash scenario. bq. HTraceConfiguration is for testing only? Should be @visiblefortesting only or a comment at least? HTraceConfiguration is used by SpanReceiver implementations, not for testing only. bq. Should there be defines for a few of these? DFSInputStream.close seems fine... only used once DFSInputStream.read? I think it is fine not to define DFSInputStream.read now. In addition to the above, there are some other fixes: * removed the timing dependency from TestTracing. * added a guard by Trace.isTracing() around startSpan() in DFSInputStream, FsShell and WritableRpcEngine. * removed SpanReceiverHost from FsShell and DFSClient. I will add options or config properties to turn on tracing from the shell later in another JIRA issue. 
Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
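The "guard by Trace.isTracing() around startSpan()" fix listed above can be modeled with a self-contained stub. None of these names are the HTrace API; the sketch only illustrates skipping span creation when tracing is off:

```java
// Self-contained model of the tracing guard pattern. The real patch uses
// HTrace's Trace/TraceScope; this stub only models the idea of avoiding
// span-creation overhead when no tracing is configured.
class TracingGuardSketch {
    static boolean tracingEnabled = false;
    static int spansStarted = 0;

    static class Scope {
        void close() { /* span would be delivered to receivers here */ }
    }

    static Scope startSpan(String name) {
        spansStarted++;
        return new Scope();
    }

    static void tracedRead() {
        Scope scope = null;
        if (tracingEnabled) {          // guard: no span overhead when off
            scope = startSpan("DFSInputStream.read");
        }
        try {
            // ... perform the actual read ...
        } finally {
            if (scope != null) {
                scope.close();
            }
        }
    }
}
```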
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905789#comment-13905789 ] Arpit Agarwal commented on HDFS-5318: - [~sirianni] are these failures related to the patch? Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
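The counting rule proposed above — replication is the number of distinct storage IDs, not the number of location tuples — can be sketched directly. Types here are illustrative, not the Namenode's actual data structures:

```java
import java.util.*;

// Hedged sketch of the proposal: a block's effective replication is the
// number of distinct storage IDs among its locations, so two datanodes
// mounting the same shared pool count as one physical copy.
class SharedStorageReplication {
    /** Each location is a (datanodeId, storageId) pair. */
    static int effectiveReplication(List<String[]> locations) {
        Set<String> distinctStorages = new HashSet<>();
        for (String[] loc : locations) {
            distinctStorages.add(loc[1]);   // count by storage ID only
        }
        return distinctStorages.size();
    }
}
```

Applied to the DN_1..DN_4 example above, this yields 2 instead of 4.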
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905797#comment-13905797 ] Brandon Li commented on HDFS-5583: -- Some early comments. I haven't finished reviewing all the changes. - DataNode#shutdownDatanode() can be called only once, and it throws an exception on subsequent invocations. I would imagine that after an administrator issues the dfsadmin shutdownDatanode -upgrade command, he/she would like to know whether the DataNodes received it and whether they are in the upgrade preparation state. Unless I missed something, it seems the only way to know is to issue the same command again and expect to receive an exception. Would it be better to either let shutdownDatanode return an error code or have getDataNodeInfo include the current datanode state? - Do we plan to have more OOB Acks anytime soon? We can always add new enums instead of reserving a few OOB_RESERVEDx for now. - In DataNode.java: is forUpgrade, upgrade or shutdownForUpgrade a better name than the variable name restarting? :-) - DataXceiverServer.java: please clean up the unused import Make DN send an OOB Ack on shutdown before restarting Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch Add an ability for datanodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for applications that need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
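Brandon's suggestion above — report the datanode's state on repeated shutdown requests instead of throwing — can be sketched as follows. All names are hypothetical, not the actual DataNode API:

```java
// Hedged sketch: instead of throwing on a second shutdownDatanode call,
// report the current state so the administrator can tell the upgrade
// preparation request was received.
class ShutdownStateSketch {
    private boolean preparingForUpgrade = false;

    synchronized String shutdownDatanodeForUpgrade() {
        if (preparingForUpgrade) {
            return "ALREADY_PREPARING_FOR_UPGRADE";  // informative, not an exception
        }
        preparingForUpgrade = true;
        return "SHUTDOWN_INITIATED";
    }

    synchronized boolean isPreparingForUpgrade() {
        return preparingForUpgrade;   // could back a getDataNodeInfo-style query
    }
}
```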
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905821#comment-13905821 ] Chris Nauroth commented on HDFS-4685: - I have merged the HDFS-4685 branch to trunk, as per the passing merge vote here: http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201402.mbox/%3CCABCYYb-3jGNDhhXg%2B-TuFw0f-_2YybAJdiRgUpbkRXEvNvTDYA%40mail.gmail.com%3E Implementation of ACLs in HDFS -- Key: HDFS-4685 URL: https://issues.apache.org/jira/browse/HDFS-4685 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, security Affects Versions: 1.1.2 Reporter: Sachin Jose Assignee: Chris Nauroth Attachments: HDFS-4685.1.patch, HDFS-4685.2.patch, HDFS-4685.3.patch, HDFS-4685.4.patch, HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf, HDFS-ACLs-Design-3.pdf, Test-Plan-for-Extended-Acls-1.pdf Currently HDFS doesn't support extended file ACLs. In Unix, extended ACLs can be managed using the getfacl and setfacl utilities. Is there anybody working on this feature? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-4685: Fix Version/s: 3.0.0 Implementation of ACLs in HDFS -- Key: HDFS-4685 URL: https://issues.apache.org/jira/browse/HDFS-4685 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, security Affects Versions: 1.1.2 Reporter: Sachin Jose Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: HDFS-4685.1.patch, HDFS-4685.2.patch, HDFS-4685.3.patch, HDFS-4685.4.patch, HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf, HDFS-ACLs-Design-3.pdf, Test-Plan-for-Extended-Acls-1.pdf Currently HDFS doesn't support extended file ACLs. In Unix, extended ACLs can be managed using the getfacl and setfacl utilities. Is there anybody working on this feature? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905830#comment-13905830 ] Claudio Fahey commented on HDFS-4685: - I am currently traveling and will be back on Thursday 2/20. Email responses may be delayed. Implementation of ACLs in HDFS -- Key: HDFS-4685 URL: https://issues.apache.org/jira/browse/HDFS-4685 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, security Affects Versions: 1.1.2 Reporter: Sachin Jose Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: HDFS-4685.1.patch, HDFS-4685.2.patch, HDFS-4685.3.patch, HDFS-4685.4.patch, HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf, HDFS-ACLs-Design-3.pdf, Test-Plan-for-Extended-Acls-1.pdf Currently HDFS doesn't support extended file ACLs. In Unix, extended ACLs can be managed using the getfacl and setfacl utilities. Is there anybody working on this feature? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905841#comment-13905841 ] Haohui Mai commented on HDFS-5962: -- The old fsimage does not persist mtime and atime. The PB-based fsimage follows the old behavior. I wonder whether this is a bug in the old code, or it's done intentionally. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905856#comment-13905856 ] Eric Sirianni commented on HDFS-5318: - For {{TestCacheDirectives.testCacheManagerRestart}}, the failure is in a comparison of BlockPool IDs: {noformat} Inconsistent checkpoint fields. LV = -52 namespaceID = 173186898 cTime = 0 ; clusterId = testClusterID ; blockpoolId = BP-447030995-67.195.138.22-1392762420027. Expecting respectively: -52; 2; 0; testClusterID; BP-2140913546-67.195.138.22-1392762411177. {noformat} I don't see how that could be related to my change. That test also passes in my environment. For {{TestBalancerWithNodeGroup}}, the failure may be related to my change to {{MiniDFSCluster}} method signatures to allow for overlaying {{Configurations}} on individual {{DataNode}} objects. I'm currently investigating and will update soon. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. 
I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905847#comment-13905847 ] Hudson commented on HDFS-4685: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5191 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5191/]) Merge HDFS-4685 to trunk. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569870) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntry.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryScope.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryType.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclStatus.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/FsAction.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/AclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/FsCommand.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Ls.java * 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestAcl.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestFsPermission.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestAclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemDelegation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/AclException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclConfigFlag.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclStorage.java *
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905867#comment-13905867 ] Andrew Wang commented on HDFS-5318: --- Both of these are known flakies, so I'd be inclined just to go ahead and commit. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5966: Attachment: HDFS-5966.001.patch Thanks for the review, Nicholas! Update the patch to address the comments. Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5966.000.patch, HDFS-5966.001.patch This jira does the following: 1. When do rollback for rolling upgrade, we should call FSEditLog#initJournalsForWrite when initializing editLog (just like Upgrade in HA setup). 2. After the rollback, we also need to rename the md5 file and change its reference file name. 3. Add a new unit test to cover rollback with HA+QJM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
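Step 2 of the description above (rename the md5 file and change its reference file name) can be illustrated with a self-contained sketch. The "digest *filename" sidecar format and all names here are assumptions, not the actual NNStorage code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hedged sketch: when an fsimage file is renamed during rollback, its
// sidecar .md5 file must be renamed too and the file name recorded inside
// it updated, or checksum verification would fail on startup.
class Md5SidecarRename {
    static void renameImageWithMd5(Path oldImage, Path newImage) throws IOException {
        Path oldMd5 = Paths.get(oldImage.toString() + ".md5");
        Path newMd5 = Paths.get(newImage.toString() + ".md5");
        // Re-point the recorded file name at the new image name.
        String line = Files.readAllLines(oldMd5, StandardCharsets.UTF_8).get(0);
        String digest = line.split(" \\*")[0];
        Files.write(newMd5, (digest + " *" + newImage.getFileName() + "\n")
            .getBytes(StandardCharsets.UTF_8));
        Files.delete(oldMd5);
        Files.move(oldImage, newImage);
    }

    /** Demo round trip in a temp directory; returns the new .md5 line. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("md5demo");
            Path oldImg = dir.resolve("fsimage_rollback");
            Path newImg = dir.resolve("fsimage_current");
            Files.write(oldImg, "data".getBytes(StandardCharsets.UTF_8));
            Files.write(Paths.get(oldImg.toString() + ".md5"),
                "abc123 *fsimage_rollback\n".getBytes(StandardCharsets.UTF_8));
            renameImageWithMd5(oldImg, newImg);
            return Files.readAllLines(
                Paths.get(newImg.toString() + ".md5"), StandardCharsets.UTF_8).get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```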
[jira] [Commented] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905889#comment-13905889 ] Hadoop QA commented on HDFS-5898: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629805/HDFS-5898-with-documentation.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6173//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6173//console This message is automatically generated. Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898.patch, HDFS-5898.patch, HDFS-5898.patch According to the discussion in HDFS-5804: 1. 
The NFS gateway should be able to get its own TGTs, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5318: Attachment: HDFS-5318-trunk-c.patch Updated patch with fix for {{TestBalancerWithNodeGroup}}. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A)}} and {{(DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
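The proposed counting rule — physical replicas = distinct {{StorageID}}s, regardless of how many datanode access paths exist — can be sketched in a few lines of plain Java (illustrative names; this is not the actual NameNode replication-counting code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplicaCounting {
    // Each entry is one access path: datanodeId -> storageId reported by that datanode.
    // Physical replica count = number of distinct storage IDs, per the proposal above.
    static int physicalReplicaCount(Map<String, String> locations) {
        return new HashSet<>(locations.values()).size();
    }

    public static void main(String[] args) {
        Map<String, String> locations = new LinkedHashMap<>();
        locations.put("DN_1", "STORAGE_A");
        locations.put("DN_2", "STORAGE_A");
        locations.put("DN_3", "STORAGE_B");
        locations.put("DN_4", "STORAGE_B");
        // Four access paths, but only two distinct physical copies.
        System.out.println(locations.size());                // prints 4 (naive count)
        System.out.println(physicalReplicaCount(locations)); // prints 2 (proposed count)
    }
}
```

This reproduces the example above: four {{(DataNode ID, Storage ID)}} tuples across two shared storage pools yield a replication factor of 2, not 4.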
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905906#comment-13905906 ] Eric Sirianni commented on HDFS-5318: - Thanks [~andrew.wang]. I suspected the same thing after reading some JIRAs about {{TestBalancerWithNodeGroup}}. However, it turns out I did actually introduce a bug there :). The updated patch should fix it. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905909#comment-13905909 ] Haohui Mai commented on HDFS-5939: -- Two questions in {{NetworkTopology}}: # Under what circumstances will {{getNode(excludedScope)}} exclude all datanodes? # Is it safe to assert that {{numOfDatanodes}} is always greater than or equal to 0? [~szetszwo], can you comment on this? WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access HDFS via WebHDFS while the datanodes are dead, the user sees the exception below without any clue that it is caused by a dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} The error report needs to be fixed to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
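The opaque "n must be positive" message is what {{java.util.Random#nextInt(int)}} throws on JDK 7 when asked to pick from zero candidates — plausibly what happens when a random-node selection runs with no live datanodes in scope. A minimal illustration (hypothetical helper name; the real logic lives in {{NetworkTopology#chooseRandom}}):

```java
import java.util.List;
import java.util.Random;

public class EmptyScopeSketch {
    private static final Random RANDOM = new Random();

    // Illustrative stand-in for picking a random datanode from a scope.
    static String pickRandom(List<String> nodesInScope) {
        // When every datanode is excluded (or dead), the list is empty and
        // nextInt(0) throws IllegalArgumentException -- on JDK 7 its message
        // is "n must be positive", the unhelpful error surfaced via WebHDFS.
        return nodesInScope.get(RANDOM.nextInt(nodesInScope.size()));
    }

    public static void main(String[] args) {
        try {
            pickRandom(List.of());
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException: " + e.getMessage());
        }
    }
}
```

This is why the fix belongs at the caller: the empty-candidate case should be detected and reported as "no datanodes available" before any random selection is attempted.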
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905912#comment-13905912 ] Arpit Agarwal commented on HDFS-5318: - Thanks for the heads up Andrew. +1 pending Jenkins again. Verified {{TestBalancerWithNodeGroup}} passes with the latest patch. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905924#comment-13905924 ] Arpit Agarwal commented on HDFS-5868: - Nitpick: {{BlockReceiver#cout}} can be removed. +1 otherwise. Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for an FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline, which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
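The proposed indirection can be sketched as follows (illustrative class name, not the actual {{ReplicaOutputStreams}} code): the stream wrapper owns the sync policy, so a storage plugin overrides one method instead of {{BlockReceiver}} downcasting to {{FileOutputStream}}.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

// Sketch: the wrapper owns the sync, so an FsDatasetSpi plugin not backed
// by OS files can subclass it and provide its own durability guarantee.
public class ReplicaStreamsSketch {
    private final OutputStream dataOut;

    public ReplicaStreamsSketch(OutputStream dataOut) {
        this.dataOut = dataOut;
    }

    // Default mirrors BlockReceiver's current behavior: force the channel
    // when the stream is a plain FileOutputStream. Plugins override this.
    public void syncDataOut() throws IOException {
        if (dataOut instanceof FileOutputStream) {
            ((FileOutputStream) dataOut).getChannel().force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = Files.createTempFile("replica", ".blk").toFile();
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            ReplicaStreamsSketch streams = new ReplicaStreamsSketch(out);
            out.write("block data".getBytes());
            streams.syncDataOut(); // durable on disk before acking the pipeline
        }
        System.out.println(f.length()); // prints 10
    }
}
```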
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5962: Attachment: HDFS-5962.3.patch Thanks [~kihwal], added a test-case for loading atime and mtime. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
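The bug can be pictured with a plain {{java.io}} round trip (a simplified sketch, not the real {{FSImageSerialization}} code): whatever gets written to the image is all that survives a NameNode restart, so hardcoded zeros come back as 1970-01-01 timestamps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SymlinkTimesSketch {
    // Save a symlink's times to an "image"; the buggy path hardcodes zeros.
    static byte[] save(long mtime, long atime, boolean hardcodeZeros) throws IOException {
        ByteArrayOutputStream image = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(image)) {
            out.writeLong(hardcodeZeros ? 0L : mtime);
            out.writeLong(hardcodeZeros ? 0L : atime);
        }
        return image.toByteArray();
    }

    // "Restart": everything in memory is gone; only the image remains.
    static long[] load(byte[] image) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
            return new long[] { in.readLong(), in.readLong() };
        }
    }

    public static void main(String[] args) throws IOException {
        long mtime = 1392000000000L, atime = 1392000100000L;
        long[] buggy = load(save(mtime, atime, true));
        long[] fixed = load(save(mtime, atime, false));
        System.out.println(buggy[0]); // 0 -> ls shows 1970-01-01 after restart
        System.out.println(fixed[0]); // real mtime survives the restart
    }
}
```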
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905937#comment-13905937 ] Hadoop QA commented on HDFS-5962: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629868/HDFS-5962.3.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6176//console This message is automatically generated. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905945#comment-13905945 ] Akira AJISAKA commented on HDFS-5962: - [~wheat9], I suppose it's a bug because the output of {{ls}} shows wrong information after restarting NameNode. {code} $ hdfs dfs -ls -rwxrwxrwx - user supergroup 0 1970-01-01 00:00 symlink {code} Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5950) The DFSClient and DataNode should use shared memory segments to communicate short-circuit information
[ https://issues.apache.org/jira/browse/HDFS-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5950: --- Attachment: HDFS-5950.001.patch The DFSClient and DataNode should use shared memory segments to communicate short-circuit information - Key: HDFS-5950 URL: https://issues.apache.org/jira/browse/HDFS-5950 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5950.001.patch The DFSClient and DataNode should use the shared memory segments and unified cache added in the other HDFS-5182 subtasks to communicate short-circuit information. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5966: - Hadoop Flags: Reviewed +1 patch looks good. Will commit it shortly. Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5966.000.patch, HDFS-5966.001.patch This jira does the following: 1. When doing a rollback for a rolling upgrade, we should call FSEditLog#initJournalsForWrite when initializing the editLog (just like Upgrade in an HA setup). 2. After the rollback, we also need to rename the md5 file and change its reference file name. 3. Add a new unit test to cover rollback with HA+QJM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5966. -- Resolution: Fixed Fix Version/s: HDFS-5535 (Rolling upgrades) I have committed this. Thanks, Jing! Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: HDFS-5535 (Rolling upgrades) Attachments: HDFS-5966.000.patch, HDFS-5966.001.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905958#comment-13905958 ] Brandon Li commented on HDFS-5944: -- +1. Both patches look good to me. LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint - Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt In our cluster, we encountered an error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write, and Client A continued to refresh its lease. Client B deleted /XXX/20140206/04_30/. Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write. Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log. Then the SecondaryNameNode tried to do a checkpoint and failed because the lease held by Client A was not deleted when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here, when prefix is /XXX/20140206/04_30/ and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'. The fix is simple, I'll upload a patch later. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
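The off-by-one in the quoted check can be demonstrated in isolation (a sketch with illustrative method names, not the committed patch — one plausible fix is simply to trim the trailing separator from the prefix before comparing):

```java
public class PrefixPathCheck {
    static final char SEPARATOR_CHAR = '/';

    // The check quoted in the description: when prefix ends with '/',
    // p.charAt(srclen) lands on '_' instead of '/', so the lease under
    // the deleted directory is missed.
    static boolean isUnderBuggy(String p, String prefix) {
        if (!p.startsWith(prefix)) return false;
        int srclen = prefix.length();
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR_CHAR;
    }

    // One possible fix (illustrative): normalize the prefix by trimming a
    // trailing separator before applying the same boundary check.
    static boolean isUnderFixed(String p, String prefix) {
        if (prefix.length() > 1 && prefix.charAt(prefix.length() - 1) == SEPARATOR_CHAR) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        if (!p.startsWith(prefix)) return false;
        int srclen = prefix.length();
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR_CHAR;
    }

    public static void main(String[] args) {
        String prefix = "/XXX/20140206/04_30/";
        String p = "/XXX/20140206/04_30/_SUCCESS.slc.log";
        System.out.println(isUnderBuggy(p, prefix)); // prints false -- lease is missed
        System.out.println(isUnderFixed(p, prefix)); // prints true
    }
}
```

The boundary check still rejects sibling paths such as /a/bc when the prefix is /a/b, which is the reason the original code compares the character at srclen rather than using a bare startsWith.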
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5776: Attachment: HDFS-5776v21.txt This patch has a few small differences that came out of some time spent testing: 1. Adds DEBUG level logging of the one-time setup of the hedged reads pool. 2. Gives the hedged read pool threads a 'hedged' prefix. 3. Changes the 'cancel' behavior so it does NOT cancel ongoing reads. 3. is the biggest change. What I've found is that hdfs reads do not take kindly to being interrupted. The exception types that bubble up are of a few versions -- InterruptedIOException, ClosedByInterruptException, and IOEs whose cause is an IE -- but I also encountered complaints coming up out of protobuf decoding messages, likely because the read was cancelled partway through. Then there was a bunch of logging noise -- WARN-level logging -- because of the interrupt exceptions and the fact that on interrupt, the node we were reading against would get added to the dead list. I had a patch that was more involved, dealing w/ the interrupt exceptions and redoing the WARNs, but it was getting very involved and I was coming to rely on an untrod path, that of interrupted reads, so I let it go for now. This patch lets outstanding reads finish. Let me chat w/ [~xieliang007] to possibly get production numbers on the benefit of the patch as-is. 
Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt, HDFS-5776v18.txt, HDFS-5776v21.txt This is a placeholder for hdfs-related stuff backported from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the interesting metric values into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path; we decide to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
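The pread decision described above — start against one replica, and only fire a second read when the first exceeds a latency threshold, without interrupting the loser — can be sketched with plain java.util.concurrent (illustrative names; the real code paths are fetchBlockByteRange / fetchBlockByteRangeSpeculative):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HedgedReadSketch {
    // Pool threads get a 'hedged' prefix, mirroring item 2 in the comment above.
    static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "hedgedRead");
        t.setDaemon(true);
        return t;
    });

    // Start a read against the first replica; if it has not finished within
    // thresholdMillis, start a second against another replica and take the
    // first result. The slower read is NOT interrupted (item 3 above) --
    // it simply runs to completion on its own.
    static <T> T hedgedRead(Callable<T> primary, Callable<T> backup, long thresholdMillis)
            throws Exception {
        CompletionService<T> cs = new ExecutorCompletionService<>(POOL);
        cs.submit(primary);
        Future<T> done = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (done != null) {
            return done.get();        // primary beat the threshold; no hedge needed
        }
        cs.submit(backup);            // hedge: fire the second read
        return cs.take().get();       // whichever finishes first wins
    }

    public static void main(String[] args) throws Exception {
        Callable<String> slow = () -> { Thread.sleep(500); return "slow replica"; };
        Callable<String> fast = () -> "fast replica";
        System.out.println(hedgedRead(slow, fast, 50)); // hedge fires; fast replica wins
        POOL.shutdown();
    }
}
```

Not cancelling the loser trades a little wasted work for avoiding the interrupt-handling problems described in the comment above.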
[jira] [Created] (HDFS-5973) add DomainSocket#shutdown method
Colin Patrick McCabe created HDFS-5973: -- Summary: add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5973: --- Attachment: HDFS-5973.001.patch add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5973: --- Status: Patch Available (was: Open) add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905965#comment-13905965 ] Colin Patrick McCabe commented on HDFS-5973: This is a pretty simple one. Just exposing the existing code which calls {{shutdown(2)}} on a socket. This is to allow me to shutdown the UNIX domain socket associated with a shared memory segment for error handling purposes. {{close}} could be used for this purpose, but it's a little more heavyweight than what I need here, since {{DomainSocket#close}} blocks until the fd is actually, well, closed. Since the UNIX domain socket associated with a shared memory segment is inside a {{DomainSocketWatcher}}, it can't be actually closed until the {{DomainSocketWatcher}} lets go of it. add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
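The shutdown-versus-close distinction can be illustrated with a plain TCP loopback pair (java.net.Socket stands in for the native UNIX domain socket here; the shutdown(2) semantics being shown are the same): shutdown pushes EOF to the peer immediately while the fd stays open, so a holder like {{DomainSocketWatcher}} can still release and close it later.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ShutdownVsClose {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket peer = server.accept();

            // shutdown(2) on the write side: the peer sees EOF right away,
            // but the local fd is not yet closed -- another component can
            // still hold it and close it when it is done.
            client.shutdownOutput();
            System.out.println(peer.getInputStream().read()); // prints -1: peer sees EOF
            System.out.println(client.isClosed());            // prints false: fd still open

            client.close(); // the heavyweight operation, deferred until safe
            peer.close();
        }
    }
}
```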
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905971#comment-13905971 ] Andrew Wang commented on HDFS-5973: --- +1 pending Jenkins bot add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5962: Attachment: HDFS-5962.4.patch Rebased the patch for the latest trunk. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch, HDFS-5962.4.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5944: - Status: Patch Available (was: Open) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint - Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0, 1.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5944: - Attachment: HDFS-5944.trunk.patch Upload the same trunk patch to trigger the build. LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint - Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905982#comment-13905982 ] Arpit Agarwal commented on HDFS-5963: - Thanks Nicholas. JDK7 could randomize the test case order so perhaps we need to put testSecondaryNameNode in a separate test class? Is this failure expected? {code} Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.565 sec FAILURE! - in org.apache.hadoop.hdfs.TestRollingUpgrade testRollback(org.apache.hadoop.hdfs.TestRollingUpgrade) Time elapsed: 3.386 sec ERROR! java.io.IOException: There appears to be a gap in the edit log. We expected txid 5, but got txid 8. at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:203) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:131) {code} TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail Key: HDFS-5963 URL: https://issues.apache.org/jira/browse/HDFS-5963 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal Assignee: Tsz Wo (Nicholas), SZE Attachments: h5963_20140218.patch {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test. Commenting out this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905991#comment-13905991 ] Jing Zhao commented on HDFS-5898: - bq. I don't follow how the change in RpcProgramNfs3 is related to this issue. Yes, I think the change in RpcProgramNfs3 is a regression of HDFS-5913. One question: the current patch puts the login into DFSClientCache#getUserGroupInformation, which is called by the load() method of the loading cache. Thus we will call login() every time we miss the cache. Should we put the login call into the constructor of RpcProgramNfs3 instead? Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898.patch, HDFS-5898.patch, HDFS-5898.patch According to the discussion in HDFS-5804: 1. The NFS gateway should be able to get its own tgts, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
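To make the review question in the HDFS-5898 comment concrete, here is a toy model (names are hypothetical; this is not the real DFSClientCache or UserGroupInformation API) contrasting the two placements of the login call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the review question above (hypothetical names; not the
// real DFSClientCache or UserGroupInformation API): a keytab login placed
// in the cache loader re-runs on every cache miss, while a login in the
// constructor runs exactly once.
public class LoginPlacement {
    int loginCalls = 0;
    private final Map<String, String> ugiCache = new ConcurrentHashMap<>();
    private final boolean loginInLoader;

    LoginPlacement(boolean loginInLoader) {
        this.loginInLoader = loginInLoader;
        if (!loginInLoader) {
            login();                 // constructor placement: once per process
        }
    }

    private void login() { loginCalls++; }  // stand-in for the kerberos login

    String getUgi(String user) {
        return ugiCache.computeIfAbsent(user, u -> {
            if (loginInLoader) {
                login();             // loader placement: once per cache miss
            }
            return "ugi:" + u;
        });
    }

    public static void main(String[] args) {
        LoginPlacement inLoader = new LoginPlacement(true);
        inLoader.getUgi("alice");
        inLoader.getUgi("bob");
        System.out.println(inLoader.loginCalls);   // one login per miss

        LoginPlacement inCtor = new LoginPlacement(false);
        inCtor.getUgi("alice");
        inCtor.getUgi("bob");
        System.out.println(inCtor.loginCalls);     // a single login
    }
}
```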
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5776: Release Note: If a read from a block is slow, start up another parallel, 'hedged' read against a different block replica. We then take the result of whichever read returns first (the outstanding read is cancelled). This 'hedged' read feature will help rein in the outliers, the odd read that takes a long time because it hit a bad patch on the disk, etc. This feature is off by default. To enable it, set {{dfs.client.hedged.read.threadpool.size}} to a positive number. The threadpool size is how many threads to dedicate to the running of these 'hedged', concurrent reads in your client. Then set {{dfs.client.hedged.read.threshold.millis}} to the number of milliseconds to wait before starting up a 'hedged' read. For example, if you set this property to 10, then if a read has not returned within 10 milliseconds, we will start up a new read against a different block replica. This feature emits new metrics: + hedgedReadOps + hedgedReadOpsWin -- how many times the hedged read 'beat' the original read + hedgedReadOpsInCurThread -- how many times we went to do a hedged read but we had to run it in the current thread because dfs.client.hedged.read.threadpool.size was at a maximum. 
Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt, HDFS-5776v18.txt, HDFS-5776v21.txt This is a placeholder for the HDFS-side backport of https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful, especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metric values of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
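The mechanism described in the HDFS-5776 release note can be sketched in plain Java. This is an illustration of the hedged-read pattern only, not DFSClient's actual implementation, and unlike the real feature it does not cancel the losing read:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HedgedRead {
    // Run 'primary'; if it has not finished within thresholdMillis, also
    // run 'backup' and return whichever completes first. A real
    // implementation would additionally cancel the losing attempt.
    static <T> T hedged(Callable<T> primary, Callable<T> backup,
                        long thresholdMillis, ExecutorService pool)
            throws Exception {
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        cs.submit(primary);
        Future<T> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get();   // primary beat the threshold
        }
        cs.submit(backup);        // hedge: try a different "replica"
        return cs.take().get();   // whichever attempt finishes first wins
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<String> slowReplica = () -> { Thread.sleep(200); return "slow"; };
        Callable<String> fastReplica = () -> "fast";
        // The slow read misses the 10 ms threshold, so the hedge wins.
        System.out.println(hedged(slowReplica, fastReplica, 10, pool));
        pool.shutdownNow();
    }
}
```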
[jira] [Updated] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Buddy updated HDFS-5868: Attachment: HDFS-5868b-branch-2.patch Updated based on Arpit's comment and regenerated against latest trunk. Thanks Arpit! Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch, HDFS-5868b-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for a FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
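A minimal sketch of the refactoring proposed in HDFS-5868 (simplified names modeled on the description above, not the actual patch): sync() lives on the stream wrapper with a default that preserves today's FileOutputStream-only behavior, so an FsDatasetSpi plugin that is not backed by OS files can override it.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch of the proposal (names modeled on the description,
// not the actual patch): BlockReceiver would call sync() on the wrapper
// instead of assuming a FileOutputStream.
class ReplicaOutputStreams implements Closeable {
    private final OutputStream dataOut;

    ReplicaOutputStreams(OutputStream dataOut) { this.dataOut = dataOut; }

    // Default mirrors the current BlockReceiver behavior: only
    // FileOutputStream-backed replicas can be synced.
    public void sync() throws IOException {
        if (dataOut instanceof FileOutputStream) {
            ((FileOutputStream) dataOut).getFD().sync();
        } else {
            throw new UnsupportedOperationException(
                    "stream is not file-backed; plugin must override sync()");
        }
    }

    @Override
    public void close() throws IOException { dataOut.close(); }

    public static void main(String[] args) throws IOException {
        new InMemoryReplicaOutputStreams(new ByteArrayOutputStream()).sync();
        System.out.println("plugin sync succeeded");
    }
}

// Example plugin stream that supplies its own durability mechanism.
class InMemoryReplicaOutputStreams extends ReplicaOutputStreams {
    InMemoryReplicaOutputStreams(ByteArrayOutputStream out) { super(out); }

    @Override
    public void sync() { /* e.g. replicate the buffer; nothing to fsync */ }
}
```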
[jira] [Updated] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5483: Attachment: h5483.03.patch Rebase patch and get updated Jenkins +1. NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch, h5483.03.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5742: Description: DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. Also included are a few improvements to DataNodeCluster, details in comments below. was: DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch, HDFS-5742.04.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. Also included are a few improvements to DataNodeCluster, details in comments below. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5974) Fix compilation error after merge
Tsz Wo (Nicholas), SZE created HDFS-5974: Summary: Fix compilation error after merge Key: HDFS-5974 URL: https://issues.apache.org/jira/browse/HDFS-5974 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE {noformat} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/szetszwo/hadoop/commit-HDFS-5535/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java:[322,34] cannot find symbol symbol : variable Feature location: class org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.AclEditLogUtil [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906043#comment-13906043 ] Hadoop QA commented on HDFS-5973: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629873/HDFS-5973.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6177//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6177//console This message is automatically generated. add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906049#comment-13906049 ] Arpit Agarwal commented on HDFS-5868: - +1 for the patch pending Jenkins. Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch, HDFS-5868b-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for a FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906064#comment-13906064 ] Hadoop QA commented on HDFS-5274: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629828/HDFS-5274-7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6174//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6174//console This message is automatically generated. 
Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906079#comment-13906079 ] Chris Nauroth commented on HDFS-5483: - Hi Arpit, This patch looks good. Just one minor comment on {{TestBlockHasMultipleReplicasOnSameDN#startUpCluster}}. There is a visible-for-testing {{DistributedFileSystem#getClient}} method that returns the underlying {{DFSClient}} instance. I'm wondering if the test initialization code can be reduced to {{client = fs.getClient()}}. NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch, h5483.03.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. 
Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5974) Fix compilation error after merge
[ https://issues.apache.org/jira/browse/HDFS-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5974: - Attachment: h5974_20140219.patch h5974_20140219.patch: fixes compilation error, NameNodeLayoutVersion and DataNodeLayoutVersion. Fix compilation error after merge - Key: HDFS-5974 URL: https://issues.apache.org/jira/browse/HDFS-5974 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5974_20140219.patch {noformat} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/szetszwo/hadoop/commit-HDFS-5535/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java:[322,34] cannot find symbol symbol : variable Feature location: class org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.AclEditLogUtil [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906104#comment-13906104 ] Kihwal Lee commented on HDFS-5583: -- Thanks for the review, Brandon. - The admin wants to know whether the command was received: This is determined by the return code of the command. As with other commands, when the return code is not 0, the state is non-deterministic and only then the command may be reissued. I do not believe that this is a common case. Moreover, the shutdown normally takes less than two seconds, and reissuing the shutdown manually would probably take more than that. In my opinion, adding support for reporting progress won't have much value. If you still feel that it needs to be changed, I will change it. Please let me know what you think. - I am planning on adding at least one more OOB ack type in the near future for write draining, which will be useful for decommissioning. The reserved enums make certain checks more efficient. I will address the rest of the comments when you finish the review. Make DN send an OOB Ack on shutdown before restarting Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch Add an ability for data nodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for the applications with a need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906104#comment-13906104 ] Kihwal Lee edited comment on HDFS-5583 at 2/19/14 9:26 PM: --- Thanks for the review, Brandon. - The admin wants to know whether the command was received by the datanode: This is determined by the return code of the command. As with other commands, when the return code is not 0, the state is non-deterministic and only then the command may be reissued. I do not believe that this is a common case. Moreover, the shutdown normally takes less than two seconds and probably the reissuing shutdown manually takes more than that. In my opinion, adding support for reporting progress won't have much value. If you still feel that it needs to be changed, I will change it. Please let me know what you think. - I am planning on adding at least one more OOB ack type in the near future for write draining, which will be useful for decommissioning. The reserved enums make certain checks more efficient. I will address the rest of the comments when you finish the review. was (Author: kihwal): Thanks for the review, Brandon. - The admin wants to know whether the command was received: This is determined by the return code of the command. As with other commands, when the return code is not 0, the state is non-deterministic and only then the command may be reissued. I do not believe that this is a common case. Moreover, the shutdown normally take less than two seconds and probably the reissuing shutdown manually take more than that. In my opinion, adding support for reporting progress won't have much value. If you still feel that it needs to be changed, I will change it. Please let me know what you think. - I am planning on adding at least one more OOB ack type in near future for write draining, which will be useful for decommissioining. The reserved enums make certain checks more efficient. 
I will address the rest of the comments when you finish the review. Make DN send an OOB Ack on shutdown before restarting Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch Add an ability for data nodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for the applications with a need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5974) Fix compilation error after merge
[ https://issues.apache.org/jira/browse/HDFS-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5974: Hadoop Flags: Reviewed +1 for the patch. Thanks, Nicholas. Fix compilation error after merge - Key: HDFS-5974 URL: https://issues.apache.org/jira/browse/HDFS-5974 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5974_20140219.patch {noformat} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/szetszwo/hadoop/commit-HDFS-5535/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java:[322,34] cannot find symbol symbol : variable Feature location: class org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.AclEditLogUtil [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906120#comment-13906120 ] Tsz Wo (Nicholas), SZE commented on HDFS-5939: -- In chooseRandom(..), excludedScope must be null or a proper descendant of scope after the first if-statement. So (1) it never excludes all nodes and (2) we must have numOfDatanodes >= 1. WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access hdfs via webhdfs, and the datanode is dead, the user will see an exception like the one below without any clue that it's caused by the dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} We need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
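For context on the "n must be positive" message in HDFS-5939: when a random pick is made from an empty candidate set, java.util.Random#nextInt(int) throws IllegalArgumentException (the JDK7 message is exactly "n must be positive"). A stripped-down illustration follows; this is not the real NetworkTopology code:

```java
import java.util.Random;

// Stripped-down illustration (not the real NetworkTopology code) of how
// an empty candidate set surfaces as IllegalArgumentException from
// Random.nextInt when there are no live datanodes to choose from.
public class ChooseRandomSketch {
    private static final Random RANDOM = new Random();

    static String chooseRandom(String[] liveNodes) {
        // Throws IllegalArgumentException when liveNodes.length == 0,
        // which is what surfaced to the WebHDFS client.
        return liveNodes[RANDOM.nextInt(liveNodes.length)];
    }

    public static void main(String[] args) {
        try {
            chooseRandom(new String[0]);  // no live datanodes
        } catch (IllegalArgumentException e) {
            System.out.println("no datanodes available: " + e.getMessage());
        }
    }
}
```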
[jira] [Resolved] (HDFS-5974) Fix compilation error after merge
[ https://issues.apache.org/jira/browse/HDFS-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5974. -- Resolution: Fixed Fix Version/s: HDFS-5535 (Rolling upgrades) Thanks Chris for reviewing the patch. I have committed this. Fix compilation error after merge - Key: HDFS-5974 URL: https://issues.apache.org/jira/browse/HDFS-5974 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: HDFS-5535 (Rolling upgrades) Attachments: h5974_20140219.patch {noformat} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/szetszwo/hadoop/commit-HDFS-5535/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java:[322,34] cannot find symbol symbol : variable Feature location: class org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.AclEditLogUtil [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5742: Hadoop Flags: Reviewed +1 for the patch. Thank you, Arpit. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch, HDFS-5742.04.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. Also included are a few improvements to DataNodeCluster, details in comments below. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906155#comment-13906155 ] Tsz Wo (Nicholas), SZE commented on HDFS-5963: -- For testSecondaryNameNode, let's simply remove it since it is not very useful. Let me also fix the bug in rollback. TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail Key: HDFS-5963 URL: https://issues.apache.org/jira/browse/HDFS-5963 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal Assignee: Tsz Wo (Nicholas), SZE Attachments: h5963_20140218.patch {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test. Commenting out this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906153#comment-13906153 ] Hadoop QA commented on HDFS-5318: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629857/HDFS-5318-trunk-c.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6175//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6175//console This message is automatically generated. 
Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e.
the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
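The counting rule proposed above can be sketched as follows. This is an illustrative, self-contained model of the idea (distinct {{StorageID}} s rather than distinct datanodes), not the actual HDFS-5318 patch; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch of the proposed counting rule: the replication count of a
// block is the number of distinct storage IDs among its
// (DataNode ID, Storage ID) location pairs, so multiple access paths into
// one shared storage pool count as a single physical replica.
public class SharedStorageReplication {
    public record Location(String datanodeId, String storageId) {}

    public static int replicationCount(List<Location> locations) {
        Set<String> distinctStorageIds = new HashSet<>();
        for (Location loc : locations) {
            distinctStorageIds.add(loc.storageId());
        }
        return distinctStorageIds.size();
    }

    public static void main(String[] args) {
        // The example from the description: four datanodes, two storage pools.
        List<Location> block = List.of(
                new Location("DN_1", "STORAGE_A"),
                new Location("DN_2", "STORAGE_A"),
                new Location("DN_3", "STORAGE_B"),
                new Location("DN_4", "STORAGE_B"));
        System.out.println(replicationCount(block)); // prints 2, not 4
    }
}
```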
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906167#comment-13906167 ] stack commented on HDFS-5274: - My guess is that the failures are unrelated. We can rerun the patch or just wait on the next iteration. Patch looks great to me. Have you tried it outside of the unit tests to make sure you get sensible looking spans and numbers? Perhaps I can help here? Fix these in next patch: + * This class do nothing If no SpanReceiver is configured . + * Trancing information of HTrace, if exists. Is formatting ok here? + if (source != null) { +proto.setSource(PBHelper.convertDatanodeInfo(source)); + } + send(out, Op.WRITE_BLOCK, proto.build()); + } finally { + if (ts != null) ts.close(); +} In BlockReceiver, should traceSpan be getting closed? Is it possible that the below throws an exception? + scope.getSpan().addKVAnnotation( + stream.getBytes(), + jas.getCurrentStream().toString().getBytes()); i.e. we can hop out w/o closing the span since the try/finally only happens later. This is in JournalSet in a few places. TraceInfo and RPCTInfo seem to be the same data structure? Should we define it one time only and share? Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
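The span-leak stack points out (an exception thrown while annotating, before any try/finally is entered) can be avoided with the usual try-before-annotate pattern. The sketch below uses a hypothetical stand-in `Span` class, not HTrace's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a tracing span (NOT the HTrace API): the point
// is only the control flow -- enter the try before calling
// addKVAnnotation, so the finally closes the span even if annotating throws.
class Span implements AutoCloseable {
    final Map<String, String> annotations = new HashMap<>();
    boolean closed = false;

    void addKVAnnotation(String key, String value) {
        annotations.put(key, value);
    }

    @Override
    public void close() {
        closed = true;
    }

    static Span annotateAndClose(String key, String value) {
        Span span = new Span();
        try {
            // If this throws, the finally below still runs; annotating
            // outside the try would leak an unclosed span.
            span.addKVAnnotation(key, value);
        } finally {
            span.close();
        }
        return span;
    }
}
```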
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906172#comment-13906172 ] Yongjun Zhang commented on HDFS-5939: - Thanks [~wheat9] and [~szetszwo]. Based on your input, it sounds like we can do the alternative solution I mentioned in my last update: change the exception spec of NetworkTopology.chooseRandom, and let it throw NoDatanodeException instead of IllegalArgumentException when numOfDataNode is 0. {code} public Node chooseRandom(String scope) throws NoDatanodeException private Node chooseRandom(String scope, String excludedScope) throws NoDatanodeException {code} If you agree, I will post a new patch with this change. Thanks, WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access hdfs via webhdfs, and when a datanode is dead, the user will see an exception like the one below without any clue that it's caused by the dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
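The proposed change can be modeled in a standalone sketch. `NoDatanodeException` is the name suggested in the comment above, not an existing Hadoop class, and `TopologySketch` is a toy stand-in for `NetworkTopology`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// The checked exception proposed in the comment (hypothetical here).
class NoDatanodeException extends Exception {
    NoDatanodeException(String message) { super(message); }
}

// Toy model of the proposed chooseRandom behavior: instead of letting
// Random.nextInt(0) surface as "IllegalArgumentException: n must be
// positive", fail with a descriptive, checked exception that callers
// such as the WebHDFS path can report to the user.
class TopologySketch {
    private final List<String> datanodes = new ArrayList<>();

    void add(String node) { datanodes.add(node); }

    String chooseRandom(String scope) throws NoDatanodeException {
        if (datanodes.isEmpty()) {
            throw new NoDatanodeException("No datanode available in scope " + scope);
        }
        return datanodes.get(new Random().nextInt(datanodes.size()));
    }
}
```

Because the exception is checked, callers such as BlockPlacementPolicyDefault would be forced by the compiler to handle the no-datanode case, which is the point of the signature change.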
[jira] [Created] (HDFS-5975) Create an option to specify a file path for OfflineImageViewer
Akira AJISAKA created HDFS-5975: --- Summary: Create an option to specify a file path for OfflineImageViewer Key: HDFS-5975 URL: https://issues.apache.org/jira/browse/HDFS-5975 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor The output of OfflineImageViewer becomes quite large if an input fsimage is large. I propose '-filePath' option to make the output smaller. The below command will output the {{ls -R}} of {{/user/root}}. {code} hdfs oiv -i input -o output -p Ls -filePath /user/root {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5976) Create unit tests for downgrade and finalize
Haohui Mai created HDFS-5976: Summary: Create unit tests for downgrade and finalize Key: HDFS-5976 URL: https://issues.apache.org/jira/browse/HDFS-5976 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5976.000.patch This jira tracks the effort of implementing unit tests for downgrades and finalization during rolling upgrades. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5962: Attachment: HDFS-5962.5.patch Fixed LsrPBImage.java to output the mtime of symlinks. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch, HDFS-5962.4.patch, HDFS-5962.5.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5976) Create unit tests for downgrade and finalize
[ https://issues.apache.org/jira/browse/HDFS-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5976: - Attachment: HDFS-5976.000.patch Create unit tests for downgrade and finalize Key: HDFS-5976 URL: https://issues.apache.org/jira/browse/HDFS-5976 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5976.000.patch This jira tracks the effort of implementing unit tests for downgrades and finalization during rolling upgrades. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage
[ https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5952: Assignee: (was: Akira AJISAKA) Create a tool to run data analysis on the PB format fsimage --- Key: HDFS-5952 URL: https://issues.apache.org/jira/browse/HDFS-5952 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 3.0.0 Reporter: Akira AJISAKA The Delimited processor in OfflineImageViewer is no longer supported after HDFS-5698 was merged. The motivation of the Delimited processor is to run data analysis on the fsimage; therefore, there might be more value in creating a tool for Hive or Pig that reads the PB format fsimage directly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5963: - Attachment: h5963_20140219.patch Let me also fix the bug in rollback. Talked to [~jingzhao]; the rollback bug is quite involved, so we will fix it separately. h5963_20140219.patch: removes testSecondaryNameNode() and comments out restartNameNode() in testRollback(). TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail Key: HDFS-5963 URL: https://issues.apache.org/jira/browse/HDFS-5963 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal Assignee: Tsz Wo (Nicholas), SZE Attachments: h5963_20140218.patch, h5963_20140219.patch {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test. Commenting out this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage
[ https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906220#comment-13906220 ] Akira AJISAKA commented on HDFS-5952: - Thank you for your comment. I'm okay with using an XML-based tool, and I don't want to duplicate the code. Create a tool to run data analysis on the PB format fsimage --- Key: HDFS-5952 URL: https://issues.apache.org/jira/browse/HDFS-5952 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 3.0.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA The Delimited processor in OfflineImageViewer is no longer supported after HDFS-5698 was merged. The motivation of the Delimited processor is to run data analysis on the fsimage; therefore, there might be more value in creating a tool for Hive or Pig that reads the PB format fsimage directly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5975) Create an option to specify a file path for OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906221#comment-13906221 ] Haohui Mai commented on HDFS-5975: -- I think this feature has good practical impact, since the operator rarely needs to do a full lsr starting from the root directory. LsrPBImage should output on demand. My suggestion is to push this idea one step further -- is it possible to create a tool which takes the fsimage and exposes a read-only version of the WebHDFS API? You can imagine the tool looks very similar to jhat, except that it exposes the WebHDFS API. That way we can allow the operator to use the existing command-line tools, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system to figure out what goes wrong. Create an option to specify a file path for OfflineImageViewer -- Key: HDFS-5975 URL: https://issues.apache.org/jira/browse/HDFS-5975 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor The output of OfflineImageViewer becomes quite large if an input fsimage is large. I propose '-filePath' option to make the output smaller. The below command will output the {{ls -R}} of {{/user/root}}. {code} hdfs oiv -i input -o output -p Ls -filePath /user/root {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
Andrew Wang created HDFS-5977: - Summary: FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5977: -- Target Version/s: 2.4.0 FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906230#comment-13906230 ] Tsz Wo (Nicholas), SZE commented on HDFS-5939: -- ... So (1) it never excludes all nodes and (2) we must have numOfDatanodes >= 1. Actually, the above statement is wrong. e.g. - if scope=/dc, excludedScope=/dc/rack0 and rack0 is the only rack, then all nodes are excluded. - numOfDatanode under the scope is 0. WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access hdfs via webhdfs, and when a datanode is dead, the user will see an exception like the one below without any clue that it's caused by the dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
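The edge case Nicholas describes can be illustrated with a simplified count of the nodes remaining under a scope once an excluded sub-scope is removed. Node paths are plain strings here; the real `NetworkTopology` walks an `InnerNode` tree, so this is only a model of the arithmetic:

```java
import java.util.List;

// Simplified model: count the datanodes under `scope` that are not under
// `excludedScope`. When the excluded sub-scope covers every node in the
// scope (e.g. rack0 is the only rack), zero candidates remain and
// chooseRandom has nothing to pick from.
class ScopeCount {
    static long availableNodes(List<String> nodePaths, String scope, String excludedScope) {
        return nodePaths.stream()
                .filter(path -> path.startsWith(scope))
                .filter(path -> !path.startsWith(excludedScope))
                .count();
    }

    public static void main(String[] args) {
        // rack0 is the only rack, and it is excluded: zero candidates remain.
        List<String> nodes = List.of("/dc/rack0/dn1", "/dc/rack0/dn2");
        System.out.println(availableNodes(nodes, "/dc", "/dc/rack0")); // prints 0
    }
}
```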
[jira] [Commented] (HDFS-5975) Create an option to specify a file path for OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906235#comment-13906235 ] Akira AJISAKA commented on HDFS-5975: - That's a good idea! I'll create another JIRA. Create an option to specify a file path for OfflineImageViewer -- Key: HDFS-5975 URL: https://issues.apache.org/jira/browse/HDFS-5975 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor The output of OfflineImageViewer becomes quite large if an input fsimage is large. I propose '-filePath' option to make the output smaller. The below command will output the {{ls -R}} of {{/user/root}}. {code} hdfs oiv -i input -o output -p Ls -filePath /user/root {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906242#comment-13906242 ] Hadoop QA commented on HDFS-5776: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629871/HDFS-5776v21.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6178//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6178//console This message is automatically generated. 
Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt, HDFS-5776v18.txt, HDFS-5776v21.txt This is a placeholder for hdfs-related stuff backported from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful, especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metrics of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path, where we decide to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
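The hedged-read idea described above can be sketched in a small standalone model. This is a toy illustration of the general technique (start a read, and if it misses a deadline, race a speculative second read), not the actual DFSClient implementation; the class and method names are hypothetical:

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Toy model of a hedged pread: run the read against one replica; if it has
// not finished within the threshold (cf. the quorum.read.threshold.millis
// config mentioned above), fire a speculative read against another replica
// and return whichever completes first.
class HedgedReadSketch {
    static <T> T hedgedRead(Supplier<T> primary, Supplier<T> backup, long thresholdMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<T> done = new ExecutorCompletionService<>(pool);
            done.submit(primary::get);
            Future<T> first = done.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                return first.get();       // primary answered within the threshold
            }
            done.submit(backup::get);     // hedge: speculative read on another replica
            return done.take().get();     // first of the two reads to complete wins
        } finally {
            pool.shutdownNow();           // abandon the slower read
        }
    }
}
```

The trade-off this models is the one in the description: outlier latency is bounded by the threshold plus the backup read, at the cost of occasional duplicate work.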
[jira] [Created] (HDFS-5978) Create a tool to take fsimage and expose read-only WebHDFS API
Akira AJISAKA created HDFS-5978: --- Summary: Create a tool to take fsimage and expose read-only WebHDFS API Key: HDFS-5978 URL: https://issues.apache.org/jira/browse/HDFS-5978 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Suggested in HDFS-5975. Add an option to expose the read-only version of the WebHDFS API for OfflineImageViewer. You can imagine it looks very similar to jhat. That way we can allow the operator to use the existing command-line tool, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system to figure out what goes wrong. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5583: - Summary: Make DN send an OOB Ack on shutdown before restarting (was: Make DN send an OOB Ack on shutdown before restaring) Make DN send an OOB Ack on shutdown before restarting - Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch Add an ability for data nodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for applications with a need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5973: --- Resolution: Fixed Fix Version/s: 2.4.0 Status: Resolved (was: Patch Available) committed, thanks add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5483: Assignee: Arpit Agarwal NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch, h5483.03.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906258#comment-13906258 ] Haohui Mai commented on HDFS-5977: -- There are two cases here: # If the user is upgrading from a version that uses the old fsimage, the NN will use the old loader to load the fsimage, which handles the flag already. # For future upgrades, I think that this mechanism is no longer required. For example, currently the NN has already reserved {{.reserved}} in the namespace. What we need to do here is to regulate ourselves to put special names into {{.reserved}}. Making this assumption explicit eliminates the need for renaming during upgrades, so the whole upgrade workflow can be simplified. FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5978) Create a tool to take fsimage and expose read-only WebHDFS API
[ https://issues.apache.org/jira/browse/HDFS-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906260#comment-13906260 ] Haohui Mai commented on HDFS-5978: -- As a first step, one can take the current code of LsrPBImage, and then create a Netty-based HTTP server that implements the {{LISTSTATUS}} in WebHDFS. I suggest not using jetty 6 + jersey (which are used in the NN to implement webhdfs) in this tool, because they'll bring in quite a few external dependencies, making the tool much harder to deploy. Create a tool to take fsimage and expose read-only WebHDFS API -- Key: HDFS-5978 URL: https://issues.apache.org/jira/browse/HDFS-5978 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Suggested in HDFS-5975. Add an option to expose the read-only version of the WebHDFS API for OfflineImageViewer. You can imagine it looks very similar to jhat. That way we can allow the operator to use the existing command-line tool, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system to figure out what goes wrong. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
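To make the LISTSTATUS suggestion concrete, the sketch below renders the shape of a WebHDFS `LISTSTATUS` JSON response. The field names (`FileStatuses`, `FileStatus`, `pathSuffix`, `type`) follow the WebHDFS REST API; everything else is hand-written for illustration, the entries are not read from a real fsimage, and the HTTP layer (Netty or otherwise) is omitted:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: the JSON body such a tool's LISTSTATUS handler would
// serve. Real code would walk the loaded fsimage (cf. LsrPBImage) and fill
// in the full set of FileStatus fields; here each child becomes a minimal
// FILE entry.
public class ListStatusJson {
    public static String render(List<String> childNames) {
        String entries = childNames.stream()
                .map(name -> "{\"pathSuffix\":\"" + name + "\",\"type\":\"FILE\"}")
                .collect(Collectors.joining(","));
        return "{\"FileStatuses\":{\"FileStatus\":[" + entries + "]}}";
    }

    public static void main(String[] args) {
        System.out.println(render(List.of("a.txt", "b.txt")));
    }
}
```

Keeping the response-building logic as a pure function like this also makes it easy to unit-test without standing up the HTTP server.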
[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5318: Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Target Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this to trunk, branch-2 and branch-2.4. Thanks for the contribution [~sirianni] and also thanks to [~sureshms] for suggesting this approach! Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. 
With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906261#comment-13906261 ] Andrew Wang commented on HDFS-5977: --- Hi Haohui, I had the same thought and agree in spirit, but if you look at the comment history on HDFS-5709, [~sureshms] had some concerns about adding more reserved paths beyond our current two. We agreed to implement a more general solution for this reason. I agree that we don't need to implement this for the PB loader until we add another reserved path, but I filed this JIRA so -renameReserved isn't forgotten about in the future. FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)