[jira] [Updated] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value
[ https://issues.apache.org/jira/browse/HDFS-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-5970: - Component/s: (was: hdfs-client) (was: datanode) namenode callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5970 URL: https://issues.apache.org/jira/browse/HDFS-5970 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Yongjun Zhang Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope), which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
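The null-return hazard described above can be modeled standalone. This is a sketch, not NetworkTopology's actual implementation: the class name, the List-based candidate set, and the seeded Random are invented for illustration; only the contract (an empty scope yields null, so callers such as BlockPlacementPolicyDefault must guard before dereferencing) mirrors the report.

```java
import java.util.List;
import java.util.Random;

// Minimal standalone model of the hazard: a chooseRandom that can return
// null when the candidate set is empty, and a caller that guards for it.
public class ChooseRandomSketch {
    static final Random RAND = new Random(42);

    // Returns null when no candidate exists -- the case callers must expect.
    public static String chooseRandom(List<String> candidates) {
        if (candidates.isEmpty()) {
            return null;
        }
        return candidates.get(RAND.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        String chosen = chooseRandom(List.of());
        // Guard instead of dereferencing: null means "no node available in scope".
        System.out.println(chosen == null ? "no node available" : chosen);
    }
}
```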
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905092#comment-13905092 ] Hudson commented on HDFS-5780: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/]) HDFS-5780. TestRBWBlockInvalidation times out intermittently. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569368) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch I recently found that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905090#comment-13905090 ] Hudson commented on HDFS-5953: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/]) Update change description for HDFS-5953 (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569579) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBlockReaderFactory fails if libhadoop.so has not been built --- Key: HDFS-5953 URL: https://issues.apache.org/jira/browse/HDFS-5953 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Assignee: Akira AJISAKA Fix For: 2.4.0 Attachments: HDFS-5953.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ : {code} java.lang.RuntimeException: Although a UNIX domain socket path is configured as /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:315) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) at org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) {code} This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
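A common guard for tests with this kind of dependency is to skip, rather than fail, when the native library is absent. Below is a minimal standalone sketch of that pattern; `nativeLoaded()` is a stand-in (driven by a made-up system property) for a real check such as Hadoop's `NativeCodeLoader.isNativeCodeLoaded()`.

```java
// Sketch of the skip-if-native-missing pattern: the test body runs only when
// the native library is reported as loaded; otherwise it is skipped cleanly.
public class NativeGuardSketch {
    // Stand-in for NativeCodeLoader.isNativeCodeLoaded(); the property name
    // "sketch.native.loaded" is invented for this illustration.
    public static boolean nativeLoaded() {
        return "true".equals(System.getProperty("sketch.native.loaded", "false"));
    }

    public static void main(String[] args) {
        if (!nativeLoaded()) {
            System.out.println("skipping: libhadoop not available");
            return; // skip, do not fail with a RuntimeException
        }
        System.out.println("running short-circuit test");
    }
}
```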
[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905096#comment-13905096 ] Hudson commented on HDFS-5893: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/]) HDFS-5893. HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates. Contributed by Haohui Mai. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569477) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHttpsFileSystem.java HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates Key: HDFS-5893 URL: https://issues.apache.org/jira/browse/HDFS-5893 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-5893.000.patch When {{HftpFileSystem}} tries to get the data, it creates a {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default URLConnectionFactory. It does not import the SSL certificates from ssl-client.xml. Therefore {{HsftpFileSystem}} fails. To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
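The shape of the fix can be sketched with stand-in types (`ConnectionFactory` below is not Hadoop's `URLConnectionFactory`; all names are invented): the inner opener should reuse the factory injected into the enclosing file system instead of falling back to a default that never loaded the ssl-client.xml certificates.

```java
// Sketch of "share one configured factory" vs. "fall back to a default".
public class SharedFactorySketch {
    public interface ConnectionFactory { String describe(); }

    // The default factory: built without the SSL setup from ssl-client.xml.
    public static final ConnectionFactory DEFAULT =
        () -> "default (no ssl-client.xml certs)";

    public static class FileSystemLike {
        final ConnectionFactory factory;
        public FileSystemLike(ConnectionFactory f) { this.factory = f; }

        // Bug shape: the opener ignores the enclosing file system's factory.
        public String openWithDefault() { return DEFAULT.describe(); }

        // Fix shape: the opener reuses the same factory as the file system.
        public String openWithShared() { return factory.describe(); }
    }

    public static void main(String[] args) {
        FileSystemLike fs = new FileSystemLike(() -> "ssl-aware factory");
        System.out.println(fs.openWithDefault());
        System.out.println(fs.openWithShared());
    }
}
```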
[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905093#comment-13905093 ] Hudson commented on HDFS-5803: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/]) HDFS-5803. TestBalancer.testBalancer0 fails. Contributed by Chen He. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569391) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java TestBalancer.testBalancer0 fails Key: HDFS-5803 URL: https://issues.apache.org/jira/browse/HDFS-5803 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5803.patch The test testBalancer0 fails on branch 2. Below is the stack trace {noformat} java.util.concurrent.TimeoutException: Cluster failed to reached expected values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 280, expected: 300), in more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle paths like /a/b/ correctly, causing SecondaryNameNode to fail checkpointing
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905361#comment-13905361 ] zhaoyunjiong commented on HDFS-5944: Multiple trailing / is impossible. LeaseManager:findLeaseWithPrefixPath didn't handle paths like /a/b/ correctly, causing SecondaryNameNode to fail checkpointing - Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt In our cluster, we encountered an error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write, and continued to refresh its lease. Client B deleted /XXX/20140206/04_30/. Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write. Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log. Then SecondaryNameNode tried to do a checkpoint and failed, because the lease held by Client A was not deleted when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here, when prefix is /XXX/20140206/04_30/ and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'. The fix is simple; I'll upload a patch later.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
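The boundary check quoted in the comment above can be reproduced standalone. With prefix /XXX/20140206/04_30/, `charAt(prefix.length())` lands on '_' rather than the separator, so the entry is missed. Normalizing a trailing separator off the prefix first is one possible fix, sketched here; the attached patch may differ.

```java
// Standalone reconstruction of the prefix check described in HDFS-5944.
public class PrefixMatchSketch {
    static final char SEPARATOR = '/';

    // The original check: works for "/a/b" but misses entries when the
    // prefix arrives with a trailing separator ("/a/b/").
    public static boolean buggyMatch(String p, String prefix) {
        if (!p.startsWith(prefix)) return false;
        int srclen = prefix.length();
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR;
    }

    // One possible fix: strip a trailing separator so the boundary check
    // sees the real parent path before comparing.
    public static boolean fixedMatch(String p, String prefix) {
        if (prefix.length() > 1 && prefix.charAt(prefix.length() - 1) == SEPARATOR) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        return buggyMatch(p, prefix);
    }

    public static void main(String[] args) {
        String p = "/XXX/20140206/04_30/_SUCCESS.slc.log";
        System.out.println(buggyMatch(p, "/XXX/20140206/04_30/")); // lease missed
        System.out.println(fixedMatch(p, "/XXX/20140206/04_30/")); // lease found
    }
}
```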
[jira] [Commented] (HDFS-5962) Mtime is not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905407#comment-13905407 ] Hadoop QA commented on HDFS-5962: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629731/HDFS-5962.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6172//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6172//console This message is automatically generated.
Mtime is not persisted for symbolic links - Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 when saving to fsimage, even though it is recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905443#comment-13905443 ] Hudson commented on HDFS-5893: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/]) HDFS-5893. HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates. Contributed by Haohui Mai. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569477) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHttpsFileSystem.java HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates Key: HDFS-5893 URL: https://issues.apache.org/jira/browse/HDFS-5893 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-5893.000.patch When {{HftpFileSystem}} tries to get the data, it creates a {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default URLConnectionFactory. It does not import the SSL certificates from ssl-client.xml. Therefore {{HsftpFileSystem}} fails. To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905440#comment-13905440 ] Hudson commented on HDFS-5803: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/]) HDFS-5803. TestBalancer.testBalancer0 fails. Contributed by Chen He. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569391) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java TestBalancer.testBalancer0 fails Key: HDFS-5803 URL: https://issues.apache.org/jira/browse/HDFS-5803 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5803.patch The test testBalancer0 fails on branch 2. Below is the stack trace {noformat} java.util.concurrent.TimeoutException: Cluster failed to reached expected values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 280, expected: 300), in more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905439#comment-13905439 ] Hudson commented on HDFS-5780: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/]) HDFS-5780. TestRBWBlockInvalidation times out intermittently. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569368) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch I recently found that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905437#comment-13905437 ] Hudson commented on HDFS-5953: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/]) Update change description for HDFS-5953 (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569579) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBlockReaderFactory fails if libhadoop.so has not been built --- Key: HDFS-5953 URL: https://issues.apache.org/jira/browse/HDFS-5953 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Assignee: Akira AJISAKA Fix For: 2.4.0 Attachments: HDFS-5953.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ : {code} java.lang.RuntimeException: Although a UNIX domain socket path is configured as /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:315) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) at org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) {code} This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905501#comment-13905501 ] Tsz Wo (Nicholas), SZE commented on HDFS-5966: -- Patch looks good. A minor suggestion: add saveMD5File(File dataFile, String digestString) to MD5FileUtils; then both renameMD5File and the original saveMD5File can use it. Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5966.000.patch This jira does the following: 1. When doing rollback for rolling upgrade, we should call FSEditLog#initJournalsForWrite when initializing editLog (just like upgrade in an HA setup). 2. After the rollback, we also need to rename the md5 file and change its reference file name. 3. Add a new unit test to cover rollback with HA+QJM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
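The reviewer's suggestion above can be sketched with strings standing in for file writes (this is not MD5FileUtils' real implementation; method bodies return the line that would be written): one shared saveMD5File(dataFile, digestString) overload that both the original byte[]-digest entry point and the rename path delegate to.

```java
// Sketch of the refactor: one helper owns the ".md5" line format, and both
// the save and rename code paths reuse it.
public class Md5RefactorSketch {
    // Shared helper: the single place that knows the digest-line format.
    public static String saveMD5File(String dataFileName, String digestString) {
        return digestString + " *" + dataFileName;
    }

    // Original entry point now delegates to the shared helper.
    public static String saveMD5File(String dataFileName, byte[] digest) {
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return saveMD5File(dataFileName, hex.toString());
    }

    // Rename path (needed after a rolling-upgrade rollback) reuses it too,
    // rewriting the reference file name without duplicating the format.
    public static String renameMD5File(String newDataFileName, String digestString) {
        return saveMD5File(newDataFileName, digestString);
    }

    public static void main(String[] args) {
        System.out.println(saveMD5File("fsimage", new byte[] { 0x0a, 0x1b }));
        System.out.println(renameMD5File("fsimage_rollback", "0a1b"));
    }
}
```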
[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
[ https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905517#comment-13905517 ] Hudson commented on HDFS-5893: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/]) HDFS-5893. HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates. Contributed by Haohui Mai. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569477) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHttpsFileSystem.java HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates Key: HDFS-5893 URL: https://issues.apache.org/jira/browse/HDFS-5893 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-5893.000.patch When {{HftpFileSystem}} tries to get the data, it creates a {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default URLConnectionFactory. It does not import the SSL certificates from ssl-client.xml. Therefore {{HsftpFileSystem}} fails. To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built
[ https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905511#comment-13905511 ] Hudson commented on HDFS-5953: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/]) Update change description for HDFS-5953 (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569579) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBlockReaderFactory fails if libhadoop.so has not been built --- Key: HDFS-5953 URL: https://issues.apache.org/jira/browse/HDFS-5953 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Assignee: Akira AJISAKA Fix For: 2.4.0 Attachments: HDFS-5953.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ : {code} java.lang.RuntimeException: Although a UNIX domain socket path is configured as /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601) at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:315) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) at org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99) {code} This test failure can be reproduced locally (on Mac). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails
[ https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905514#comment-13905514 ] Hudson commented on HDFS-5803: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/]) HDFS-5803. TestBalancer.testBalancer0 fails. Contributed by Chen He. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569391) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java TestBalancer.testBalancer0 fails Key: HDFS-5803 URL: https://issues.apache.org/jira/browse/HDFS-5803 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5803.patch The test testBalancer0 fails on branch 2. Below is the stack trace {noformat} java.util.concurrent.TimeoutException: Cluster failed to reached expected values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 280, expected: 300), in more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905513#comment-13905513 ] Hudson commented on HDFS-5780: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/]) HDFS-5780. TestRBWBlockInvalidation times out intermittently. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569368) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch I recently found that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5962: - Summary: Mtime and atime are not persisted for symbolic links (was: Mtime is not persisted for symbolic links) Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 when saving to fsimage, even though it is recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5962: - Description: In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. (was: In {{FSImageSerialization}}, the mtime symbolic links is hardcoded to be 0 when saving to fsimage, even though it is recorded in memory and shown in the listing until restarting namenode.) Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905547#comment-13905547 ] Kihwal Lee commented on HDFS-5962: -- It should be easy to add a test case for this. Start a mini cluster, create a symlink, do saveNamespace and restart the namenode. Compare the time stamp before and after. Directory (mtime) and file inode (mtime and atime) can be covered in the same test. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
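The save/restart round-trip the suggested test would exercise can be modeled without a cluster. The sketch below is not FSImageSerialization; the two-long layout is invented. It only shows the defect's shape: a serializer that hardcodes 0 loses the in-memory mtime/atime across a save and reload, while one that writes the recorded values preserves them.

```java
import java.nio.ByteBuffer;

// Round-trip model: "save" a symlink's times to a byte image, "reload" them.
public class SymlinkTimesSketch {
    // Bug shape: times are hardcoded to 0 when saving, as in HDFS-5962.
    public static byte[] saveBuggy(long mtime, long atime) {
        return ByteBuffer.allocate(16).putLong(0L).putLong(0L).array();
    }

    // Fix shape: persist the values actually recorded in memory.
    public static byte[] saveFixed(long mtime, long atime) {
        return ByteBuffer.allocate(16).putLong(mtime).putLong(atime).array();
    }

    // "Restart": read the times back from the saved image.
    public static long[] load(byte[] image) {
        ByteBuffer in = ByteBuffer.wrap(image);
        return new long[] { in.getLong(), in.getLong() };
    }

    public static void main(String[] args) {
        long mtime = 1_392_000_000_000L, atime = mtime + 500;
        System.out.println(load(saveBuggy(mtime, atime))[0]); // mtime lost on reload
        System.out.println(load(saveFixed(mtime, atime))[0]); // mtime survives
    }
}
```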
[jira] [Updated] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5961: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Assignee: Kihwal Lee Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Jing. I've committed this to trunk, branch-2 and branch-2.4. OIV cannot load fsimages containing a symbolic link --- Key: HDFS-5961 URL: https://issues.apache.org/jira/browse/HDFS-5961 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5961.patch In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink INodes. So after incorrectly reading in the first symbolic link, the next INode can't be read. HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link
[ https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905564#comment-13905564 ] Hudson commented on HDFS-5961: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5188 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5188/]) HDFS-5961. OIV cannot load fsimages containing a symbolic link. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569789) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java OIV cannot load fsimages containing a symbolic link --- Key: HDFS-5961 URL: https://issues.apache.org/jira/browse/HDFS-5961 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5961.patch In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink INodes. So after incorrectly reading in the first symbolic link, the next INode can't be read. HDFS-4850 broke this while fixing other issues. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905606#comment-13905606 ] Yongjun Zhang commented on HDFS-5939: - Thanks Haohui. Indeed, the contract of Random.nextInt() expects numOfDatanodes to be greater than 0; otherwise it throws IllegalArgumentException("n must be positive"). That's what I listed in the original bug report, and we hadn't seen this exception thrown from NetworkTopology.chooseRandom(String scope, String excludedScope) until HDFS-5939. Investigation of this bug shows that numOfDatanodes is 0 because no datanode is running in this case. Prior to my fix, there were three ways the method NetworkTopology.chooseRandom(String scope, String excludedScope) could finish: 1. return a valid Node; 2. return null (at the beginning of the method); 3. throw the above exception when calling Random.nextInt() (at the end of the method). It seems no caller of this method checked for case 2. If it happened, the caller would hit a null pointer exception (again, there is no report saying this ever happened). HDFS-5939 is case 3, where the caller is NamenodeWebHdfs.redirectURI(..). My submitted fix makes the chooseRandom method return null before calling Random.nextInt() when numOfDatanodes is 0, and throws NoDatanodeException from the caller side. Basically, my fix replaces the IllegalArgumentException with a NoDatanodeException for this case, with an explicit message to help the user. With my submitted fix, if numOfDatanodes == 0 happens for other callers of the chooseRandom method in a real case, my fix won't hide the problem: it will result in a null pointer exception instead of the IllegalArgumentException. That is now covered by HDFS-5970. I hope there is a field report of HDFS-5970 before we fix it, so we can understand why it happened. 
Another alternative to my fix is to change the exception spec of the NetworkTopology.chooseRandom interface and let it throw NoDatanodeException instead of IllegalArgumentException. I didn't do this in my submitted fix for two reasons: - the caller has a better chance to provide a more helpful message. - the impact of changing the interface is wider. Would you please let me know what you think? Thanks. WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access HDFS via WebHDFS while the datanode is dead, the user will see the exception below without any clue that it's caused by the dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {RemoteException:{exception:IllegalArgumentException,javaClassName:java.lang.IllegalArgumentException,message:n must be positive}} We need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
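The fix described above can be sketched in isolation. This is a hedged model, not the actual NetworkTopology or NamenodeWebHdfs code: the empty-pool case returns null instead of tripping Random.nextInt(0), and the caller converts null into an explicit error (NoDatanodeException is modeled here as a plain IOException subclass):

```java
import java.io.IOException;
import java.util.List;
import java.util.Random;

// Hedged sketch of the fix's shape. Random.nextInt(0) throws
// IllegalArgumentException ("n must be positive"), so choose() returns
// null when no node is available, and the caller raises a clearer error.
class ChooseRandomSketch {
    static final Random RAND = new Random();

    static class NoDatanodeException extends IOException {
        NoDatanodeException(String msg) { super(msg); }
    }

    /** Returns a random node, or null when no node is available. */
    static String choose(List<String> nodes) {
        if (nodes.isEmpty()) {
            return null;                       // avoids RAND.nextInt(0)
        }
        return nodes.get(RAND.nextInt(nodes.size()));
    }

    /** Caller-side handling: turn null into an explicit, helpful error. */
    static String chooseOrFail(List<String> nodes) throws NoDatanodeException {
        String node = choose(nodes);
        if (node == null) {
            throw new NoDatanodeException("No datanode available to redirect the request to");
        }
        return node;
    }
}
```

Callers that skip the null check (the HDFS-5970 concern) would hit a NullPointerException instead, which is why every call site needs the guard.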
[jira] [Commented] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value
[ https://issues.apache.org/jira/browse/HDFS-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905627#comment-13905627 ] Yongjun Zhang commented on HDFS-5970: - Thanks Junping. callers of NetworkTopology's chooseRandom method to expect null return value Key: HDFS-5970 URL: https://issues.apache.org/jira/browse/HDFS-5970 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Yongjun Zhang Class NetworkTopology's method public Node chooseRandom(String scope) calls private Node chooseRandom(String scope, String excludedScope), which may return a null value. Callers of this method, such as BlockPlacementPolicyDefault, need to be aware of that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated HDFS-5898: -- Attachment: HDFS-5898-with-documentation.patch Not sure why the previous build broke. [~atm], were you able to take a look at this patch? Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898.patch, HDFS-5898.patch, HDFS-5898.patch According to the discussion in HDFS-5804: 1. The NFS gateway should be able to get its own TGTs, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-5274: --- Attachment: HDFS-5274-7.patch I am attaching the patch rebased and updated based on review comments. bq. Any reason we take config on construction and in init for SpanReceiverHost? I removed conf from the constructor arguments. bq. SpanReceiverHost is on only when trace is enabled, right? If so, say so in class comment. SpanReceiverHost is always on, though it does nothing if no SpanReceiver is configured. I added a line to the class comment. bq. Has to be a shutdown hook? ShutdownHookManager.get().addShutdownHook ? This is fine unless we envision someone having to override it which I suppose should never happen for an optionally enabled, rare, trace function? Overriding SpanReceiverHost is not necessary, though there could be someone who implements a SpanReceiver. I think it is useful to wait for receivers to process all the tracing data in a crash scenario. bq. HTraceConfiguration is for testing only? Should be @visiblefortesting only or a comment at least? HTraceConfiguration is used by SpanReceiver implementations, not for testing only. bq. Should there be defines for a few of these? DFSInputStream.close seems fine... only used once DFSInputStream.read? I think it is fine not to define DFSInputStream.read now. In addition to the above, there are some other fixes: * removed the timing dependency from TestTracing. * added a guard by Trace.isTracing() around startSpan() in DFSInputStream, FsShell and WritableRpcEngine. * removed SpanReceiverHost from FsShell and DFSClient. I will add options or config properties to turn on tracing from the shell later in another JIRA issue. 
Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
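The "guard by Trace.isTracing() around startSpan()" fix listed above can be modeled with a self-contained stub. None of these names are the HTrace API; the sketch only illustrates skipping span creation when tracing is off:

```java
// Self-contained model of the tracing guard pattern. The real patch uses
// HTrace's Trace/TraceScope; this stub only models the idea of avoiding
// span-creation overhead when no tracing is configured.
class TracingGuardSketch {
    static boolean tracingEnabled = false;
    static int spansStarted = 0;

    static class Scope {
        void close() { /* span would be delivered to receivers here */ }
    }

    static Scope startSpan(String name) {
        spansStarted++;
        return new Scope();
    }

    static void tracedRead() {
        Scope scope = null;
        if (tracingEnabled) {          // guard: no span overhead when off
            scope = startSpan("DFSInputStream.read");
        }
        try {
            // ... perform the actual read ...
        } finally {
            if (scope != null) {
                scope.close();
            }
        }
    }
}
```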
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905789#comment-13905789 ] Arpit Agarwal commented on HDFS-5318: - [~sirianni] are these failures related to the patch? Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
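The counting rule proposed above — replication is the number of distinct storage IDs, not the number of location tuples — can be sketched directly. Types here are illustrative, not the Namenode's actual data structures:

```java
import java.util.*;

// Hedged sketch of the proposal: a block's effective replication is the
// number of distinct storage IDs among its locations, so two datanodes
// mounting the same shared pool count as one physical copy.
class SharedStorageReplication {
    /** Each location is a (datanodeId, storageId) pair. */
    static int effectiveReplication(List<String[]> locations) {
        Set<String> distinctStorages = new HashSet<>();
        for (String[] loc : locations) {
            distinctStorages.add(loc[1]);   // count by storage ID only
        }
        return distinctStorages.size();
    }
}
```

Applied to the DN_1..DN_4 example above, this yields 2 instead of 4.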
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905797#comment-13905797 ] Brandon Li commented on HDFS-5583: -- Some early comments. I haven't finished reviewing all the changes. - DataNode#shutdownDatanode() can be called only once, and it throws an exception on subsequent invocations. I would imagine that after an administrator issues the dfsadmin shutdownDatanode -upgrade command, he/she would like to know whether the DataNodes received it and whether they are in the upgrade preparation state. Unless I missed something, it seems the only way to know is to issue the same command again and expect to receive an exception. Would it be better to either let shutdownDatanode return an error code or have getDataNodeInfo include the current datanode state? - Do we plan to have more OOB Acks anytime soon? We can always add new enums instead of reserving a few OOB_RESERVEDx for now. - In DataNode.java: is forUpgrade, upgrade or shutdownForUpgrade a better name than the variable name restarting? :-) - DataXceiverServer.java: please clean up the unused import Make DN send an OOB Ack on shutdown before restarting Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch Add an ability for datanodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for applications that need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
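Brandon's suggestion above — report the datanode's state on repeated shutdown requests instead of throwing — can be sketched as follows. All names are hypothetical, not the actual DataNode API:

```java
// Hedged sketch: instead of throwing on a second shutdownDatanode call,
// report the current state so the administrator can tell the upgrade
// preparation request was received.
class ShutdownStateSketch {
    private boolean preparingForUpgrade = false;

    synchronized String shutdownDatanodeForUpgrade() {
        if (preparingForUpgrade) {
            return "ALREADY_PREPARING_FOR_UPGRADE";  // informative, not an exception
        }
        preparingForUpgrade = true;
        return "SHUTDOWN_INITIATED";
    }

    synchronized boolean isPreparingForUpgrade() {
        return preparingForUpgrade;   // could back a getDataNodeInfo-style query
    }
}
```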
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905821#comment-13905821 ] Chris Nauroth commented on HDFS-4685: - I have merged the HDFS-4685 branch to trunk, as per the passing merge vote here: http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201402.mbox/%3CCABCYYb-3jGNDhhXg%2B-TuFw0f-_2YybAJdiRgUpbkRXEvNvTDYA%40mail.gmail.com%3E Implementation of ACLs in HDFS -- Key: HDFS-4685 URL: https://issues.apache.org/jira/browse/HDFS-4685 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, security Affects Versions: 1.1.2 Reporter: Sachin Jose Assignee: Chris Nauroth Attachments: HDFS-4685.1.patch, HDFS-4685.2.patch, HDFS-4685.3.patch, HDFS-4685.4.patch, HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf, HDFS-ACLs-Design-3.pdf, Test-Plan-for-Extended-Acls-1.pdf Currently HDFS doesn't support extended file ACLs. In Unix, extended ACLs can be managed using the getfacl and setfacl utilities. Is there anybody working on this feature? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-4685: Fix Version/s: 3.0.0 Implementation of ACLs in HDFS -- Key: HDFS-4685 URL: https://issues.apache.org/jira/browse/HDFS-4685 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, security Affects Versions: 1.1.2 Reporter: Sachin Jose Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: HDFS-4685.1.patch, HDFS-4685.2.patch, HDFS-4685.3.patch, HDFS-4685.4.patch, HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf, HDFS-ACLs-Design-3.pdf, Test-Plan-for-Extended-Acls-1.pdf Currently HDFS doesn't support extended file ACLs. In Unix, extended ACLs can be managed using the getfacl and setfacl utilities. Is there anybody working on this feature? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905830#comment-13905830 ] Claudio Fahey commented on HDFS-4685: - I am currently traveling and will be back on Thursday 2/20. Email responses may be delayed. Implementation of ACLs in HDFS -- Key: HDFS-4685 URL: https://issues.apache.org/jira/browse/HDFS-4685 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, security Affects Versions: 1.1.2 Reporter: Sachin Jose Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: HDFS-4685.1.patch, HDFS-4685.2.patch, HDFS-4685.3.patch, HDFS-4685.4.patch, HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf, HDFS-ACLs-Design-3.pdf, Test-Plan-for-Extended-Acls-1.pdf Currently HDFS doesn't support extended file ACLs. In Unix, extended ACLs can be managed using the getfacl and setfacl utilities. Is there anybody working on this feature? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905841#comment-13905841 ] Haohui Mai commented on HDFS-5962: -- The old fsimage does not persist mtime and atime. The PB-based fsimage follows the old behavior. I wonder whether this is a bug in the old code, or it's done intentionally. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905856#comment-13905856 ] Eric Sirianni commented on HDFS-5318: - For {{TestCacheDirectives.testCacheManagerRestart}}, the failure is in a comparison of BlockPool IDs: {noformat} Inconsistent checkpoint fields. LV = -52 namespaceID = 173186898 cTime = 0 ; clusterId = testClusterID ; blockpoolId = BP-447030995-67.195.138.22-1392762420027. Expecting respectively: -52; 2; 0; testClusterID; BP-2140913546-67.195.138.22-1392762411177. {noformat} I don't see how that could be related to my change. That test also passes in my environment. For {{TestBalancerWithNodeGroup}}, the failure may be related to my change to {{MiniDFSCluster}} method signatures to allow for overlaying {{Configurations}} on individual {{DataNode}} objects. I'm currently investigating and will update soon. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. 
I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905847#comment-13905847 ] Hudson commented on HDFS-4685: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5191 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5191/]) Merge HDFS-4685 to trunk. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569870) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntry.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryScope.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclEntryType.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/AclStatus.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/FsAction.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/AclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/FsCommand.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Ls.java * 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestAcl.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/permission/TestFsPermission.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestAclCommands.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestChRootedFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemDelegation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/AclException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclConfigFlag.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/AclStorage.java *
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905867#comment-13905867 ] Andrew Wang commented on HDFS-5318: --- Both of these are known flakies, so I'd be inclined just to go ahead and commit. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5966: Attachment: HDFS-5966.001.patch Thanks for the review, Nicholas! Update the patch to address the comments. Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5966.000.patch, HDFS-5966.001.patch This jira does the following: 1. When do rollback for rolling upgrade, we should call FSEditLog#initJournalsForWrite when initializing editLog (just like Upgrade in HA setup). 2. After the rollback, we also need to rename the md5 file and change its reference file name. 3. Add a new unit test to cover rollback with HA+QJM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
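Step 2 of the description above (rename the md5 file and change its reference file name) can be illustrated with a self-contained sketch. The "digest *filename" sidecar format and all names here are assumptions, not the actual NNStorage code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hedged sketch: when an fsimage file is renamed during rollback, its
// sidecar .md5 file must be renamed too and the file name recorded inside
// it updated, or checksum verification would fail on startup.
class Md5SidecarRename {
    static void renameImageWithMd5(Path oldImage, Path newImage) throws IOException {
        Path oldMd5 = Paths.get(oldImage.toString() + ".md5");
        Path newMd5 = Paths.get(newImage.toString() + ".md5");
        // Re-point the recorded file name at the new image name.
        String line = Files.readAllLines(oldMd5, StandardCharsets.UTF_8).get(0);
        String digest = line.split(" \\*")[0];
        Files.write(newMd5, (digest + " *" + newImage.getFileName() + "\n")
            .getBytes(StandardCharsets.UTF_8));
        Files.delete(oldMd5);
        Files.move(oldImage, newImage);
    }

    /** Demo round trip in a temp directory; returns the new .md5 line. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("md5demo");
            Path oldImg = dir.resolve("fsimage_rollback");
            Path newImg = dir.resolve("fsimage_current");
            Files.write(oldImg, "data".getBytes(StandardCharsets.UTF_8));
            Files.write(Paths.get(oldImg.toString() + ".md5"),
                "abc123 *fsimage_rollback\n".getBytes(StandardCharsets.UTF_8));
            renameImageWithMd5(oldImg, newImg);
            return Files.readAllLines(
                Paths.get(newImg.toString() + ".md5"), StandardCharsets.UTF_8).get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```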
[jira] [Commented] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905889#comment-13905889 ] Hadoop QA commented on HDFS-5898: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629805/HDFS-5898-with-documentation.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6173//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6173//console This message is automatically generated. Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898.patch, HDFS-5898.patch, HDFS-5898.patch According to the discussion in HDFS-5804: 1. 
The NFS gateway should be able to get its own TGTs, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5318: Attachment: HDFS-5318-trunk-c.patch Updated patch with fix for {{TestBalancerWithNodeGroup}}. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A)}} and {{(DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
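The proposed counting rule — physical replicas = distinct {{StorageID}}s, regardless of how many datanode access paths exist — can be sketched in a few lines of plain Java (illustrative names; this is not the actual NameNode replication-counting code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplicaCounting {
    // Each entry is one access path: datanodeId -> storageId reported by that datanode.
    // Physical replica count = number of distinct storage IDs, per the proposal above.
    static int physicalReplicaCount(Map<String, String> locations) {
        return new HashSet<>(locations.values()).size();
    }

    public static void main(String[] args) {
        Map<String, String> locations = new LinkedHashMap<>();
        locations.put("DN_1", "STORAGE_A");
        locations.put("DN_2", "STORAGE_A");
        locations.put("DN_3", "STORAGE_B");
        locations.put("DN_4", "STORAGE_B");
        // Four access paths, but only two distinct physical copies.
        System.out.println(locations.size());                // prints 4 (naive count)
        System.out.println(physicalReplicaCount(locations)); // prints 2 (proposed count)
    }
}
```

This reproduces the example above: four {{(DataNode ID, Storage ID)}} tuples across two shared storage pools yield a replication factor of 2, not 4.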
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905906#comment-13905906 ] Eric Sirianni commented on HDFS-5318: - Thanks [~andrew.wang]. I suspected the same thing after reading some JIRAs about {{TestBalancerWithNodeGroup}}. However, it turns out I did actually introduce a bug there :). The updated patch should fix it. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905909#comment-13905909 ] Haohui Mai commented on HDFS-5939: -- Two questions in {{NetworkTopology}}: # Under what circumstances will {{getNode(excludedScope)}} exclude all datanodes? # Is it safe to assert that {{numOfDatanodes}} is always greater than or equal to 0? [~szetszwo], can you comment on this? WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access HDFS via WebHDFS while the datanodes are dead, the user sees the exception below without any clue that it is caused by a dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} The error report needs to be fixed to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
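The opaque "n must be positive" message is what {{java.util.Random#nextInt(int)}} throws on JDK 7 when asked to pick from zero candidates — plausibly what happens when a random-node selection runs with no live datanodes in scope. A minimal illustration (hypothetical helper name; the real logic lives in {{NetworkTopology#chooseRandom}}):

```java
import java.util.List;
import java.util.Random;

public class EmptyScopeSketch {
    private static final Random RANDOM = new Random();

    // Illustrative stand-in for picking a random datanode from a scope.
    static String pickRandom(List<String> nodesInScope) {
        // When every datanode is excluded (or dead), the list is empty and
        // nextInt(0) throws IllegalArgumentException -- on JDK 7 its message
        // is "n must be positive", the unhelpful error surfaced via WebHDFS.
        return nodesInScope.get(RANDOM.nextInt(nodesInScope.size()));
    }

    public static void main(String[] args) {
        try {
            pickRandom(List.of());
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException: " + e.getMessage());
        }
    }
}
```

This is why the fix belongs at the caller: the empty-candidate case should be detected and reported as "no datanodes available" before any random selection is attempted.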
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905912#comment-13905912 ] Arpit Agarwal commented on HDFS-5318: - Thanks for the heads up Andrew. +1 pending Jenkins again. Verified {{TestBalancerWithNodeGroup}} passes with the latest patch. Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905924#comment-13905924 ] Arpit Agarwal commented on HDFS-5868: - Nitpick: {{BlockReceiver#cout}} can be removed. +1 otherwise. Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for an FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline, which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
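The proposed indirection can be sketched as follows (illustrative class name, not the actual {{ReplicaOutputStreams}} code): the stream wrapper owns the sync policy, so a storage plugin overrides one method instead of {{BlockReceiver}} downcasting to {{FileOutputStream}}.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

// Sketch: the wrapper owns the sync, so an FsDatasetSpi plugin not backed
// by OS files can subclass it and provide its own durability guarantee.
public class ReplicaStreamsSketch {
    private final OutputStream dataOut;

    public ReplicaStreamsSketch(OutputStream dataOut) {
        this.dataOut = dataOut;
    }

    // Default mirrors BlockReceiver's current behavior: force the channel
    // when the stream is a plain FileOutputStream. Plugins override this.
    public void syncDataOut() throws IOException {
        if (dataOut instanceof FileOutputStream) {
            ((FileOutputStream) dataOut).getChannel().force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = Files.createTempFile("replica", ".blk").toFile();
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            ReplicaStreamsSketch streams = new ReplicaStreamsSketch(out);
            out.write("block data".getBytes());
            streams.syncDataOut(); // durable on disk before acking the pipeline
        }
        System.out.println(f.length()); // prints 10
    }
}
```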
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5962: Attachment: HDFS-5962.3.patch Thanks [~kihwal], added a test-case for loading atime and mtime. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
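The bug can be pictured with a plain {{java.io}} round trip (a simplified sketch, not the real {{FSImageSerialization}} code): whatever gets written to the image is all that survives a NameNode restart, so hardcoded zeros come back as 1970-01-01 timestamps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SymlinkTimesSketch {
    // Save a symlink's times to an "image"; the buggy path hardcodes zeros.
    static byte[] save(long mtime, long atime, boolean hardcodeZeros) throws IOException {
        ByteArrayOutputStream image = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(image)) {
            out.writeLong(hardcodeZeros ? 0L : mtime);
            out.writeLong(hardcodeZeros ? 0L : atime);
        }
        return image.toByteArray();
    }

    // "Restart": everything in memory is gone; only the image remains.
    static long[] load(byte[] image) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
            return new long[] { in.readLong(), in.readLong() };
        }
    }

    public static void main(String[] args) throws IOException {
        long mtime = 1392000000000L, atime = 1392000100000L;
        long[] buggy = load(save(mtime, atime, true));
        long[] fixed = load(save(mtime, atime, false));
        System.out.println(buggy[0]); // 0 -> ls shows 1970-01-01 after restart
        System.out.println(fixed[0]); // real mtime survives the restart
    }
}
```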
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905937#comment-13905937 ] Hadoop QA commented on HDFS-5962: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629868/HDFS-5962.3.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6176//console This message is automatically generated. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905945#comment-13905945 ] Akira AJISAKA commented on HDFS-5962: - [~wheat9], I suppose it's a bug because the output of {{ls}} shows wrong information after restarting NameNode. {code} $ hdfs dfs -ls -rwxrwxrwx - user supergroup 0 1970-01-01 00:00 symlink {code} Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5950) The DFSClient and DataNode should use shared memory segments to communicate short-circuit information
[ https://issues.apache.org/jira/browse/HDFS-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5950: --- Attachment: HDFS-5950.001.patch The DFSClient and DataNode should use shared memory segments to communicate short-circuit information - Key: HDFS-5950 URL: https://issues.apache.org/jira/browse/HDFS-5950 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5950.001.patch The DFSClient and DataNode should use the shared memory segments and unified cache added in the other HDFS-5182 subtasks to communicate short-circuit information. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5966: - Hadoop Flags: Reviewed +1 patch looks good. Will commit it shortly. Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5966.000.patch, HDFS-5966.001.patch This jira does the following: 1. When doing a rollback for a rolling upgrade, we should call FSEditLog#initJournalsForWrite when initializing the editLog (just like Upgrade in an HA setup). 2. After the rollback, we also need to rename the md5 file and change its reference file name. 3. Add a new unit test to cover rollback with HA+QJM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup
[ https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5966. -- Resolution: Fixed Fix Version/s: HDFS-5535 (Rolling upgrades) I have committed this. Thanks, Jing! Fix rollback of rolling upgrade in NameNode HA setup Key: HDFS-5966 URL: https://issues.apache.org/jira/browse/HDFS-5966 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: HDFS-5535 (Rolling upgrades) Attachments: HDFS-5966.000.patch, HDFS-5966.001.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905958#comment-13905958 ] Brandon Li commented on HDFS-5944: -- +1. Both patches look good to me. LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint - Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt In our cluster, we encountered an error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write, and Client A continued to refresh its lease. Client B deleted /XXX/20140206/04_30/. Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write. Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log. Then the SecondaryNameNode tried to do a checkpoint and failed because the lease held by Client A was not deleted when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here, when prefix is /XXX/20140206/04_30/ and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'. The fix is simple, I'll upload a patch later. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
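The off-by-one in the quoted check can be demonstrated in isolation (a sketch with illustrative method names, not the committed patch — one plausible fix is simply to trim the trailing separator from the prefix before comparing):

```java
public class PrefixPathCheck {
    static final char SEPARATOR_CHAR = '/';

    // The check quoted in the description: when prefix ends with '/',
    // p.charAt(srclen) lands on '_' instead of '/', so the lease under
    // the deleted directory is missed.
    static boolean isUnderBuggy(String p, String prefix) {
        if (!p.startsWith(prefix)) return false;
        int srclen = prefix.length();
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR_CHAR;
    }

    // One possible fix (illustrative): normalize the prefix by trimming a
    // trailing separator before applying the same boundary check.
    static boolean isUnderFixed(String p, String prefix) {
        if (prefix.length() > 1 && prefix.charAt(prefix.length() - 1) == SEPARATOR_CHAR) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        if (!p.startsWith(prefix)) return false;
        int srclen = prefix.length();
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR_CHAR;
    }

    public static void main(String[] args) {
        String prefix = "/XXX/20140206/04_30/";
        String p = "/XXX/20140206/04_30/_SUCCESS.slc.log";
        System.out.println(isUnderBuggy(p, prefix)); // prints false -- lease is missed
        System.out.println(isUnderFixed(p, prefix)); // prints true
    }
}
```

The boundary check still rejects sibling paths such as /a/bc when the prefix is /a/b, which is the reason the original code compares the character at srclen rather than using a bare startsWith.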
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5776: Attachment: HDFS-5776v21.txt This patch has a few small differences that came out of some time spent testing: 1. Adds DEBUG level logging of the one-time setup of the hedged reads pool. 2. Gives the hedged read pool threads a 'hedged' prefix. 3. Changes the 'cancel' behavior so it does NOT cancel ongoing reads. 3. is the biggest change. What I've found is that hdfs reads do not take kindly to being interrupted. The exception types that bubble up are of a few versions -- InterruptedIOException, ClosedByInterruptException, and IOEs whose cause is an IE -- but I also encountered complaints coming up out of protobuf decoding messages, likely because the read was cancelled partway through. Then there was a bunch of logging noise -- WARN-level logging -- because of the interrupt exceptions and the fact that on interrupt, the node we were reading against would get added to the dead list. I had a patch that was more involved, dealing w/ the interrupt exceptions and redoing the WARNs, but it was getting very involved and I was coming to rely on an untrod path, that of interrupted reads, so I let it go for now. This patch lets outstanding reads finish. Let me chat w/ [~xieliang007] to possibly get production numbers on the benefit of the patch as-is. 
Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt, HDFS-5776v18.txt, HDFS-5776v21.txt This is a placeholder for hdfs-related stuff backported from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the interesting metric values into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path; we decide to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
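The pread decision described above — start against one replica, and only fire a second read when the first exceeds a latency threshold, without interrupting the loser — can be sketched with plain java.util.concurrent (illustrative names; the real code paths are fetchBlockByteRange / fetchBlockByteRangeSpeculative):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HedgedReadSketch {
    // Pool threads get a 'hedged' prefix, mirroring item 2 in the comment above.
    static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "hedgedRead");
        t.setDaemon(true);
        return t;
    });

    // Start a read against the first replica; if it has not finished within
    // thresholdMillis, start a second against another replica and take the
    // first result. The slower read is NOT interrupted (item 3 above) --
    // it simply runs to completion on its own.
    static <T> T hedgedRead(Callable<T> primary, Callable<T> backup, long thresholdMillis)
            throws Exception {
        CompletionService<T> cs = new ExecutorCompletionService<>(POOL);
        cs.submit(primary);
        Future<T> done = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (done != null) {
            return done.get();        // primary beat the threshold; no hedge needed
        }
        cs.submit(backup);            // hedge: fire the second read
        return cs.take().get();       // whichever finishes first wins
    }

    public static void main(String[] args) throws Exception {
        Callable<String> slow = () -> { Thread.sleep(500); return "slow replica"; };
        Callable<String> fast = () -> "fast replica";
        System.out.println(hedgedRead(slow, fast, 50)); // hedge fires; fast replica wins
        POOL.shutdown();
    }
}
```

Not cancelling the loser trades a little wasted work for avoiding the interrupt-handling problems described in the comment above.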
[jira] [Created] (HDFS-5973) add DomainSocket#shutdown method
Colin Patrick McCabe created HDFS-5973: -- Summary: add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5973: --- Attachment: HDFS-5973.001.patch add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5973: --- Status: Patch Available (was: Open) add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905965#comment-13905965 ] Colin Patrick McCabe commented on HDFS-5973: This is a pretty simple one. Just exposing the existing code which calls {{shutdown(2)}} on a socket. This is to allow me to shutdown the UNIX domain socket associated with a shared memory segment for error handling purposes. {{close}} could be used for this purpose, but it's a little more heavyweight than what I need here, since {{DomainSocket#close}} blocks until the fd is actually, well, closed. Since the UNIX domain socket associated with a shared memory segment is inside a {{DomainSocketWatcher}}, it can't be actually closed until the {{DomainSocketWatcher}} lets go of it. add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
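The shutdown-versus-close distinction can be illustrated with a plain TCP loopback pair (java.net.Socket stands in for the native UNIX domain socket here; the shutdown(2) semantics being shown are the same): shutdown pushes EOF to the peer immediately while the fd stays open, so a holder like {{DomainSocketWatcher}} can still release and close it later.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ShutdownVsClose {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket peer = server.accept();

            // shutdown(2) on the write side: the peer sees EOF right away,
            // but the local fd is not yet closed -- another component can
            // still hold it and close it when it is done.
            client.shutdownOutput();
            System.out.println(peer.getInputStream().read()); // prints -1: peer sees EOF
            System.out.println(client.isClosed());            // prints false: fd still open

            client.close(); // the heavyweight operation, deferred until safe
            peer.close();
        }
    }
}
```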
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905971#comment-13905971 ] Andrew Wang commented on HDFS-5973: --- +1 pending Jenkins bot add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5962: Attachment: HDFS-5962.4.patch Rebased the patch for the latest trunk. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch, HDFS-5962.4.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5944: - Status: Patch Available (was: Open) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint - Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0, 1.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5944: - Attachment: HDFS-5944.trunk.patch Upload the same trunk patch to trigger the build. LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint - Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905982#comment-13905982 ] Arpit Agarwal commented on HDFS-5963: - Thanks Nicholas. JDK7 could randomize the test case order so perhaps we need to put testSecondaryNameNode in a separate test class? Is this failure expected? {code} Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.565 sec FAILURE! - in org.apache.hadoop.hdfs.TestRollingUpgrade testRollback(org.apache.hadoop.hdfs.TestRollingUpgrade) Time elapsed: 3.386 sec ERROR! java.io.IOException: There appears to be a gap in the edit log. We expected txid 5, but got txid 8. at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:203) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:131) {code} TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail Key: HDFS-5963 URL: https://issues.apache.org/jira/browse/HDFS-5963 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal Assignee: Tsz Wo (Nicholas), SZE Attachments: h5963_20140218.patch {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test. Commenting out this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905991#comment-13905991 ] Jing Zhao commented on HDFS-5898: - bq. I don't follow how the change in RpcProgramNfs3 is related to this issue. Yes, I think the change in RpcProgramNfs3 is a regression of HDFS-5913. One question: the current patch puts the login into DFSClientCache#getUserGroupInformation, which is called by the load() method of the loading cache. Thus we will call login() every time we miss the cache. Should we put the login call into the constructor of RpcProgramNfs3 instead? Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898-with-documentation.patch, HDFS-5898.patch, HDFS-5898.patch, HDFS-5898.patch According to the discussion in HDFS-5804: 1. The NFS gateway should be able to get its own tgts, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
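To make the review question in the HDFS-5898 comment concrete, here is a toy model (names are hypothetical; this is not the real DFSClientCache or UserGroupInformation API) contrasting the two placements of the login call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the review question above (hypothetical names; not the
// real DFSClientCache or UserGroupInformation API): a keytab login placed
// in the cache loader re-runs on every cache miss, while a login in the
// constructor runs exactly once.
public class LoginPlacement {
    int loginCalls = 0;
    private final Map<String, String> ugiCache = new ConcurrentHashMap<>();
    private final boolean loginInLoader;

    LoginPlacement(boolean loginInLoader) {
        this.loginInLoader = loginInLoader;
        if (!loginInLoader) {
            login();                 // constructor placement: once per process
        }
    }

    private void login() { loginCalls++; }  // stand-in for the kerberos login

    String getUgi(String user) {
        return ugiCache.computeIfAbsent(user, u -> {
            if (loginInLoader) {
                login();             // loader placement: once per cache miss
            }
            return "ugi:" + u;
        });
    }

    public static void main(String[] args) {
        LoginPlacement inLoader = new LoginPlacement(true);
        inLoader.getUgi("alice");
        inLoader.getUgi("bob");
        System.out.println(inLoader.loginCalls);   // one login per miss

        LoginPlacement inCtor = new LoginPlacement(false);
        inCtor.getUgi("alice");
        inCtor.getUgi("bob");
        System.out.println(inCtor.loginCalls);     // a single login
    }
}
```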
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5776: Release Note: If a read from a block is slow, start up another parallel, 'hedged' read against a different block replica. We then take the result of whichever read returns first (the outstanding read is cancelled). This 'hedged' read feature will help rein in the outliers, the odd read that takes a long time because it hit a bad patch on the disk, etc. This feature is off by default. To enable it, set {{dfs.client.hedged.read.threadpool.size}} to a positive number. The threadpool size is how many threads to dedicate to the running of these 'hedged', concurrent reads in your client. Then set {{dfs.client.hedged.read.threshold.millis}} to the number of milliseconds to wait before starting up a 'hedged' read. For example, if you set this property to 10, then if a read has not returned within 10 milliseconds, we will start up a new read against a different block replica. This feature emits new metrics: + hedgedReadOps + hedgedReadOpsWin -- how many times the hedged read 'beat' the original read + hedgedReadOpsInCurThread -- how many times we went to do a hedged read but we had to run it in the current thread because dfs.client.hedged.read.threadpool.size was at a maximum. 
Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt, HDFS-5776v18.txt, HDFS-5776v21.txt This is a placeholder for the HDFS-side backport of https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful, especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metric values of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
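The mechanism described in the HDFS-5776 release note can be sketched in plain Java. This is an illustration of the hedged-read pattern only, not DFSClient's actual implementation, and unlike the real feature it does not cancel the losing read:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HedgedRead {
    // Run 'primary'; if it has not finished within thresholdMillis, also
    // run 'backup' and return whichever completes first. A real
    // implementation would additionally cancel the losing attempt.
    static <T> T hedged(Callable<T> primary, Callable<T> backup,
                        long thresholdMillis, ExecutorService pool)
            throws Exception {
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        cs.submit(primary);
        Future<T> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get();   // primary beat the threshold
        }
        cs.submit(backup);        // hedge: try a different "replica"
        return cs.take().get();   // whichever attempt finishes first wins
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<String> slowReplica = () -> { Thread.sleep(200); return "slow"; };
        Callable<String> fastReplica = () -> "fast";
        // The slow read misses the 10 ms threshold, so the hedge wins.
        System.out.println(hedged(slowReplica, fastReplica, 10, pool));
        pool.shutdownNow();
    }
}
```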
[jira] [Updated] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Buddy updated HDFS-5868: Attachment: HDFS-5868b-branch-2.patch Updated based on Arpit's comment and regenerated against latest trunk. Thanks Arpit! Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch, HDFS-5868b-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for a FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
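A minimal sketch of the refactoring proposed in HDFS-5868 (simplified names modeled on the description above, not the actual patch): sync() lives on the stream wrapper with a default that preserves today's FileOutputStream-only behavior, so an FsDatasetSpi plugin that is not backed by OS files can override it.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch of the proposal (names modeled on the description,
// not the actual patch): BlockReceiver would call sync() on the wrapper
// instead of assuming a FileOutputStream.
class ReplicaOutputStreams implements Closeable {
    private final OutputStream dataOut;

    ReplicaOutputStreams(OutputStream dataOut) { this.dataOut = dataOut; }

    // Default mirrors the current BlockReceiver behavior: only
    // FileOutputStream-backed replicas can be synced.
    public void sync() throws IOException {
        if (dataOut instanceof FileOutputStream) {
            ((FileOutputStream) dataOut).getFD().sync();
        } else {
            throw new UnsupportedOperationException(
                    "stream is not file-backed; plugin must override sync()");
        }
    }

    @Override
    public void close() throws IOException { dataOut.close(); }

    public static void main(String[] args) throws IOException {
        new InMemoryReplicaOutputStreams(new ByteArrayOutputStream()).sync();
        System.out.println("plugin sync succeeded");
    }
}

// Example plugin stream that supplies its own durability mechanism.
class InMemoryReplicaOutputStreams extends ReplicaOutputStreams {
    InMemoryReplicaOutputStreams(ByteArrayOutputStream out) { super(out); }

    @Override
    public void sync() { /* e.g. replicate the buffer; nothing to fsync */ }
}
```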
[jira] [Updated] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5483: Attachment: h5483.03.patch Rebase patch and get updated Jenkins +1. NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch, h5483.03.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5742: Description: DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. Also included are a few improvements to DataNodeCluster, details in comments below. was: DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch, HDFS-5742.04.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. Also included are a few improvements to DataNodeCluster, details in comments below. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5974) Fix compilation error after merge
Tsz Wo (Nicholas), SZE created HDFS-5974: Summary: Fix compilation error after merge Key: HDFS-5974 URL: https://issues.apache.org/jira/browse/HDFS-5974 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE {noformat} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/szetszwo/hadoop/commit-HDFS-5535/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java:[322,34] cannot find symbol symbol : variable Feature location: class org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.AclEditLogUtil [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906043#comment-13906043 ] Hadoop QA commented on HDFS-5973: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629873/HDFS-5973.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6177//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6177//console This message is automatically generated. add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906049#comment-13906049 ] Arpit Agarwal commented on HDFS-5868: - +1 for the patch pending Jenkins. Make hsync implementation pluggable --- Key: HDFS-5868 URL: https://issues.apache.org/jira/browse/HDFS-5868 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.2.0 Reporter: Buddy Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch, HDFS-5868b-branch-2.patch The current implementation of hsync in BlockReceiver only works if the output streams are instances of FileOutputStream. Therefore, there is currently no way for a FSDatasetSpi plugin to implement hsync if it is not using standard OS files. One possible solution is to push the implementation of hsync into the ReplicaOutputStreams class. This class is constructed by the ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore it can be extended. Instead of directly calling sync on the output stream, BlockReceiver would call ReplicaOutputStream.sync. The default implementation of sync in ReplicaOutputStream would be the same as the current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906064#comment-13906064 ] Hadoop QA commented on HDFS-5274: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629828/HDFS-5274-7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6174//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6174//console This message is automatically generated. 
Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906079#comment-13906079 ] Chris Nauroth commented on HDFS-5483: - Hi Arpit, This patch looks good. Just one minor comment on {{TestBlockHasMultipleReplicasOnSameDN#startUpCluster}}. There is a visible-for-testing {{DistributedFileSystem#getClient}} method that returns the underlying {{DFSClient}} instance. I'm wondering if the test initialization code can be reduced to {{client = fs.getClient()}}. NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch, h5483.03.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. 
Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5974) Fix compilation error after merge
[ https://issues.apache.org/jira/browse/HDFS-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5974: - Attachment: h5974_20140219.patch h5974_20140219.patch: fixes compilation error, NameNodeLayoutVersion and DataNodeLayoutVersion. Fix compilation error after merge - Key: HDFS-5974 URL: https://issues.apache.org/jira/browse/HDFS-5974 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5974_20140219.patch {noformat} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/szetszwo/hadoop/commit-HDFS-5535/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java:[322,34] cannot find symbol symbol : variable Feature location: class org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.AclEditLogUtil [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906104#comment-13906104 ] Kihwal Lee commented on HDFS-5583: -- Thanks for the review, Brandon. - The admin wants to know whether the command was received: This is determined by the return code of the command. As with other commands, when the return code is not 0, the state is non-deterministic and only then the command may be reissued. I do not believe that this is a common case. Moreover, the shutdown normally takes less than two seconds, and reissuing the shutdown manually would probably take more than that. In my opinion, adding support for reporting progress won't have much value. If you still feel that it needs to be changed, I will change it. Please let me know what you think. - I am planning on adding at least one more OOB ack type in the near future for write draining, which will be useful for decommissioning. The reserved enums make certain checks more efficient. I will address the rest of the comments when you finish the review. Make DN send an OOB Ack on shutdown before restarting Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch Add an ability for data nodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for the applications with a need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906104#comment-13906104 ] Kihwal Lee edited comment on HDFS-5583 at 2/19/14 9:26 PM: --- Thanks for the review, Brandon. - The admin wants to know whether the command was received by the datanode: This is determined by the return code of the command. As with other commands, when the return code is not 0, the state is non-deterministic and only then the command may be reissued. I do not believe that this is a common case. Moreover, the shutdown normally takes less than two seconds and probably the reissuing shutdown manually takes more than that. In my opinion, adding support for reporting progress won't have much value. If you still feel that it needs to be changed, I will change it. Please let me know what you think. - I am planning on adding at least one more OOB ack type in the near future for write draining, which will be useful for decommissioning. The reserved enums make certain checks more efficient. I will address the rest of the comments when you finish the review. was (Author: kihwal): Thanks for the review, Brandon. - The admin wants to know whether the command was received: This is determined by the return code of the command. As with other commands, when the return code is not 0, the state is non-deterministic and only then the command may be reissued. I do not believe that this is a common case. Moreover, the shutdown normally take less than two seconds and probably the reissuing shutdown manually take more than that. In my opinion, adding support for reporting progress won't have much value. If you still feel that it needs to be changed, I will change it. Please let me know what you think. - I am planning on adding at least one more OOB ack type in near future for write draining, which will be useful for decommissioining. The reserved enums make certain checks more efficient. 
I will address the rest of the comments when you finish the review. Make DN send an OOB Ack on shutdown before restarting Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch Add an ability for data nodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for the applications with a need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5974) Fix compilation error after merge
[ https://issues.apache.org/jira/browse/HDFS-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5974: Hadoop Flags: Reviewed +1 for the patch. Thanks, Nicholas. Fix compilation error after merge - Key: HDFS-5974 URL: https://issues.apache.org/jira/browse/HDFS-5974 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5974_20140219.patch {noformat} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/szetszwo/hadoop/commit-HDFS-5535/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java:[322,34] cannot find symbol symbol : variable Feature location: class org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.AclEditLogUtil [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906120#comment-13906120 ] Tsz Wo (Nicholas), SZE commented on HDFS-5939: -- In chooseRandom(..), excludedScope must be null or a proper descendant of scope after the first if-statement. So (1) it never excludes all nodes and (2) we must have numOfDatanodes >= 1. WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access hdfs via webhdfs, and the datanode is dead, the user will see an exception like the one below without any clue that it's caused by the dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} We need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
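For context on the "n must be positive" message in HDFS-5939: when a random pick is made from an empty candidate set, java.util.Random#nextInt(int) throws IllegalArgumentException (the JDK7 message is exactly "n must be positive"). A stripped-down illustration follows; this is not the real NetworkTopology code:

```java
import java.util.Random;

// Stripped-down illustration (not the real NetworkTopology code) of how
// an empty candidate set surfaces as IllegalArgumentException from
// Random.nextInt when there are no live datanodes to choose from.
public class ChooseRandomSketch {
    private static final Random RANDOM = new Random();

    static String chooseRandom(String[] liveNodes) {
        // Throws IllegalArgumentException when liveNodes.length == 0,
        // which is what surfaced to the WebHDFS client.
        return liveNodes[RANDOM.nextInt(liveNodes.length)];
    }

    public static void main(String[] args) {
        try {
            chooseRandom(new String[0]);  // no live datanodes
        } catch (IllegalArgumentException e) {
            System.out.println("no datanodes available: " + e.getMessage());
        }
    }
}
```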
[jira] [Resolved] (HDFS-5974) Fix compilation error after merge
[ https://issues.apache.org/jira/browse/HDFS-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5974. -- Resolution: Fixed Fix Version/s: HDFS-5535 (Rolling upgrades) Thanks Chris for reviewing the patch. I have committed this. Fix compilation error after merge - Key: HDFS-5974 URL: https://issues.apache.org/jira/browse/HDFS-5974 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: HDFS-5535 (Rolling upgrades) Attachments: h5974_20140219.patch {noformat} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/szetszwo/hadoop/commit-HDFS-5535/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java:[322,34] cannot find symbol symbol : variable Feature location: class org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.AclEditLogUtil [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5742: Hadoop Flags: Reviewed +1 for the patch. Thank you, Arpit. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch, HDFS-5742.04.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. Also included are a few improvements to DataNodeCluster, details in comments below. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906155#comment-13906155 ] Tsz Wo (Nicholas), SZE commented on HDFS-5963: -- For testSecondaryNameNode, let's simply remove it since it is not very useful. Let me also fix the bug in rollback. TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail Key: HDFS-5963 URL: https://issues.apache.org/jira/browse/HDFS-5963 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal Assignee: Tsz Wo (Nicholas), SZE Attachments: h5963_20140218.patch {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test. Commenting out this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906153#comment-13906153 ] Hadoop QA commented on HDFS-5318: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629857/HDFS-5318-trunk-c.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6175//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6175//console This message is automatically generated. 
Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e.
the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
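The counting rule proposed above can be sketched as follows. This is an illustrative, self-contained model of the idea (distinct {{StorageID}} s rather than distinct datanodes), not the actual HDFS-5318 patch; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch of the proposed counting rule: the replication count of a
// block is the number of distinct storage IDs among its
// (DataNode ID, Storage ID) location pairs, so multiple access paths into
// one shared storage pool count as a single physical replica.
public class SharedStorageReplication {
    public record Location(String datanodeId, String storageId) {}

    public static int replicationCount(List<Location> locations) {
        Set<String> distinctStorageIds = new HashSet<>();
        for (Location loc : locations) {
            distinctStorageIds.add(loc.storageId());
        }
        return distinctStorageIds.size();
    }

    public static void main(String[] args) {
        // The example from the description: four datanodes, two storage pools.
        List<Location> block = List.of(
                new Location("DN_1", "STORAGE_A"),
                new Location("DN_2", "STORAGE_A"),
                new Location("DN_3", "STORAGE_B"),
                new Location("DN_4", "STORAGE_B"));
        System.out.println(replicationCount(block)); // prints 2, not 4
    }
}
```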
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906167#comment-13906167 ] stack commented on HDFS-5274: - My guess is that the failures are unrelated. We can rerun the patch or just wait on the next iteration. Patch looks great to me. Have you tried it outside of the unit tests to make sure you get sensible looking spans and numbers? Perhaps I can help here? Fix these in next patch: + * This class do nothing If no SpanReceiver is configured . + * Trancing information of HTrace, if exists. Is formatting ok here? + if (source != null) { +proto.setSource(PBHelper.convertDatanodeInfo(source)); + } + send(out, Op.WRITE_BLOCK, proto.build()); + } finally { + if (ts != null) ts.close(); +} In BlockReceiver, should traceSpan be getting closed? Is it possible that the below throws an exception? + scope.getSpan().addKVAnnotation( + stream.getBytes(), + jas.getCurrentStream().toString().getBytes()); i.e. we can hop out w/o closing the span since the try/finally only happens later. This is in JournalSet in a few places. TraceInfo and RPCTInfo seem to be the same data structure? Should we define it one time only and share? Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
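The span-leak stack points out (an exception thrown while annotating, before any try/finally is entered) can be avoided with the usual try-before-annotate pattern. The sketch below uses a hypothetical stand-in `Span` class, not HTrace's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a tracing span (NOT the HTrace API): the point
// is only the control flow -- enter the try before calling
// addKVAnnotation, so the finally closes the span even if annotating throws.
class Span implements AutoCloseable {
    final Map<String, String> annotations = new HashMap<>();
    boolean closed = false;

    void addKVAnnotation(String key, String value) {
        annotations.put(key, value);
    }

    @Override
    public void close() {
        closed = true;
    }

    static Span annotateAndClose(String key, String value) {
        Span span = new Span();
        try {
            // If this throws, the finally below still runs; annotating
            // outside the try would leak an unclosed span.
            span.addKVAnnotation(key, value);
        } finally {
            span.close();
        }
        return span;
    }
}
```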
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906172#comment-13906172 ] Yongjun Zhang commented on HDFS-5939: - Thanks [~wheat9] and [~szetszwo]. Based on your input, it sounds like we can do the alternative solution I mentioned in my last update: change the exception spec of NetworkTopology.chooseRandom, and let it throw NoDatanodeException instead of IllegalArgumentException when numOfDataNode is 0. {code} public Node chooseRandom(String scope) throws NoDatanodeException private Node chooseRandom(String scope, String excludedScope) throws NoDatanodeException {code} If you agree, I will post a new patch with this change. Thanks, WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access hdfs via webhdfs, and when a datanode is dead, the user will see an exception like the one below without any clue that it's caused by the dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
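The proposed change can be modeled in a standalone sketch. `NoDatanodeException` is the name suggested in the comment above, not an existing Hadoop class, and `TopologySketch` is a toy stand-in for `NetworkTopology`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// The checked exception proposed in the comment (hypothetical here).
class NoDatanodeException extends Exception {
    NoDatanodeException(String message) { super(message); }
}

// Toy model of the proposed chooseRandom behavior: instead of letting
// Random.nextInt(0) surface as "IllegalArgumentException: n must be
// positive", fail with a descriptive, checked exception that callers
// such as the WebHDFS path can report to the user.
class TopologySketch {
    private final List<String> datanodes = new ArrayList<>();

    void add(String node) { datanodes.add(node); }

    String chooseRandom(String scope) throws NoDatanodeException {
        if (datanodes.isEmpty()) {
            throw new NoDatanodeException("No datanode available in scope " + scope);
        }
        return datanodes.get(new Random().nextInt(datanodes.size()));
    }
}
```

Because the exception is checked, callers such as BlockPlacementPolicyDefault would be forced by the compiler to handle the no-datanode case, which is the point of the signature change.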
[jira] [Created] (HDFS-5975) Create an option to specify a file path for OfflineImageViewer
Akira AJISAKA created HDFS-5975: --- Summary: Create an option to specify a file path for OfflineImageViewer Key: HDFS-5975 URL: https://issues.apache.org/jira/browse/HDFS-5975 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor The output of OfflineImageViewer becomes quite large if an input fsimage is large. I propose '-filePath' option to make the output smaller. The below command will output the {{ls -R}} of {{/user/root}}. {code} hdfs oiv -i input -o output -p Ls -filePath /user/root {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5976) Create unit tests for downgrade and finalize
Haohui Mai created HDFS-5976: Summary: Create unit tests for downgrade and finalize Key: HDFS-5976 URL: https://issues.apache.org/jira/browse/HDFS-5976 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5976.000.patch This jira tracks the effort of implementing unit tests for downgrades and finalization during rolling upgrades. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5962) Mtime and atime are not persisted for symbolic links
[ https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5962: Attachment: HDFS-5962.5.patch Fixed LsrPBImage.java to output the mtime of symlinks. Mtime and atime are not persisted for symbolic links Key: HDFS-5962 URL: https://issues.apache.org/jira/browse/HDFS-5962 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Akira AJISAKA Priority: Critical Attachments: HDFS-5692.patch, HDFS-5962.2.patch, HDFS-5962.3.patch, HDFS-5962.4.patch, HDFS-5962.5.patch In {{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded to be 0 when saving to fsimage, even though they are recorded in memory and shown in the listing until restarting namenode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5976) Create unit tests for downgrade and finalize
[ https://issues.apache.org/jira/browse/HDFS-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5976: - Attachment: HDFS-5976.000.patch Create unit tests for downgrade and finalize Key: HDFS-5976 URL: https://issues.apache.org/jira/browse/HDFS-5976 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5976.000.patch This jira tracks the effort of implementing unit tests for downgrades and finalization during rolling upgrades. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage
[ https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5952: Assignee: (was: Akira AJISAKA) Create a tool to run data analysis on the PB format fsimage --- Key: HDFS-5952 URL: https://issues.apache.org/jira/browse/HDFS-5952 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 3.0.0 Reporter: Akira AJISAKA The Delimited processor in OfflineImageViewer is no longer supported after HDFS-5698 was merged. The motivation of the Delimited processor is to run data analysis on the fsimage; therefore, there might be more value in creating a tool for Hive or Pig that reads the PB format fsimage directly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
[ https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5963: - Attachment: h5963_20140219.patch Let me also fix the bug in rollback. Talked to [~jingzhao]; the rollback bug is quite involved, so we will fix it separately. h5963_20140219.patch: removes testSecondaryNameNode() and comments out restartNameNode() in testRollback(). TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail Key: HDFS-5963 URL: https://issues.apache.org/jira/browse/HDFS-5963 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal Assignee: Tsz Wo (Nicholas), SZE Attachments: h5963_20140218.patch, h5963_20140219.patch {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. It seems to be caused by the terminate hook used by the test. Commenting out this test case makes other tests in the same class pass. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage
[ https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906220#comment-13906220 ] Akira AJISAKA commented on HDFS-5952: - Thank you for your comment. I'm okay with using an XML-based tool, and I don't want to duplicate the code. Create a tool to run data analysis on the PB format fsimage --- Key: HDFS-5952 URL: https://issues.apache.org/jira/browse/HDFS-5952 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 3.0.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA The Delimited processor in OfflineImageViewer is no longer supported after HDFS-5698 was merged. The motivation of the Delimited processor is to run data analysis on the fsimage; therefore, there might be more value in creating a tool for Hive or Pig that reads the PB format fsimage directly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5975) Create an option to specify a file path for OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906221#comment-13906221 ] Haohui Mai commented on HDFS-5975: -- I think this feature has good practical impact, since the operator rarely needs to do a full lsr starting from the root directory. LsrPBImage should output on demand. My suggestion is to push this idea one step further -- is it possible to create a tool which takes the fsimage and exposes a read-only version of the WebHDFS API? You can imagine the tool looks very similar to jhat, except that it exposes the WebHDFS API. That way we can allow the operator to use the existing command-line tools, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system to figure out what goes wrong. Create an option to specify a file path for OfflineImageViewer -- Key: HDFS-5975 URL: https://issues.apache.org/jira/browse/HDFS-5975 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor The output of OfflineImageViewer becomes quite large if an input fsimage is large. I propose '-filePath' option to make the output smaller. The below command will output the {{ls -R}} of {{/user/root}}. {code} hdfs oiv -i input -o output -p Ls -filePath /user/root {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
Andrew Wang created HDFS-5977: - Summary: FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5977: -- Target Version/s: 2.4.0 FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906230#comment-13906230 ] Tsz Wo (Nicholas), SZE commented on HDFS-5939: -- ... So (1) it never excludes all nodes and (2) we must have numOfDatanodes >= 1. Actually, the above statement is wrong. e.g. - if scope=/dc, excludedScope=/dc/rack0 and rack0 is the only rack, then all nodes are excluded. - numOfDatanode under the scope is 0. WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-5939.001.patch When trying to access hdfs via webhdfs, and when a datanode is dead, the user will see an exception like the one below without any clue that it's caused by the dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
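The edge case Nicholas describes can be illustrated with a simplified count of the nodes remaining under a scope once an excluded sub-scope is removed. Node paths are plain strings here; the real `NetworkTopology` walks an `InnerNode` tree, so this is only a model of the arithmetic:

```java
import java.util.List;

// Simplified model: count the datanodes under `scope` that are not under
// `excludedScope`. When the excluded sub-scope covers every node in the
// scope (e.g. rack0 is the only rack), zero candidates remain and
// chooseRandom has nothing to pick from.
class ScopeCount {
    static long availableNodes(List<String> nodePaths, String scope, String excludedScope) {
        return nodePaths.stream()
                .filter(path -> path.startsWith(scope))
                .filter(path -> !path.startsWith(excludedScope))
                .count();
    }

    public static void main(String[] args) {
        // rack0 is the only rack, and it is excluded: zero candidates remain.
        List<String> nodes = List.of("/dc/rack0/dn1", "/dc/rack0/dn2");
        System.out.println(availableNodes(nodes, "/dc", "/dc/rack0")); // prints 0
    }
}
```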
[jira] [Commented] (HDFS-5975) Create an option to specify a file path for OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906235#comment-13906235 ] Akira AJISAKA commented on HDFS-5975: - That's a good idea! I'll create another JIRA. Create an option to specify a file path for OfflineImageViewer -- Key: HDFS-5975 URL: https://issues.apache.org/jira/browse/HDFS-5975 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor The output of OfflineImageViewer becomes quite large if an input fsimage is large. I propose '-filePath' option to make the output smaller. The below command will output the {{ls -R}} of {{/user/root}}. {code} hdfs oiv -i input -o output -p Ls -filePath /user/root {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906242#comment-13906242 ] Hadoop QA commented on HDFS-5776: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629871/HDFS-5776v21.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6178//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6178//console This message is automatically generated. 
Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt, HDFS-5776v18.txt, HDFS-5776v21.txt This is a placeholder for hdfs-related stuff backported from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful, especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metrics of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path, where we decide to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
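The hedged-read idea described above can be sketched in a small standalone model. This is a toy illustration of the general technique (start a read, and if it misses a deadline, race a speculative second read), not the actual DFSClient implementation; the class and method names are hypothetical:

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Toy model of a hedged pread: run the read against one replica; if it has
// not finished within the threshold (cf. the quorum.read.threshold.millis
// config mentioned above), fire a speculative read against another replica
// and return whichever completes first.
class HedgedReadSketch {
    static <T> T hedgedRead(Supplier<T> primary, Supplier<T> backup, long thresholdMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<T> done = new ExecutorCompletionService<>(pool);
            done.submit(primary::get);
            Future<T> first = done.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                return first.get();       // primary answered within the threshold
            }
            done.submit(backup::get);     // hedge: speculative read on another replica
            return done.take().get();     // first of the two reads to complete wins
        } finally {
            pool.shutdownNow();           // abandon the slower read
        }
    }
}
```

The trade-off this models is the one in the description: outlier latency is bounded by the threshold plus the backup read, at the cost of occasional duplicate work.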
[jira] [Created] (HDFS-5978) Create a tool to take fsimage and expose read-only WebHDFS API
Akira AJISAKA created HDFS-5978: --- Summary: Create a tool to take fsimage and expose read-only WebHDFS API Key: HDFS-5978 URL: https://issues.apache.org/jira/browse/HDFS-5978 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Suggested in HDFS-5975. Add an option to expose the read-only version of the WebHDFS API for OfflineImageViewer. You can imagine it looks very similar to jhat. That way we can allow the operator to use the existing command-line tool, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system to figure out what goes wrong. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5583: - Summary: Make DN send an OOB Ack on shutdown before restarting (was: Make DN send an OOB Ack on shutdown before restaring) Make DN send an OOB Ack on shutdown before restarting - Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch Add an ability for data nodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for applications with a need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5973) add DomainSocket#shutdown method
[ https://issues.apache.org/jira/browse/HDFS-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5973: --- Resolution: Fixed Fix Version/s: 2.4.0 Status: Resolved (was: Patch Available) committed, thanks add DomainSocket#shutdown method Key: HDFS-5973 URL: https://issues.apache.org/jira/browse/HDFS-5973 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-5973.001.patch Add a DomainSocket#shutdown method, that allows us to call shutdown on UNIX domain sockets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5483: Assignee: Arpit Agarwal NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch, h5483.03.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906258#comment-13906258 ] Haohui Mai commented on HDFS-5977: -- There are two cases here: # If the user is upgrading from a version that uses the old fsimage, the NN will use the old loader to load the fsimage, which handles the flag already. # For future upgrades, I think that this mechanism is no longer required. For example, currently the NN has already reserved {{.reserved}} in the namespace. What we need to do here is to regulate ourselves to put special names into {{.reserved}}. Making this assumption explicit eliminates the need for renaming during upgrades, so the whole upgrade workflow can be simplified. FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5978) Create a tool to take fsimage and expose read-only WebHDFS API
[ https://issues.apache.org/jira/browse/HDFS-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906260#comment-13906260 ] Haohui Mai commented on HDFS-5978: -- As a first step, one can take the current code of LsrPBImage, and then create a Netty-based HTTP server that implements the {{LISTSTATUS}} in WebHDFS. I suggest not using jetty 6 + jersey (which are used in the NN to implement webhdfs) in this tool, because they'll bring in quite a few external dependencies, making the tool much harder to deploy. Create a tool to take fsimage and expose read-only WebHDFS API -- Key: HDFS-5978 URL: https://issues.apache.org/jira/browse/HDFS-5978 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Suggested in HDFS-5975. Add an option to expose the read-only version of the WebHDFS API for OfflineImageViewer. You can imagine it looks very similar to jhat. That way we can allow the operator to use the existing command-line tool, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system to figure out what goes wrong. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
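To make the LISTSTATUS suggestion concrete, the sketch below renders the shape of a WebHDFS `LISTSTATUS` JSON response. The field names (`FileStatuses`, `FileStatus`, `pathSuffix`, `type`) follow the WebHDFS REST API; everything else is hand-written for illustration, the entries are not read from a real fsimage, and the HTTP layer (Netty or otherwise) is omitted:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: the JSON body such a tool's LISTSTATUS handler would
// serve. Real code would walk the loaded fsimage (cf. LsrPBImage) and fill
// in the full set of FileStatus fields; here each child becomes a minimal
// FILE entry.
public class ListStatusJson {
    public static String render(List<String> childNames) {
        String entries = childNames.stream()
                .map(name -> "{\"pathSuffix\":\"" + name + "\",\"type\":\"FILE\"}")
                .collect(Collectors.joining(","));
        return "{\"FileStatuses\":{\"FileStatus\":[" + entries + "]}}";
    }

    public static void main(String[] args) {
        System.out.println(render(List.of("a.txt", "b.txt")));
    }
}
```

Keeping the response-building logic as a pure function like this also makes it easy to unit-test without standing up the HTTP server.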
[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5318: Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Target Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this to trunk, branch-2 and branch-2.4. Thanks for the contribution [~sirianni] and also thanks to [~sureshms] for suggesting this approach! Support read-only and read-write paths to shared replicas - Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Eric Sirianni Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5318-trunk-c.patch, HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. 
With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5977) FSImageFormatPBINode does not respect -renameReserved upgrade flag
[ https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906261#comment-13906261 ] Andrew Wang commented on HDFS-5977: --- Hi Haohui, I had the same thought and agree in spirit, but if you look at the comment history on HDFS-5709, [~sureshms] had some concerns about adding more reserved paths beyond our current two. We agreed to implement a more general solution for this reason. I agree that we don't need to implement this for the PB loader until we add another reserved path, but I filed this JIRA so -renameReserved isn't forgotten about in the future. FSImageFormatPBINode does not respect -renameReserved upgrade flag Key: HDFS-5977 URL: https://issues.apache.org/jira/browse/HDFS-5977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Labels: protobuf HDFS-5709 added a new upgrade flag -renameReserved which can be used to automatically rename reserved paths like /.reserved encountered during upgrade. The new protobuf loading code does not have a similar facility, so future reserved paths cannot be automatically renamed via -renameReserved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)