[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124812#comment-14124812 ] Yongjun Zhang commented on HDFS-6621: - Hi [~ravwojdyla], I studied it a bit more, and it seems to me that on top of the changes you made, we need to replace the {{Dispatcher.this}} in the following code with {{this}}:
{code}
try {
  synchronized (Dispatcher.this) {
    Dispatcher.this.wait(1000); // wait for targets/sources to be idle
  }
} catch (InterruptedException ignored) {
}
{code}
This would make a scheduling thread whose five transfer threads are all occupied/unfinished block on its {{source}}; later, when one transfer thread finishes, it would notify this blocked scheduling thread (via your change for problem 2) that a slot is now available. If this makes sense to you, would you please try it out with the testing you have done? Again, the first problem seems to be the important one to fix, but I don't know how important the second one is (see the question asked in my last comment). If the fix for problem 1 is good enough, then we can go with it alone. Otherwise, my suggested change above can be explored. Would you please comment? Thanks a lot. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit out of its balancing iterations. It would move ~10 blocks or 100 MB and then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no-pending-block iterations - however, this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no-pending-block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0
      && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        setScheduledSize(0);
      }
    }
    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
    /* Now we can not schedule any block to move and there are
     * no new blocks added to the source block list, so we wait.
     */
    try {
      synchronized (Balancer.this) {
        Balancer.this.wait(1000); // wait for targets/sources to be idle
      }
    } catch (InterruptedException ignored) {
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
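For readers following the wait/notify discussion above, here is a minimal, self-contained sketch of the pattern being proposed: the scheduling thread waits on its own source object rather than on a global Dispatcher/Balancer lock, so a finishing transfer thread can wake exactly that thread. The class and member names below are illustrative assumptions, not the actual Hadoop Dispatcher code.
{code}
// Simplified sketch (not the actual Hadoop code): a scheduling thread waits
// on its own Source object when all transfer slots are busy, and a finishing
// transfer thread notifies that same object.
class Source {
  private static final int MAX_SLOTS = 5;
  private int busySlots = 0;

  synchronized void scheduleMoves() throws InterruptedException {
    while (busySlots >= MAX_SLOTS) {
      // wait on "this" (the source), not on a shared Dispatcher/Balancer lock,
      // so the notify below wakes exactly this scheduling thread
      wait(1000);
    }
    busySlots++;
    // ... hand the block move off to a transfer thread ...
  }

  synchronized void onTransferFinished() {
    busySlots--;
    notifyAll(); // wake the scheduling thread blocked in scheduleMoves()
  }
}
{code}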
[jira] [Commented] (HDFS-7025) HDFS Credential Provider related Unit Test Failure
[ https://issues.apache.org/jira/browse/HDFS-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124830#comment-14124830 ] Hadoop QA commented on HDFS-7025: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667066/HDFS-7025.1.patch against trunk revision d1fa582. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7933//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7933//console This message is automatically generated. HDFS Credential Provider related Unit Test Failure --- Key: HDFS-7025 URL: https://issues.apache.org/jira/browse/HDFS-7025 Project: Hadoop HDFS Issue Type: Test Components: encryption Affects Versions: 2.4.1 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7025.0.patch, HDFS-7025.1.patch Reported by: Xiaomara and investigated by [~cnauroth]. The credential provider related unit tests failed on Windows. The tests try to set up a URI by taking the build test directory and concatenating it with other strings containing the rest of the URI format, i.e.:
{code}
public void testFactory() throws Exception {
  Configuration conf = new Configuration();
  conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
      UserProvider.SCHEME_NAME + ":///," +
      JavaKeyStoreProvider.SCHEME_NAME + "://file" + tmpDir + "/test.jks");
{code}
This logic is incorrect on Windows, because the file path separator will be '\', which violates URI syntax; only the forward slash is permitted. The proper fix is to always do path/URI construction through the org.apache.hadoop.fs.Path class, specifically using the constructors that take explicit parent and child arguments. The affected unit tests are:
{code}
* TestCryptoAdminCLI
* TestDFSUtil
* TestEncryptionZones
* TestReservedRawPaths
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
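As a rough illustration of the fix described in HDFS-7025 above, the sketch below builds the keystore provider URI through org.apache.hadoop.fs.Path parent/child constructors instead of raw string concatenation, so Windows backslashes are normalized before the value reaches the configuration. This is a simplified sketch under those assumptions, not the committed test code; the helper class name is made up.
{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch only: derive the provider URI from a Path so that a Windows
// directory such as C:\build\test never leaks backslashes into the URI.
public class CredentialProviderUriSketch {
  static Configuration configure(File tmpDir) {
    Configuration conf = new Configuration();
    Path jksPath = new Path(tmpDir.toString(), "test.jks"); // explicit parent + child
    String providerUri = "jceks://file" + jksPath.toUri();  // Path normalizes '\' to '/'
    conf.set("hadoop.security.credential.provider.path", providerUri);
    return conf;
  }
}
{code}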
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124869#comment-14124869 ] Hudson commented on HDFS-6940: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #673 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/673/]) HDFS-6940. Refactoring to allow ConsensusNode implementation. (shv: rev 88209ce181b5ecc55c0ae2bceff4893ab4817e88) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124868#comment-14124868 ] Hudson commented on HDFS-6898: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #673 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/673/]) HDFS-6898. DN must reserve space for a full block when an RBW block is created. (Contributed by Arpit Agarwal) (arp: rev d1fa58292e87bc29b4ef1278368c2be938a0afc4) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestRbwSpaceReservation.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java DN must reserve space for a full block when an RBW block is created --- Key: HDFS-6898 URL: https://issues.apache.org/jira/browse/HDFS-6898 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Gopal V Assignee: Arpit Agarwal Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch, HDFS-6898.07.patch DN will successfully create two RBW blocks on the same volume even if the free space is sufficient for just one full block. One or both block writers may subsequently get a DiskOutOfSpace exception. This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
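To make the up-front reservation idea in HDFS-6898 concrete, here is a small, self-contained sketch of reserving a full block's worth of space when an RBW replica is created and releasing the unused remainder when it is finalized. The class and member names are illustrative assumptions and do not correspond to the actual FsVolumeImpl API.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of up-front block space reservation on a volume;
// names are hypothetical, not the actual FsVolumeImpl members.
class VolumeSketch {
  private final long capacity;
  private final AtomicLong used = new AtomicLong(0);
  private final AtomicLong reservedForRbw = new AtomicLong(0);

  VolumeSketch(long capacity) { this.capacity = capacity; }

  /** Reserve space for a full block before accepting an RBW replica. */
  boolean tryReserve(long blockSize) {
    while (true) {
      long reserved = reservedForRbw.get();
      if (used.get() + reserved + blockSize > capacity) {
        return false; // not enough room for a full block; reject the writer now
      }
      if (reservedForRbw.compareAndSet(reserved, reserved + blockSize)) {
        return true;
      }
    }
  }

  /** Release whatever part of the reservation the finalized replica did not use. */
  void release(long unusedBytes) {
    reservedForRbw.addAndGet(-unusedBytes);
  }
}
{code}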
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124901#comment-14124901 ] Hudson commented on HDFS-6940: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1864 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1864/]) HDFS-6940. Refactoring to allow ConsensusNode implementation. (shv: rev 88209ce181b5ecc55c0ae2bceff4893ab4817e88) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124900#comment-14124900 ] Hudson commented on HDFS-6898: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1864 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1864/]) HDFS-6898. DN must reserve space for a full block when an RBW block is created. (Contributed by Arpit Agarwal) (arp: rev d1fa58292e87bc29b4ef1278368c2be938a0afc4) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestRbwSpaceReservation.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java DN must reserve space for a full block when an RBW block is created --- Key: HDFS-6898 URL: https://issues.apache.org/jira/browse/HDFS-6898 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Gopal V Assignee: Arpit Agarwal Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch, HDFS-6898.07.patch DN will successfully create two RBW blocks on the same volume even if the free space is sufficient for just one full block. One or both block writers may subsequently get a DiskOutOfSpace exception. This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124903#comment-14124903 ] Rafal Wojdyla commented on HDFS-6621: - Hi [~yzhangal], Thanks for the comments, sorry for the delay. First of all - I agree that the first problem is more important, and we should just merge it in. About the solution to the second problem: do we agree that the problem exists? Especially with a big number of threads, such wake-ups for some threads may be lethal even with the fix for the first problem. Is that correct? It's been a while since I made this change, but as far as I recall I tested both problems/solutions and they were separate problems; both of them cause premature exits. The first problem was more lethal, though. About your comment on the waiting - you are completely right! I missed this in the patch. Now I see even more the value of pushing patches/creating tickets right away ... not waiting till you have a bunch of changes. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit out of its balancing iterations. It would move ~10 blocks or 100 MB and then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no-pending-block iterations - however, this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no-pending-block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0
      && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        setScheduledSize(0);
      }
    }
    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
    /* Now we can not schedule any block to move and there are
     * no new blocks added to the source block list, so we wait.
     */
    try {
      synchronized (Balancer.this) {
        Balancer.this.wait(1000); // wait for targets/sources to be idle
      }
    } catch (InterruptedException ignored) {
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-6994: --- Description: Hi All, I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol. libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, and supports Namenode HA and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal. I'd like to integrate libhdfs3 into the HDFS source code to benefit others. You can find the libhdfs3 code on GitHub: https://github.com/PivotalRD/libhdfs3 http://pivotalrd.github.io/libhdfs3/ was: Hi All, I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol. libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, and supports Namenode HA and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal. I'd like to integrate libhdfs3 into the HDFS source code to benefit others. You can find the libhdfs3 code on GitHub: https://github.com/PivotalRD/libhdfs3 libhdfs3 - A native C/C++ HDFS client - Key: HDFS-6994 URL: https://issues.apache.org/jira/browse/HDFS-6994 Project: Hadoop HDFS Issue Type: Task Components: hdfs-client Reporter: Zhanwei Wang Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch Hi All, I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol. libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, and supports Namenode HA and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal. I'd like to integrate libhdfs3 into the HDFS source code to benefit others. You can find the libhdfs3 code on GitHub: https://github.com/PivotalRD/libhdfs3 http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124908#comment-14124908 ] Hudson commented on HDFS-6898: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1889/]) HDFS-6898. DN must reserve space for a full block when an RBW block is created. (Contributed by Arpit Agarwal) (arp: rev d1fa58292e87bc29b4ef1278368c2be938a0afc4) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestRbwSpaceReservation.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java DN must reserve space for a full block when an RBW block is created --- Key: HDFS-6898 URL: https://issues.apache.org/jira/browse/HDFS-6898 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Gopal V Assignee: Arpit Agarwal Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch, HDFS-6898.07.patch DN will successfully create two RBW blocks on the same volume even if the free space is sufficient for just one full block. One or both block writers may subsequently get a DiskOutOfSpace exception. This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124909#comment-14124909 ] Hudson commented on HDFS-6940: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1889/]) HDFS-6940. Refactoring to allow ConsensusNode implementation. (shv: rev 88209ce181b5ecc55c0ae2bceff4893ab4817e88) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated HDFS-6621: Attachment: HDFS-6621.patch_3 Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit out of its balancing iterations. It would move ~10 blocks or 100 MB and then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no-pending-block iterations - however, this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no-pending-block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0
      && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        setScheduledSize(0);
      }
    }
    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
    /* Now we can not schedule any block to move and there are
     * no new blocks added to the source block list, so we wait.
     */
    try {
      synchronized (Balancer.this) {
        Balancer.this.wait(1000); // wait for targets/sources to be idle
      }
    } catch (InterruptedException ignored) {
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: h6997_20140907.patch h6997_20140907.patch: synced with new commits. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124916#comment-14124916 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667081/h6997_20140907.patch against trunk revision d1fa582. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7935//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124919#comment-14124919 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667081/h6997_20140907.patch against trunk revision d1fa582. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7936//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124921#comment-14124921 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667081/h6997_20140907.patch against trunk revision d1fa582. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7938//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124920#comment-14124920 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667081/h6997_20140907.patch against trunk revision d1fa582. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7937//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: (was: h6997_20140907.patch) Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: h6584_20140907.patch Oops, uploaded a wrong file. The file should be h6584_20140907.patch. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6584_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7025) HDFS Credential Provider related Unit Test Failure
[ https://issues.apache.org/jira/browse/HDFS-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7025: Hadoop Flags: Reviewed +1 for the patch. I'll commit this. HDFS Credential Provider related Unit Test Failure --- Key: HDFS-7025 URL: https://issues.apache.org/jira/browse/HDFS-7025 Project: Hadoop HDFS Issue Type: Test Components: encryption Affects Versions: 2.4.1 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7025.0.patch, HDFS-7025.1.patch Reported by: Xiaomara and investigated by [~cnauroth]. The credential provider related unit tests failed on Windows. The tests try to set up a URI by taking the build test directory and concatenating it with other strings containing the rest of the URI format, i.e.:
{code}
public void testFactory() throws Exception {
  Configuration conf = new Configuration();
  conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
      UserProvider.SCHEME_NAME + ":///," +
      JavaKeyStoreProvider.SCHEME_NAME + "://file" + tmpDir + "/test.jks");
{code}
This logic is incorrect on Windows, because the file path separator will be '\', which violates URI syntax; only the forward slash is permitted. The proper fix is to always do path/URI construction through the org.apache.hadoop.fs.Path class, specifically using the constructors that take explicit parent and child arguments. The affected unit tests are:
{code}
* TestCryptoAdminCLI
* TestDFSUtil
* TestEncryptionZones
* TestReservedRawPaths
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7025) HDFS Credential Provider related Unit Test Failure
[ https://issues.apache.org/jira/browse/HDFS-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7025: Resolution: Fixed Fix Version/s: 2.6.0 Status: Resolved (was: Patch Available) I committed this to trunk and branch-2. Xiaoyu, thank you for contributing this fix. HDFS Credential Provider related Unit Test Failure --- Key: HDFS-7025 URL: https://issues.apache.org/jira/browse/HDFS-7025 Project: Hadoop HDFS Issue Type: Test Components: encryption Affects Versions: 2.4.1 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Fix For: 2.6.0 Attachments: HDFS-7025.0.patch, HDFS-7025.1.patch Reported by: Xiaomara and investigated by [~cnauroth]. The credential provider related unit tests failed on Windows. The tests try to set up a URI by taking the build test directory and concatenating it with other strings containing the rest of the URI format, i.e.:
{code}
public void testFactory() throws Exception {
  Configuration conf = new Configuration();
  conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
      UserProvider.SCHEME_NAME + ":///," +
      JavaKeyStoreProvider.SCHEME_NAME + "://file" + tmpDir + "/test.jks");
{code}
This logic is incorrect on Windows, because the file path separator will be '\', which violates URI syntax; only the forward slash is permitted. The proper fix is to always do path/URI construction through the org.apache.hadoop.fs.Path class, specifically using the constructors that take explicit parent and child arguments. The affected unit tests are:
{code}
* TestCryptoAdminCLI
* TestDFSUtil
* TestEncryptionZones
* TestReservedRawPaths
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v3.patch Rebase patch to latest trunk. Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch, HDFS-6506.v3.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently: https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ From the error log, the reason seems to be that newly moved block replicas have been invalidated and deleted, so some of the balancer's work is reversed.
{noformat}
2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr
2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set
2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set
2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set
2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set
2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set
2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set
2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set
2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021]
{noformat}
Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there must be a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work via webhdfs
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124935#comment-14124935 ] Yongjun Zhang commented on HDFS-6776: - [~wheat9], Hope the example I gave is convincing that webhdfs is the right place to fix. I think we wouldn't want to tell user that sorry, webhdfs contract doesn't allow accessing insecure cluster from secure cluster, if you need to, please hack your application like how distcp does. Would you please comment at your earliest convenience? Thanks. distcp from insecure cluster (source) to secure cluster (destination) doesn't work via webhdfs -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch, HDFS-6776.006.NullToken.patch, HDFS-6776.007.patch, HDFS-6776.008.patch, HDFS-6776.009.patch, HDFS-6776.010.patch, HDFS-6776.011.patch, dummy-token-proxy.js Issuing distcp command at the secure cluster side, trying to copy stuff from insecure cluster to secure cluster, and see the following problem: {code} hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp hdfs://sure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-clister:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584) at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640) at
[jira] [Created] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
Yongjun Zhang created HDFS-7026: --- Summary: Introduce a string constant for Failed to obtain user group info... Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Priority: Trivial There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
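As a minimal sketch of the kind of change HDFS-7026 proposes, one possibility is shown below; the constant name and holding class are hypothetical, not the actual patch.
{code}
// Hypothetical illustration only; the real constant name and location are decided in the patch.
public final class SecurityMessages {
  public static final String FAILED_TO_OBTAIN_USER_GROUP_INFO =
      "Failed to obtain user group information: ";

  private SecurityMessages() {}
}

// Callers would then reference the constant instead of repeating the literal:
//   throw new SecurityException(SecurityMessages.FAILED_TO_OBTAIN_USER_GROUP_INFO + cause);
{code}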
[jira] [Assigned] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reassigned HDFS-7026: --- Assignee: Yongjun Zhang Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7026: Issue Type: Improvement (was: Bug) Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7026: Attachment: HDFS-7206.001.patch Uploaded patch rev 001. Thanks for the review. Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial Attachments: HDFS-7206.001.patch There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7007) Interfaces to plugin ConsensusNode.
[ https://issues.apache.org/jira/browse/HDFS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124953#comment-14124953 ] Steve Loughran commented on HDFS-7007: -- The current NN code isn't suitable for subclassing, and the fact that BackupNode does exactly that is a bit dangerous. Specifically, the NN ctor calls {{initialize()}}, which appears designed to be overridden ... but subclasses won't yet be fully constructed when this happens. The DNs are worse - they start threads in their ctors, which is one of the big forbidden actions of Java. I'd propose making the NN and DN YARN services first, so we have a nice consistent override model. As with the RM, we can make them subclasses of CompositeService, making it easy to add children. This does not have to be done in the consensus node branch ... it can be done in trunk. Interfaces to plugin ConsensusNode. --- Key: HDFS-7007 URL: https://issues.apache.org/jira/browse/HDFS-7007 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko This is to introduce interfaces in NameNode and namesystem, which are needed to plugin ConsensusNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
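For readers unfamiliar with the hazard Steve describes, here is a minimal, self-contained Java illustration (not HDFS code) of why calling an overridable {{initialize()}} from a constructor is dangerous: the override runs before the subclass's own state exists.
{code}
// Self-contained demo (not HDFS code) of calling an overridable method
// from a constructor: the subclass override runs before the subclass's
// field initializers, so it observes a half-constructed object.
public class CtorHazardDemo {
  static class BaseNode {
    BaseNode() {
      initialize(); // invokes the subclass override too early
    }
    protected void initialize() {}
  }

  static class SubNode extends BaseNode {
    private final String name = "backup";

    @Override
    protected void initialize() {
      // The SubNode constructor has not run yet, so 'name' is still null.
      System.out.println("initialize() sees name = " + name);
    }
  }

  public static void main(String[] args) {
    new SubNode(); // prints: initialize() sees name = null
  }
}
{code}
Starting threads in a constructor has the same shape of problem, with the added risk that the thread publishes the half-constructed object to other threads.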
[jira] [Updated] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7026: Affects Version/s: 2.6.0 Status: Patch Available (was: Open) Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial Attachments: HDFS-7206.001.patch There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124958#comment-14124958 ] James Thomas commented on HDFS-6981: +1, looks good to me. Getting a 404 when I try to look at the Findbugs warnings -- any idea what's causing those? DN upgrade with layout version change should not use trash -- Key: HDFS-6981 URL: https://issues.apache.org/jira/browse/HDFS-6981 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: James Thomas Assignee: Arpit Agarwal Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch, HDFS-6981.07.patch Post HDFS-6800, we can encounter the following scenario: # We start with DN software version -55 and initiate a rolling upgrade to version -56 # We delete some blocks, and they are moved to trash # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash # We append to some of the blocks that were deleted in step 2 # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data So I think we need to avoid writing anything to the trash directory if we have a previous directory. Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
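A sketch of the rule being argued for, with hypothetical method and directory names rather than the real DataNode storage code: once a {{previous}} directory exists for a layout-version upgrade, deleted blocks should be removed outright instead of being copied to trash, since rollback is served from {{previous}} and a restored trash copy could overwrite appended data.
{code}
// Hypothetical helper, not the real DataNode storage code: choose between
// trash and outright deletion based on whether a 'previous' directory exists.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TrashPolicySketch {
  static void removeDeletedBlock(File blockFile, File previousDir, File trashDir)
      throws IOException {
    if (previousDir.exists()) {
      // Layout-version upgrade in progress: rollback restores 'previous',
      // so the deleted block must not linger in trash.
      Files.delete(blockFile.toPath());
    } else {
      // Rolling upgrade without a layout change: keep the block in trash
      // so it can be restored if the upgrade is rolled back.
      Files.move(blockFile.toPath(),
          new File(trashDir, blockFile.getName()).toPath());
    }
  }
}
{code}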
[jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included
[ https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6777: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed as part of HDFS-6634. Supporting consistent edit log reads when in-progress edit log segments are included Key: HDFS-6777 URL: https://issues.apache.org/jira/browse/HDFS-6777 Project: Hadoop HDFS Issue Type: Sub-task Components: qjm Reporter: James Thomas Assignee: James Thomas Attachments: 6777-design.2.pdf, 6777-design.pdf, HDFS-6777.patch For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by 1) discarding in-progress segments if we have a finalized segment starting at the same transaction ID and 2) if there are no finalized segments at the same transaction ID, using only the in-progress segments with the largest seen lastWriterEpoch. See the design document for more background and details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included
[ https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6777: --- Fix Version/s: 2.6.0 Supporting consistent edit log reads when in-progress edit log segments are included Key: HDFS-6777 URL: https://issues.apache.org/jira/browse/HDFS-6777 Project: Hadoop HDFS Issue Type: Sub-task Components: qjm Reporter: James Thomas Assignee: James Thomas Fix For: 2.6.0 Attachments: 6777-design.2.pdf, 6777-design.pdf, HDFS-6777.patch For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by 1) discarding in-progress segments if we have a finalized segment starting at the same transaction ID and 2) if there are no finalized segments at the same transaction ID, using only the in-progress segments with the largest seen lastWriterEpoch. See the design document for more background and details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included
[ https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6777: --- Description: For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by discarding in-progress segments if we have a finalized segment starting at the same transaction ID. See the design document for more background and details. (was: For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by 1) discarding in-progress segments if we have a finalized segment starting at the same transaction ID and 2) if there are no finalized segments at the same transaction ID, using only the in-progress segments with the largest seen lastWriterEpoch. See the design document for more background and details.) Supporting consistent edit log reads when in-progress edit log segments are included Key: HDFS-6777 URL: https://issues.apache.org/jira/browse/HDFS-6777 Project: Hadoop HDFS Issue Type: Sub-task Components: qjm Reporter: James Thomas Assignee: James Thomas Fix For: 2.6.0 Attachments: 6777-design.2.pdf, 6777-design.pdf, HDFS-6777.patch For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by discarding in-progress segments if we have a finalized segment starting at the same transaction ID. See the design document for more background and details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
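The updated description boils down to a simple selection rule. Below is a minimal sketch of that rule using a simplified segment record, not the real EditLogInputStream/JournalManager types:
{code}
// Simplified segment records, not the real edit-log classes.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SegmentSelectionSketch {
  static final class Segment {
    final long firstTxId;
    final boolean inProgress;
    Segment(long firstTxId, boolean inProgress) {
      this.firstTxId = firstTxId;
      this.inProgress = inProgress;
    }
  }

  /** Drop any in-progress segment whose firstTxId also has a finalized segment. */
  static List<Segment> selectReadable(List<Segment> segments) {
    Set<Long> finalizedStarts = new HashSet<>();
    for (Segment s : segments) {
      if (!s.inProgress) {
        finalizedStarts.add(s.firstTxId);
      }
    }
    List<Segment> readable = new ArrayList<>();
    for (Segment s : segments) {
      if (!(s.inProgress && finalizedStarts.contains(s.firstTxId))) {
        readable.add(s);
      }
    }
    return readable;
  }
}
{code}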
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124963#comment-14124963 ] Hadoop QA commented on HDFS-6621: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667080/HDFS-6621.patch_3 against trunk revision d1fa582. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7934//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7934//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7934//console This message is automatically generated. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. 
{code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp && getScheduledSize()>0 && (!srcBlockList.isEmpty() || blocksToReceive>0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) {
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124980#comment-14124980 ] Yongjun Zhang commented on HDFS-6621: - Hi [~ravwojdyla], Thanks a lot for the info and new rev. I meant to change both occurrences of {{Dispather.this}} in the quoted code but seems you only changed one. The unchanged one is actually the key, because it's where the block is synchronized upon. Would you please make that change? Since you mentioned that both problems are real, I think it's worth pursuing both. It would be great if you could still reproduce and test the fix in real clusters. I hope this is still feasible, would you please comment? Thanks. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. 
if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime > MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. */ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
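For readers following the wait/notify discussion above, here is a small self-contained sketch of the pairing being proposed (a hypothetical class, not the Balancer/Dispatcher code): the scheduling thread must block on the same monitor that the transfer threads notify when a slot frees up, otherwise the wakeup is lost.
{code}
// Hypothetical illustration, not the Balancer/Dispatcher classes: the
// scheduler waits on the same object that a finishing transfer notifies.
public class SlotWaitSketch {
  private static final int MAX_SLOTS = 5;
  private int busySlots = 0;

  synchronized boolean tryAcquireSlot() {
    if (busySlots < MAX_SLOTS) {
      busySlots++;
      return true;
    }
    return false;
  }

  /** Called by a transfer thread when it finishes. */
  synchronized void releaseSlot() {
    busySlots--;
    notifyAll(); // wakes the scheduler blocked in awaitSlot()
  }

  /** Called by the scheduling thread when all slots are busy. */
  synchronized void awaitSlot() throws InterruptedException {
    while (busySlots >= MAX_SLOTS) {
      wait(1000); // same monitor as releaseSlot()'s notifyAll()
    }
  }
}
{code}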
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124981#comment-14124981 ] Yongjun Zhang commented on HDFS-6621: - BTW, thanks [~andrew.wang] for the review and comments, which helped me to look further. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. if (noPendingBlockIteration = MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. 
*/ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124982#comment-14124982 ] Rafal Wojdyla commented on HDFS-6621: - [~yzhangal] you're correct :D Sorry. Reproducing error on real cluster - that's still feasible, reproducing this in unit tests is kinda hard, I will try to come back with proof based on logs - is that fine? Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. if (noPendingBlockIteration = MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. 
*/ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated HDFS-6621: Attachment: HDFS-6621.patch_4 Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. if (noPendingBlockIteration = MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. */ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124995#comment-14124995 ] Hadoop QA commented on HDFS-6506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667086/HDFS-6506.v3.patch against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7940//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7940//console This message is automatically generated. Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch, HDFS-6506.v3.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. 
{noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates:
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124997#comment-14124997 ] Yongjun Zhang commented on HDFS-6621: - Hi Rafal, Thanks for the quick response, and that's great to hear! What about we do this: 1. Have a setup to see the problem with the fix at all, to demonstrate the symptom. 2. Try the fix of problem 1 only, to see if there is still problem, and we should try to demonstrate the remaining problem 3. Try the fix of both problems (rev 4), to see if all problems are gone Thanks again! Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. 
if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime > MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. */ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125007#comment-14125007 ] Yongjun Zhang commented on HDFS-6621: - Sorry, one typo in item 1 of last comment: with meant without. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. if (noPendingBlockIteration = MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. */ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7027) Archival Storage: Mover does not terminate when some storage type is out of space
Tsz Wo Nicholas Sze created HDFS-7027: - Summary: Archival Storage: Mover does not terminate when some storage type is out of space Key: HDFS-7027 URL: https://issues.apache.org/jira/browse/HDFS-7027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Suppose DISK has run out of space and there are some block replicas that need to be moved to DISK. In this case, it is impossible to move any replica to DISK, yet Mover may not terminate, since it keeps trying to schedule those moves in each iteration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6893) crypto subcommand is not sorted properly in hdfs's hadoop_usage
[ https://issues.apache.org/jira/browse/HDFS-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Luo updated HDFS-6893: Attachment: HDFS-6893.patch HDFS-6893.patch Moving crypto command to after classpath crypto subcommand is not sorted properly in hdfs's hadoop_usage --- Key: HDFS-6893 URL: https://issues.apache.org/jira/browse/HDFS-6893 Project: Hadoop HDFS Issue Type: Bug Components: scripts Reporter: Allen Wittenauer Priority: Trivial Labels: newbie Attachments: HDFS-6893.patch crypto subcommand should be after classpath/before datanode, not after zkfc, in the hdfs usage output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6893) crypto subcommand is not sorted properly in hdfs's hadoop_usage
[ https://issues.apache.org/jira/browse/HDFS-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Luo updated HDFS-6893: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) crypto subcommand is not sorted properly in hdfs's hadoop_usage --- Key: HDFS-6893 URL: https://issues.apache.org/jira/browse/HDFS-6893 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie Attachments: HDFS-6893.patch crypto subcommand should be after classpath/before datanode, not after zkfc, in the hdfs usage output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125021#comment-14125021 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667084/h6584_20140907.patch against trunk revision d1fa582. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 23 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1264 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes org.apache.hadoop.hdfs.server.namenode.TestINodeFile org.apache.hadoop.hdfs.server.namenode.TestNameNodeXAttr org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.TestEncryptionZones org.apache.hadoop.hdfs.server.namenode.TestFileContextAcl org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.hdfs.server.mover.TestStorageMover org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.fs.TestSymlinkHdfsFileContext org.apache.hadoop.hdfs.TestDistributedFileSystem org.apache.hadoop.hdfs.server.balancer.TestBalancer org.apache.hadoop.fs.TestSymlinkHdfsFileSystem org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer org.apache.hadoop.hdfs.TestListFilesInFileContext org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream org.apache.hadoop.hdfs.server.namenode.TestFSImage org.apache.hadoop.hdfs.server.namenode.TestNameNodeAcl org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7939//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7939//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7939//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6584_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. 
Hadoop needs a cost-effective and easy-to-manage solution to meet this demand for storage. The current solutions are: - Delete old, unused data. This comes at the operational cost of identifying unnecessary data and deleting it manually. - Add more nodes to the cluster. This adds unnecessary compute capacity along with the storage capacity. Hadoop needs a solution that decouples growing storage capacity from compute capacity. Nodes with higher-density, less expensive storage and low compute power are becoming available and can be used as cold storage in clusters. Based on policy, data can be moved from hot storage to cold storage. Adding more nodes to the cold storage can grow the storage independent of the
[jira] [Commented] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125029#comment-14125029 ] Hadoop QA commented on HDFS-7026: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667090/HDFS-7206.001.patch against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7941//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7941//console This message is automatically generated. Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial Attachments: HDFS-7206.001.patch There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6538) Element comment format error in org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry
[ https://issues.apache.org/jira/browse/HDFS-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Luo updated HDFS-6538: Attachment: HDFS-6538.patch HDFS-6538 Changed the comment to javadoc. Element comment format error in org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry --- Key: HDFS-6538 URL: https://issues.apache.org/jira/browse/HDFS-6538 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: debugging Priority: Trivial Labels: documentation Attachments: HDFS-6538.patch Original Estimate: 1h Remaining Estimate: 1h A javadoc element comment should start with {noformat}/**{noformat}, but the comment for class ShortCircuitRegistry starts with only {noformat}/*{noformat}. So I think a {noformat}*{noformat} is omitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
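For reference, the attached change is presumably of this shape (the comment text below is illustrative, not the actual class comment):
{code}
// Before: an ordinary block comment, which the javadoc tool ignores.
/* Manages short-circuit shared-memory segments on the DataNode. */
class BeforeFix {}

// After: a javadoc comment, picked up for the generated documentation.
/** Manages short-circuit shared-memory segments on the DataNode. */
class AfterFix {}
{code}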
[jira] [Updated] (HDFS-7027) Archival Storage: Mover does not terminate when some storage type is out of space
[ https://issues.apache.org/jira/browse/HDFS-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7027: -- Attachment: h7027_20140908.patch h7027_20140908.patch: when no move is scheduled, processFile(..) now returns false. Archival Storage: Mover does not terminate when some storage type is out of space - Key: HDFS-7027 URL: https://issues.apache.org/jira/browse/HDFS-7027 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7027_20140908.patch Suppose DISK has run out of space and there are some block replicas that need to be moved to DISK. In this case, it is impossible to move any replica to DISK, yet Mover may not terminate, since it keeps trying to schedule those moves in each iteration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
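A simplified sketch of the termination logic implied by the patch summary, with illustrative names rather than the real Mover API: each pass reports whether it scheduled any move, and the outer loop exits once a pass schedules nothing.
{code}
// Illustrative names only, not the real Mover API.
import java.util.List;

public class MoverLoopSketch {
  interface FileScan {
    /** Returns true only if at least one replica move was scheduled. */
    boolean processFile(String path);
  }

  static void run(FileScan scan, List<String> paths) {
    boolean scheduledSomething;
    do {
      scheduledSomething = false;
      for (String p : paths) {
        // If every required target (e.g. DISK) is out of space, processFile
        // schedules nothing and returns false for every path.
        scheduledSomething |= scan.processFile(p);
      }
    } while (scheduledSomething); // terminate once a pass schedules no move
  }
}
{code}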
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125054#comment-14125054 ] Hadoop QA commented on HDFS-6621: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667093/HDFS-6621.patch_4 against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7942//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7942//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7942//console This message is automatically generated. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. 
{code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e);
[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125090#comment-14125090 ] Megasthenis Asteris commented on HDFS-6799: --- _unfinalizeBlock_ indeed needs to be fixed, but I am not sure what the expected behavior should be. The way it is structured now, it seems that it would suffice to delete the block from the SimulatedFSDataset's map of blocks. However, note that this is not exactly the opposite of _finalizeBlock_, as one might expect. I also realized that _TestSimulatedFSDataset_ has bugs: _checkInvalidBlock(ExtendedBlock b)_ creates a new simulated dataset every time it is called. Clearly, b will never be in the new dataset, so _checkInvalidBlock_ effectively always concludes that b is an invalid block. Should I submit this as a separate bug, or fix it here? The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system. --- Key: HDFS-6799 URL: https://issues.apache.org/jira/browse/HDFS-6799 Project: Hadoop HDFS Issue Type: Bug Components: datanode, test Affects Versions: 2.4.1 Reporter: Megasthenis Asteris Assignee: Megasthenis Asteris Priority: Minor Attachments: HDFS-6799.patch The invalidate(String bpid, Block[] invalidBlks) method in SimulatedFSDataset.java should remove all invalidBlks from the simulated file system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
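A schematic of the test flaw described above, using simplified types rather than the real SimulatedFSDataset API: because the check builds a fresh dataset, the block is trivially absent and the assertion can never fail.
{code}
// Simplified types, not the real SimulatedFSDataset/TestSimulatedFSDataset API.
import java.util.HashSet;
import java.util.Set;

public class InvalidBlockCheckSketch {
  static final class FakeDataset {
    private final Set<Long> blockIds = new HashSet<>();
    void addBlock(long id) { blockIds.add(id); }
    boolean contains(long id) { return blockIds.contains(id); }
  }

  // Buggy shape: always "passes" because the freshly created dataset is empty.
  static boolean checkInvalidBlockBuggy(long blockId) {
    FakeDataset fresh = new FakeDataset();
    return !fresh.contains(blockId);
  }

  // Fixed shape: the check must run against the dataset under test.
  static boolean checkInvalidBlockFixed(FakeDataset underTest, long blockId) {
    return !underTest.contains(blockId);
  }
}
{code}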
[jira] [Resolved] (HDFS-6988) Make RAM disk eviction thresholds configurable
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-6988. - Resolution: Duplicate Fix Version/s: HDFS-6581 Assignee: Arpit Agarwal The fix for HDFS-6991 adds config keys for eviction parameters. Make RAM disk eviction thresholds configurable -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: HDFS-6581 Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-6991) Notify NN of evicted block before deleting it from RAM disk
[ https://issues.apache.org/jira/browse/HDFS-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6991 started by Arpit Agarwal. --- Notify NN of evicted block before deleting it from RAM disk --- Key: HDFS-6991 URL: https://issues.apache.org/jira/browse/HDFS-6991 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6991.01.patch, HDFS-6991.02.patch, HDFS-6991.03.patch When evicting a block from RAM disk to persistent storage, the DN should notify the NN of the persistent replica before deleting the replica from RAM disk. Else there can be a window of time during which the block is considered 'missing' by the NN. Found by [~xyao] via HDFS-6950. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
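The ordering requirement described in HDFS-6991 can be summarized in a short sketch; the helper names below are assumptions for illustration only and do not correspond to the attached patches.
{code}
// Hedged sketch of the required ordering: the NameNode must learn about the
// persistent replica before the RAM disk copy disappears, so the block never
// looks "missing" to the NN. Both helper names are hypothetical.
void evictBlockFromRamDisk(ExtendedBlock block) {
  // The replica has already been lazily persisted to disk at this point.
  notifyNamenodeOfPersistedReplica(block);  // hypothetical: report on-disk replica first
  deleteRamDiskReplica(block);              // hypothetical: only then drop the RAM copy
}
{code}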
[jira] [Commented] (HDFS-6893) crypto subcommand is not sorted properly in hdfs's hadoop_usage
[ https://issues.apache.org/jira/browse/HDFS-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125105#comment-14125105 ] Hadoop QA commented on HDFS-6893: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667098/HDFS-6893.patch against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestEncryptionZones org.apache.hadoop.hdfs.server.datanode.TestBPOfferService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7943//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7943//console This message is automatically generated. crypto subcommand is not sorted properly in hdfs's hadoop_usage --- Key: HDFS-6893 URL: https://issues.apache.org/jira/browse/HDFS-6893 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie Attachments: HDFS-6893.patch crypto subcommand should be after classpath/before datanode, not after zkfc, in the hdfs usage output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125108#comment-14125108 ] Hadoop QA commented on HDFS-6981: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667061/HDFS-6981.07.patch against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7944//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7944//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7944//console This message is automatically generated. DN upgrade with layout version change should not use trash -- Key: HDFS-6981 URL: https://issues.apache.org/jira/browse/HDFS-6981 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: James Thomas Assignee: Arpit Agarwal Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch, HDFS-6981.07.patch Post HDFS-6800, we can encounter the following scenario: # We start with DN software version -55 and initiate a rolling upgrade to version -56 # We delete some blocks, and they are moved to trash # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash # We append to some of the blocks that were deleted in step 2 # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data So I think we need to avoid writing anything to the trash directory if we have a previous directory. Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
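A minimal sketch of the guard implied by the HDFS-6981 description follows, assuming hypothetical helper names on BlockPoolSliceStorage; the attached patches may implement this differently.
{code}
// Hedged sketch: trash and the upgrade "previous" directory are alternative restore
// mechanisms, so deleted blocks must not also go to trash once previous/ exists,
// otherwise restoring trash later can clobber blocks appended to after rollback.
// Both helper methods are assumptions for illustration.
private boolean shouldMoveDeletedBlocksToTrash(BlockPoolSliceStorage bpStorage) {
  if (bpStorage.hasPreviousDirectory()) {  // layout-version upgrade in progress
    return false;                          // rollback restores previous/, not trash
  }
  return bpStorage.isTrashAllowed();       // normal rolling-upgrade case
}
{code}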
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125126#comment-14125126 ] Suresh Srinivas commented on HDFS-6940: --- [~atm] had specifically asked not to commit this to trunk. Why was this committed to trunk and branch-2 without any discussion? I agree with him that we should not be making methods public or protected, due to the burden of maintaining that contract. You mentioned two backward incompatible changes (I would like to know what they are). But there are numerous others that are never detected because the committers in the project take care of them. Let's not lose sight of that. We have also had difficulty removing other dead code due to vetoes, such as BackupNode. So I want to be careful about committing code before there is a decision on whether the feature this refactoring is being done for should even be in HDFS. Without addressing the comments from [~atm], who has been participating in this discussion, this patch should not have been committed to trunk. [~cos], please be respectful. One thing I had held off commenting on: your committership was based on your fault-injection related work in Hadoop. I believe you have not contributed enough to the other parts of the system that this patch is touching. One of the honor rules is that a committer refrains from voting +1 on patches in areas he has not contributed to. But I have seen that in many jiras, including this one, this is not followed by you. I am -1 on this patch going into trunk and branch-2. Let's do this in the feature branch. This is not a big enough refactor to make merges difficult. I think we should revert this change. I would also like to hear from other committers on this issue and get their thoughts. Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7027) Archival Storage: Mover does not terminate when some storage type is out of space
[ https://issues.apache.org/jira/browse/HDFS-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7027: -- Attachment: h7027_20140908b.patch h7027_20140908b.patch: increase the capacities set in the tests to account for the changes from HDFS-6898. Archival Storage: Mover does not terminate when some storage type is out of space - Key: HDFS-7027 URL: https://issues.apache.org/jira/browse/HDFS-7027 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7027_20140908.patch, h7027_20140908b.patch Suppose DISK has run out of space and some block replicas need to be moved to DISK. In this case, it is impossible to move any replica to DISK. Then the Mover may never terminate, since it keeps trying to schedule moves of those replicas to DISK in each iteration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
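One way to read the non-termination problem is as a missing per-iteration progress check. The sketch below only illustrates that idea; the type and method names are hypothetical and this is not the attached patch.
{code}
// Illustrative sketch (hypothetical names): stop the Mover when an iteration still has
// replicas to migrate but cannot schedule a single move, e.g. because the target
// storage type is out of space on every candidate node.
boolean runOneIteration(List<PendingMove> pendingMoves) {
  boolean scheduledAnything = false;
  for (PendingMove m : pendingMoves) {
    if (trySchedule(m)) {            // hypothetical: false when no target has room
      scheduledAnything = true;
    }
  }
  // No progress while work remains: further iterations cannot succeed either.
  return scheduledAnything || pendingMoves.isEmpty();
}
{code}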
[jira] [Updated] (HDFS-6875) Archival Storage: support migration for a list of specified paths
[ https://issues.apache.org/jira/browse/HDFS-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6875: -- Component/s: balancer Hadoop Flags: Reviewed The patch does not apply anymore. Need to fix the imports. +1 patch looks good other than that. Archival Storage: support migration for a list of specified paths - Key: HDFS-6875 URL: https://issues.apache.org/jira/browse/HDFS-6875 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6875.000.patch Currently the migration tool processes the whole namespace. It will be helpful if we can allow users to migrate data only for a list of specified paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7028) Archival Storage: FSDirectory should not get storage policy id from symlinks
Tsz Wo Nicholas Sze created HDFS-7028: - Summary: Archival Storage: FSDirectory should not get storage policy id from symlinks Key: HDFS-7028 URL: https://issues.apache.org/jira/browse/HDFS-7028 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor
{noformat}
java.lang.UnsupportedOperationException: Storage policy are not supported on symlinks
    at org.apache.hadoop.hdfs.server.namenode.INodeSymlink.getStoragePolicyID(INodeSymlink.java:151)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileInfo(FSDirectory.java:1506)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3992)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getLinkTarget(NameNodeRpcServer.java:1028)
    at org.apache.hadoop.hdfs.server.namenode.TestINodeFile.testValidSymlinkTarget(TestINodeFile.java:683)
    at org.apache.hadoop.hdfs.server.namenode.TestINodeFile.testInodeIdBasedPaths(TestINodeFile.java:622)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
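The stack trace suggests guarding the policy lookup so symlinks are never asked for a storage policy. A hedged sketch follows; the fallback constant name is an assumption, not necessarily what the eventual patch uses.
{code}
// Hedged sketch of the guard the stack trace calls for: skip the lookup for symlinks
// instead of letting INodeSymlink#getStoragePolicyID throw UnsupportedOperationException.
private static byte getStoragePolicyIdSafely(INode inode) {
  if (inode.isSymlink()) {
    return BlockStoragePolicySuite.ID_UNSPECIFIED;  // assumed "no policy" id
  }
  return inode.getStoragePolicyID();
}
{code}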
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: h6584_20140908.patch h6584_20140908.patch: with HDFS-7028. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6584_20140907.patch, h6584_20140908.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125197#comment-14125197 ] Colin Patrick McCabe commented on HDFS-6994: bq. Dynamically loading libjvm is a good idea, but it seems not solve all the problems you mentioned in HADOOP-10388. To make fall back feature work, users have to deploy the HDFS jars on every machine. This adds operational complexity for non-Java clients that just want to integrate with HDFS. Otherwise, fall back feature will not work. Sorry if I wasn't clear earlier. I don't think users of libhdfs3 (or ndfs, etc) should be *required* to deploy the jar files. I just said that *optionally*, if the jar files are deployed, fallback should be possible. It's great that libhdfs3 can function without jar files, and we should preserve this capability! bq. And fall back feature will finally be removed when the native client implement the full HDFS client feature. In practice, I think fallback to JNI will continue to be useful for a long, long time. Think about clients that want to interface with s3, Ceph, Azure FileSystem, or even LocalFileSystem. Currently the native code doesn't support those, and it's unlikely to get that support in the near future. So it's useful to have a library that can speak both JNI and the native HDFS protocol, depending on which is available. Users just want one library that they can use that will just work for multiple different configurations. Anyway, we can discuss the fallback code later. It might be feasible to keep the fallback code entirely outside of libhdfs3 in some separate shim library. I think we can merge libhdfs3 first and figure that out later. bq. Would you please review the code and give some comments? Thanks in advance. Thanks Zhanwei, I will take a look as soon as I get in tomorrow. bq. Naming is hard _ I don't have a problem with naming it libhdfs3, but users might wonder what libhdfs2 was :) How about libndfs++ as a name? libhdfs3 - A native C/C++ HDFS client - Key: HDFS-6994 URL: https://issues.apache.org/jira/browse/HDFS-6994 Project: Hadoop HDFS Issue Type: Task Components: hdfs-client Reporter: Zhanwei Wang Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch Hi All I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. libhdfs3 provide the libhdfs style C interface and a C++ interface. Support both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal I'd like to integrate libhdfs3 into HDFS source code to benefit others. You can find libhdfs3 code from github https://github.com/PivotalRD/libhdfs3 http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125200#comment-14125200 ] Arpit Agarwal commented on HDFS-6981: - bq. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. I think Jenkins is hitting a bug. findbugs passed for me locally and the link to the warnings is broken. {color:green}+1 overall{color}. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version ) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. DN upgrade with layout version change should not use trash -- Key: HDFS-6981 URL: https://issues.apache.org/jira/browse/HDFS-6981 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: James Thomas Assignee: Arpit Agarwal Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch, HDFS-6981.07.patch Post HDFS-6800, we can encounter the following scenario: # We start with DN software version -55 and initiate a rolling upgrade to version -56 # We delete some blocks, and they are moved to trash # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash # We append to some of the blocks that were deleted in step 2 # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data So I think we need to avoid writing anything to the trash directory if we have a previous directory. Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6705) Create an XAttr that disallows the HDFS admin from accessing a file
[ https://issues.apache.org/jira/browse/HDFS-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125209#comment-14125209 ] Andrew Wang commented on HDFS-6705: --- Hi Charles, thanks for working on this, a few review comments: * Why did the exception messages change in TestDistributedFileSystem and SymlinkBaseTest? This is mildly incompatible, so I'd like to understand why it's necessary. * We're still doing another path resolution to do checkUnreadableBySuperuser. Can we try to reuse the inode from the IIP just below? This would also let us avoid throwing IOException in the check method. * Consider folding FSPermissionChecker#checkUnreadableBySuperuser into the FSN method, it's pretty simple. * FSN#checkXAttrChangeAccess has unrelated change? * Indentation of FSN#checkUnreadableBySuperuser is off Doc: * Extra whitespace change * Text is still kinda verbose and still mentions preventing read access to other xattrs and writing to the file. I'd prefer something like: {noformat} The security namespace is reserved for internal HDFS use. This namespace is generally not accessible through userspace methods. One particular use of security is the security.hdfs.unreadable.by.superuser extended attribute. This xattr can only be set on files, and it will prevent the superuser from reading the file's contents. The superuser can still read and modify file metadata, such as the owner, permissions, etc. This xattr can be set and accessed by any user, assuming normal filesystem permissions. This xattr is also write-once, and cannot be removed once set. This xattr does not allow a value to be set. {noformat} * Unrelated changes in TestXAttrCLI, TestSymlinkHdfsFileSystem FSXattrBaseTest * High-level comment, I'd like to pare down the new tests to focus on this new functionality * I still see references to MAX_XATTR_SIZE which should be unrelated here. It also involves an extra mini cluster stop and start. * I'd like to avoid doing extra minicluster start/stops to test persistence too. It'd be better to add some security xattrs to the existing restart tests instead. * The vanilla xattrs test, it doesn't have a matching call to {{fail}}. I don't think this needs to be tested anyway, since UBS doesn't affect xattr operations. * verifyFileAccess also still has testing for append and create, which isn't valid anymore. * I see a hardcoded security.hdfs.unreadable.by.superuser still, sub in the string constant instead? * Is RemoteException is being thrown by DistributedFileSystem for the new AccessControlException? I see it being unwrapped in DFSClient, so I would expect to see an ACE here. Tests: * Mention of special xattr is non-specific, could we say unreadable by superuser or UBS or something instead? Create an XAttr that disallows the HDFS admin from accessing a file --- Key: HDFS-6705 URL: https://issues.apache.org/jira/browse/HDFS-6705 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6705.001.patch, HDFS-6705.002.patch, HDFS-6705.003.patch There needs to be an xattr that specifies that the HDFS admin can not access a file. This is needed for m/r delegation tokens and data at rest encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
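For reference, a hedged usage sketch of the xattr described in the proposed doc text above. The class name and path are examples only, and it assumes a null value is accepted since the xattr does not allow a value to be set.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hedged usage sketch (not from the patch): any user with normal filesystem access can
// set the write-once xattr; afterwards the superuser can no longer read the file's data.
public class UnreadableBySuperuserExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path sensitive = new Path("/user/alice/secret.dat");  // example path
    fs.setXAttr(sensitive, "security.hdfs.unreadable.by.superuser", null);
  }
}
{code}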
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125224#comment-14125224 ] Todd Lipcon commented on HDFS-6940: --- I'll only comment on the technical issue at hand here: I strongly agree that implementation inheritance/subclassing is not a maintainable extension mechanism for the NameNode. The issue is that, while composition through interfaces and peer class relationships can be well defined and documented and typically does not expose implementation details, making previously private methods public is doing exactly that. When we later want to reorganize the (implementation-specific) code of the NameNode, the existence of subclasses makes this very difficult. This is not an abstract argument. I experienced this pain first hand several years ago when working on HDFS-1073, and then again working on HDFS-1623. When methods and members are protected, then doing these kind of refactors becomes quite arduous -- the implementations of the base (NameNode) and the plugin (BackupNode in that case) are very tightly coupled, and tight coupling makes changes difficult. Can we lay out which specific plug points you need to make ConsensusNode work and define interfaces for them instead of using overriding/subclasses? Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
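To make the composition-versus-inheritance point concrete, here is a purely illustrative sketch; none of these types exist in the NameNode, and the actual plug points ConsensusNode needs would still have to be enumerated as Todd asks.
{code}
import java.util.ArrayList;
import java.util.List;

// Purely illustrative: a narrow, documented plug point (composition) instead of
// opening FSNamesystem internals to subclasses. These interfaces do not exist in HDFS.
interface NamespaceEventListener {
  void editLogged(long txId);  // the single hook an external implementation needs
}

class NamesystemSketch {
  private final List<NamespaceEventListener> listeners = new ArrayList<>();

  void register(NamespaceEventListener l) {
    listeners.add(l);          // the plugin couples only to the interface
  }
  // Internal methods can stay private and be refactored freely.
}
{code}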
[jira] [Commented] (HDFS-6843) Create FileStatus isEncrypted() method
[ https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125228#comment-14125228 ] Andrew Wang commented on HDFS-6843: --- Hi Charles, thanks for sticking with this. Hopefully we're finally closing in on the right solution. * FileStatus#isEncrypted, the javadoc is incorrect in that directories can be encrypted too. * When accessed from within /.reserved/raw, I think things should still show up as encrypted. It's a little inconsistent right now, since files wouldn't show up as isEncrypted, while dirs would. This would be a good thing to have in a unit test. * Would like a test similar to what's in FSAclBaseTest that makes sure we can't set the isEncrypted bit * GNU {{ls}} uses {{*}} to indicate that a file is executable. I'd prefer not to overload this meaning in our webui. * Can you comment on manual testing done for the webUI? I think we used to have unit testing for the webui, but that might have gone away with the JS rewrite. Create FileStatus isEncrypted() method -- Key: HDFS-6843 URL: https://issues.apache.org/jira/browse/HDFS-6843 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6843.001.patch, HDFS-6843.002.patch, HDFS-6843.003.patch, HDFS-6843.004.patch, HDFS-6843.005.patch, HDFS-6843.005.patch FileStatus should have a 'boolean isEncrypted()' method. (It came up in the context of discussing with Andrew about FileStatus being a Writable.) Having this method would allow the MR JobSubmitter to do the following: - BOOLEAN intermediateEncryption = false IF jobconf.contains(mr.intermidate.encryption) THEN intermediateEncryption = jobConf.getBoolean(mr.intermidate.encryption) ELSE IF (I/O)Format INSTANCEOF File(I/O)Format THEN intermediateEncryption = ANY File(I/O)Format HAS a Path with status isEncrypted()==TRUE FI jobConf.setBoolean(mr.intermidate.encryption, intermediateEncryption) FI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
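A hedged Java rendering of the pseudocode in the description above: the config key is kept exactly as written there (it is not a real MR property), and the surrounding variables (jobConf, fs, jobPaths) are assumed context rather than actual JobSubmitter fields.
{code}
// Hedged rendering of the description's pseudocode, built on the new FileStatus API.
boolean intermediateEncryption;
if (jobConf.get("mr.intermidate.encryption") != null) {
  intermediateEncryption = jobConf.getBoolean("mr.intermidate.encryption", false);
} else {
  intermediateEncryption = false;
  for (Path p : jobPaths) {                    // paths used by the File(I/O)Formats
    if (fs.getFileStatus(p).isEncrypted()) {   // the new method this issue adds
      intermediateEncryption = true;
      break;
    }
  }
}
jobConf.setBoolean("mr.intermidate.encryption", intermediateEncryption);
{code}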
[jira] [Commented] (HDFS-6951) Saving namespace and restarting NameNode will remove existing encryption zones
[ https://issues.apache.org/jira/browse/HDFS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125230#comment-14125230 ] Andrew Wang commented on HDFS-6951: --- Blech, it looks like test-patch doesn't like binary diffs. It used to just ignore the binary part of the patch, which is the behavior when I use {{patch}}. Maybe now that we're on git, it's time to revisit HADOOP-10926. I'll take a look at this, but let's go back to non-binary diff to get test runs. Sorry for the bad instructions on my part. Saving namespace and restarting NameNode will remove existing encryption zones -- Key: HDFS-6951 URL: https://issues.apache.org/jira/browse/HDFS-6951 Project: Hadoop HDFS Issue Type: Sub-task Components: encryption Affects Versions: 3.0.0 Reporter: Stephen Chu Assignee: Charles Lamb Attachments: HDFS-6951-prelim.002.patch, HDFS-6951-testrepo.patch, HDFS-6951.001.patch, HDFS-6951.002.patch, HDFS-6951.003.patch, HDFS-6951.004.patch, HDFS-6951.005.patch, HDFS-6951.006.patch, editsStored Currently, when users save namespace and restart the NameNode, pre-existing encryption zones will be wiped out. I could reproduce this on a pseudo-distributed cluster: * Create an encryption zone * List encryption zones and verify the newly created zone is present * Save the namespace * Kill and restart the NameNode * List the encryption zones and you'll find the encryption zone is missing I've attached a test case for {{TestEncryptionZones}} that reproduces this as well. Removing the saveNamespace call will get the test to pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
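The reproduction steps listed in the HDFS-6951 description translate roughly into the following hedged sketch; it assumes a running MiniDFSCluster `cluster` with DistributedFileSystem `dfs` and an existing encryption key "testKey", and it is not the attached test case.
{code}
// Hedged reproduction sketch of the steps above (assumed test context, see lead-in).
Path zone = new Path("/zone");
dfs.mkdirs(zone);
dfs.createEncryptionZone(zone, "testKey");
dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER);
dfs.saveNamespace();
dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);
cluster.restartNameNode(true);
// Before the fix, no zones are listed here even though /zone was created above.
assertTrue(dfs.listEncryptionZones().hasNext());
{code}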
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125237#comment-14125237 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667122/h6584_20140908.patch against trunk revision 0974f43. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 23 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1264 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer org.apache.hadoop.hdfs.server.balancer.TestBalancer org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes org.apache.hadoop.hdfs.TestDistributedFileSystem org.apache.hadoop.hdfs.server.mover.TestStorageMover org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7945//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7945//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7945//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6584_20140907.patch, h6584_20140908.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. 
Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)