[jira] [Commented] (HDFS-8375) Add cellSize as an XAttr to ECZone
[ https://issues.apache.org/jira/browse/HDFS-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549840#comment-14549840 ]

Zhe Zhang commented on HDFS-8375:
---------------------------------

HDFS-8320 was just committed, so there will be some additional rebasing. I can help with that part if needed.

> Add cellSize as an XAttr to ECZone
> ----------------------------------
>
> Key: HDFS-8375
> URL: https://issues.apache.org/jira/browse/HDFS-8375
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Attachments: HDFS-8375-HDFS-7285-01.patch, HDFS-8375-HDFS-7285-02.patch
>
> Add {{cellSize}} as an XAttr for ECZone, as discussed [here|https://issues.apache.org/jira/browse/HDFS-8347?focusedCommentId=14539108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14539108].

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
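Storing the cell size in a zone XAttr requires some byte encoding of an int alongside the schema identifier. As a rough sketch only (a hypothetical layout, not the wire format chosen by the HDFS-8375 patch), the cell size could be appended as four big-endian bytes after the schema name:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical EC zone XAttr value encoding: UTF-8 schema name followed by a
// 4-byte big-endian cellSize. Illustration only; the actual patch may differ.
public class ECZoneXAttr {
    public static byte[] encode(String schemaName, int cellSize) {
        byte[] name = schemaName.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(name.length + 4)
                .put(name)
                .putInt(cellSize)   // ByteBuffer is big-endian by default
                .array();
    }

    public static int decodeCellSize(byte[] value) {
        // The cellSize occupies the last 4 bytes of the XAttr value.
        return ByteBuffer.wrap(value, value.length - 4, 4).getInt();
    }

    public static void main(String[] args) {
        byte[] value = encode("RS-6-3", 64 * 1024);
        System.out.println(decodeCellSize(value)); // 65536
    }
}
```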
[jira] [Commented] (HDFS-8320) Erasure coding: consolidate striping-related terminologies
[ https://issues.apache.org/jira/browse/HDFS-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549884#comment-14549884 ]

Hadoop QA commented on HDFS-8320:
---------------------------------

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 5s | Pre-patch HDFS-7285 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 12m 1s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warning. |
| {color:green}+1{color} | checkstyle | 0m 40s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 58s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 3m 50s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 49s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 109m 52s | Tests failed in hadoop-hdfs. |
| | | 156m 15s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time. Unsynchronized access at DFSOutputStream.java:[line 146] |
| | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long). Dereferenced at BlockInfoStripedUnderConstruction.java:[line 194] |
| | Unread field; should the field be static? At ErasureCodingWorker.java:[line 252] |
| | Should org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$StripedReader be a _static_ inner class? At ErasureCodingWorker.java:[lines 913-915] |
| | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes() At ErasureCodingZoneManager.java:[line 117] |
| | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[]) At ErasureCodingZoneManager.java:[line 81] |
| | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 108] |
| Failed unit tests | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
| | hadoop.hdfs.server.namenode.TestAuditLogs |
| | hadoop.hdfs.server.namenode.TestFileTruncate |
| Timed out tests | org.apache.hadoop.hdfs.TestDistributedFileSystem |
| | org.apache.hadoop.hdfs.server.namenode.TestHostsFiles |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733703/HDFS-8320-HDFS-7285.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / b596edc |
| Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11044/artifact/patchprocess/patchReleaseAuditProblems.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11044/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11044/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11044/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11044/console |

This message was automatically generated.

> Erasure coding: consolidate striping-related terminologies
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549883#comment-14549883 ]

Leitao Guo commented on HDFS-7692:
----------------------------------

[~eddyxu], thanks for your comments; please take a look at the new patch.
1. In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException.
2. In TestDataStorage#testAddStorageDirectoreis, catch the InterruptedException and let the test case fail.
3. The multithreading in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is to create one thread pool per namespace. No change here.
4. Rephrased the parameter successVolumes.

[~szetszwo], thanks for your comments; please take a look at the new patch.
1. InterruptedException is re-thrown as InterruptedIOException.
2. I think it's a good idea to log the upgrade progress for each dir, but so far we cannot get the progress easily from the current API. Do you think it's necessary to file a new jira to follow up on this?

> DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7692
> URL: https://issues.apache.org/jira/browse/HDFS-7692
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.5.2
> Reporter: Leitao Guo
> Assignee: Leitao Guo
> Labels: BB2015-05-TBR
> Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch
>
> {code:title=DataStorage#addStorageLocations(...)|borderStyle=solid}
> for (StorageLocation dataDir : dataDirs) {
>   File root = dataDir.getFile();
>   ... ...
>   bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt);
>   addBlockPoolStorage(bpid, bpStorage);
>   ... ...
>   successVolumes.add(dataDir);
> }
> {code}
>
> In the above code the storage directories are analyzed one by one, which is really time consuming when upgrading HDFS on datanodes that have dozens of large volumes. Multithreaded analysis of dataDirs should be supported here to speed up the upgrade.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
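The proposal above can be sketched with a fixed thread pool: submit each directory's analysis as a task, then collect the directories that upgraded cleanly. This is a minimal illustration with placeholder names (analyzeStorage stands in for the recoverTransitionRead/addBlockPoolStorage work), not the actual DataStorage patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch of parallel per-directory upgrade; names are hypothetical.
public class ParallelStorageUpgrade {
    // Placeholder for the per-directory work (recoverTransitionRead,
    // addBlockPoolStorage); returns true when the directory upgraded cleanly.
    static boolean analyzeStorage(String dataDir) {
        return !dataDir.isEmpty();
    }

    public static List<String> addStorageLocations(List<String> dataDirs) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Math.min(dataDirs.size(), 8)));
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String dir : dataDirs) {
                results.add(pool.submit(() -> analyzeStorage(dir)));
            }
            List<String> successVolumes = new ArrayList<>();
            for (int i = 0; i < dataDirs.size(); i++) {
                if (results.get(i).get()) {   // wait for each directory's result
                    successVolumes.add(dataDirs.get(i));
                }
            }
            return successVolumes;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(addStorageLocations(
                List.of("/data1", "/data2", "/data3")).size()); // 3
    }
}
```

Since the directories are independent volumes, the tasks need no shared state; only the successVolumes list is assembled after all futures complete, which avoids synchronization inside the workers.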
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549930#comment-14549930 ]

Leitao Guo commented on HDFS-7692:
----------------------------------

Sorry, it was my mistake to comment so many times here! It seems that my network connection is not very good right now...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-6348) SecondaryNameNode not terminating properly on runtime exceptions
[ https://issues.apache.org/jira/browse/HDFS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B updated HDFS-6348:
--------------------------------
Labels:   (was: BB2015-05-RFC)

> SecondaryNameNode not terminating properly on runtime exceptions
> ----------------------------------------------------------------
>
> Key: HDFS-6348
> URL: https://issues.apache.org/jira/browse/HDFS-6348
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.3.0
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: HDFS-6348-003.patch, HDFS-6348-004.patch, HDFS-6348.patch, HDFS-6348.patch, secondaryNN_threaddump_after_exit.log
>
> The SecondaryNameNode does not exit when a RuntimeException occurs during startup. Say a wrong configuration is set; validation then fails and throws a RuntimeException as shown below. But when I check the environment, the SecondaryNameNode process is still alive. On analysis, the RMI thread is still alive, and since it is not a daemon thread the JVM does not exit. I'm attaching a thread dump to this JIRA for more details about the thread.
>
> {code}
> java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1900)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:199)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:256)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:635)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:260)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:205)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:695)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1868)
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1892)
> 	... 6 more
> Caused by: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found
> 	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1774)
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1866)
> 	... 7 more
> 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state
> 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
> 2014-05-07 14:31:04,926 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG:
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
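The hang described in the report can be reproduced in miniature: a leftover non-daemon thread keeps the JVM alive after main() returns, whereas a daemon thread (or an explicit System.exit in the top-level catch) lets the process terminate. A minimal sketch, not the actual SecondaryNameNode code:

```java
// Demonstrates why a non-daemon thread (like the RMI thread in the report)
// prevents JVM exit, and how the daemon flag changes that behavior.
public class NonDaemonHang {
    // Starts an idle helper thread standing in for the lingering RMI thread.
    public static Thread startHelper(boolean daemon) {
        Thread helper = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);   // idle forever
            } catch (InterruptedException ignored) {
            }
        });
        // With daemon == false the JVM would never exit after main() returns,
        // even though startup already failed with a RuntimeException; the
        // alternative fix is an explicit System.exit in the top-level catch.
        helper.setDaemon(daemon);
        helper.start();
        return helper;
    }

    public static void main(String[] args) {
        Thread helper = startHelper(true);
        System.out.println(helper.isDaemon()); // true: JVM can exit normally
    }
}
```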
[jira] [Commented] (HDFS-7609) startup used too much time to load edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549939#comment-14549939 ]

Tsz Wo Nicholas Sze commented on HDFS-7609:
-------------------------------------------

{quote}
PriorityQueue#remove is O(n), so that definitely could be problematic. It's odd that there would be so many collisions that this would become noticeable though. Are any of you running a significant number of legacy applications linked to the RPC code before introduction of the retry cache support? If that were the case, then perhaps a huge number of calls are not supplying a call ID, and then the NN is getting a default call ID value from protobuf decoding, thus causing a lot of collisions.
{quote}

The priority queue can be improved using a balanced tree, as stated in the Java comment in LightWeightCache. We should do it if it could fix the problem.

{code}
//LightWeightCache.java
/*
 * The memory footprint for java.util.PriorityQueue is low but the
 * remove(Object) method runs in linear time. We may improve it by using a
 * balanced tree. However, we do not yet have a low memory footprint balanced
 * tree implementation.
 */
private final PriorityQueue<Entry> queue;
{code}

BTW, the priority queue is used to evict entries according to the expiration time. All the entries (with any key, i.e. any call ID) are stored in it.

> startup used too much time to load edits
> ----------------------------------------
>
> Key: HDFS-7609
> URL: https://issues.apache.org/jira/browse/HDFS-7609
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 2.2.0
> Reporter: Carrey Zhan
> Assignee: Ming Ma
> Labels: BB2015-05-RFC
> Attachments: HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch
>
> One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recover mode; the loading speed was no different. I looked into the stack trace and judged that the slowness is caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recover process.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
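The complexity contrast discussed above can be shown directly with the JDK collections: PriorityQueue.remove(Object) is a linear scan, while a balanced tree (TreeSet is a red-black tree) removes in O(log n) and still exposes the smallest element for expiration-ordered eviction. A small sketch; note a real cache would need a tie-breaker for entries with equal expiration times, since TreeSet rejects duplicates:

```java
import java.util.PriorityQueue;
import java.util.TreeSet;

// Contrast of the two eviction structures discussed above, using pseudo
// expiration timestamps 0..n-1 as entries.
public class EvictionQueues {
    public static PriorityQueue<Long> buildQueue(int n) {
        PriorityQueue<Long> pq = new PriorityQueue<>();
        for (long t = 0; t < n; t++) pq.add(t);
        return pq;
    }

    public static TreeSet<Long> buildTree(int n) {
        TreeSet<Long> tree = new TreeSet<>();
        for (long t = 0; t < n; t++) tree.add(t);
        return tree;
    }

    public static void main(String[] args) {
        PriorityQueue<Long> pq = buildQueue(1000);
        TreeSet<Long> tree = buildTree(1000);

        // Both expose the earliest expiration time for eviction.
        System.out.println(pq.peek() + " " + tree.first()); // 0 0

        // Removing an arbitrary entry: O(n) scan vs O(log n) tree descent.
        pq.remove(999L);
        tree.remove(999L);
        System.out.println(pq.size() + " " + tree.size()); // 999 999
    }
}
```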
[jira] [Commented] (HDFS-4273) Fix some issue in DFSInputstream
[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549996#comment-14549996 ]

Masatake Iwasaki commented on HDFS-4273:
----------------------------------------

I'm looking into this and writing down my understanding for other reviewers here: HDFS-4273, HDFS-5917 and HDFS-6022 all address improving the refreshing of {{deadNodes}}. I think HDFS-6022 is the most promising. (Actually, the newest v8 patch omitted the deadNodes part.) The other issues addressed here are:
# There is a case where it should retry but doesn't.
# There is a race condition around {{failures}}.

HDFS-5776 changed a lot of the relevant code around the {{chooseDataNode}} method, and the v8 patch is difficult to rebase. The fixes around {{seekToNewSource}} alone may work, and I'll try to update the patch.

> Fix some issue in DFSInputstream
> --------------------------------
>
> Key: HDFS-4273
> URL: https://issues.apache.org/jira/browse/HDFS-4273
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.0.2-alpha
> Reporter: Binglin Chang
> Assignee: Binglin Chang
> Priority: Minor
> Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, HDFS-4273.v7.patch, HDFS-4273.v8.patch, TestDFSInputStream.java
>
> The following issues in DFSInputStream are addressed in this jira:
>
> 1. read may not retry enough in some cases, causing early failure. Assume the following call logic:
> {noformat}
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() add currentNode to deadNodes, wish to get a different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clear deadNodes and pick the currentNode again
>         seekToNewSource() return false
>      readBuffer() re-throw the exception
>   quit loop
> readWithStrategy() got the exception, and may fail the read call before tried MaxBlockAcquireFailures.
> {noformat}
>
> 2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race condition: it is cleared to 0 while it is still being used by another thread. So it is possible that some read thread may never quit. Changing failures to a local variable solves this issue.
>
> 3. If the local datanode is added to deadNodes, it will not be removed from deadNodes even if the DN comes back alive. We need a way to remove the local datanode from deadNodes when it becomes live again.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
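The local-variable fix for item 2 can be sketched as follows, with hypothetical names rather than the actual DFSInputStream fields: a counter on the call's stack cannot be reset by another thread, so each read is guaranteed to give up after a bounded number of failures.

```java
// Sketch of the race fix: a per-call local failure counter instead of a
// shared instance field that another thread could clear mid-loop.
public class LocalFailureCounter {
    private static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

    // Always-failing read attempt, standing in for reader.doRead().
    private static boolean tryRead() {
        return false;
    }

    // The counter lives on this call's stack, so concurrent readers each get
    // their own count and no thread can reset another's progress to 0.
    public static int readWithRetries() {
        int failures = 0;                       // local, not a shared field
        while (failures < MAX_BLOCK_ACQUIRE_FAILURES) {
            if (tryRead()) {
                return failures;
            }
            failures++;
        }
        return failures;                        // gave up after the limit
    }

    public static void main(String[] args) {
        System.out.println(readWithRetries()); // 3: bounded retries, then give up
    }
}
```

With a shared field, a concurrent reset to 0 between iterations could keep the loop below the limit indefinitely, which is exactly the never-quitting thread the report describes.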
[jira] [Commented] (HDFS-8333) Create EC zone should not need superuser privilege
[ https://issues.apache.org/jira/browse/HDFS-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550012#comment-14550012 ]

Walter Su commented on HDFS-8333:
---------------------------------

The patch looks good. I'm +1 for this idea. Hi, [~drankye] and [~zhz], what do you think about it?

> Create EC zone should not need superuser privilege
> --------------------------------------------------
>
> Key: HDFS-8333
> URL: https://issues.apache.org/jira/browse/HDFS-8333
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Yong Zhang
> Assignee: Yong Zhang
> Attachments: HDFS-8333-HDFS-7285.000.patch
>
> Creating an EC zone should not need superuser privilege; for example, in a multi-tenant scenario, common users manage only their own directories and subdirectories.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-8428) Erasure Coding: Fix the NullPointerException when deleting file
[ https://issues.apache.org/jira/browse/HDFS-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549852#comment-14549852 ]

Yi Liu commented on HDFS-8428:
------------------------------

I also see the {{NullPointerException}} in {{TestDFSStripedInputStream}}; although the test passes, there is actually an exception:

{code}
2015-05-19 13:27:08,944 WARN  ipc.Server (Server.java:run(2190)) - IPC Server handler 2 on 50789, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:59424 Call#123 Retry#0
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getStoredBlock(BlockManager.java:3581)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeStoredBlock(BlockManager.java:3209)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:3390)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5545)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1344)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:222)
	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:29418)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2171)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166)
{code}

> Erasure Coding: Fix the NullPointerException when deleting file
> ---------------------------------------------------------------
>
> Key: HDFS-8428
> URL: https://issues.apache.org/jira/browse/HDFS-8428
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Yi Liu
> Assignee: Yi Liu
>
> In HDFS, when removing a file, the NN also removes all its blocks from {{BlocksMap}} and sends {{DNA_INVALIDATE}} (invalidate blocks) commands to the datanodes. After the datanodes successfully delete the block replicas, they report {{DELETED_BLOCK}} to the NameNode. The snippet of code logic in {{BlockManager#processIncrementalBlockReport}} is as follows:
>
> {code}
> case DELETED_BLOCK:
>   removeStoredBlock(storageInfo, getStoredBlock(rdbi.getBlock()), node);
>   ...
> {code}
>
> {code}
> private void removeStoredBlock(DatanodeStorageInfo storageInfo, Block block,
>     DatanodeDescriptor node) {
>   if (shouldPostponeBlocksFromFuture &&
>       namesystem.isGenStampInFuture(block)) {
>     queueReportedBlock(storageInfo, block, null, QUEUE_REASON_FUTURE_GENSTAMP);
>     return;
>   }
>   removeStoredBlock(getStoredBlock(block), node);
> }
> {code}
>
> In the EC branch, we added {{getStoredBlock}}. There is a {{NullPointerException}} when handling the {{DELETED_BLOCK}} of an incrementalBlockReport from a DataNode after deleting a file: since the block is already removed, we need to check for null.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
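The race can be simulated in isolation: the file deletion empties the block map first, then the DataNode's late DELETED_BLOCK report arrives for a block that is no longer stored. A self-contained sketch with a map standing in for BlocksMap (hypothetical names, not the committed patch) shows the null guard making the late report a no-op:

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of the NPE scenario: the lookup returns null for an
// already-removed block, so the handler must check before dereferencing.
public class DeletedBlockReport {
    static final Map<Long, String> blocksMap = new HashMap<>();

    static String getStoredBlock(long blockId) {
        return blocksMap.get(blockId);      // null once the file is deleted
    }

    // Mirrors the shape of BlockManager#removeStoredBlock with the missing
    // null check added; returns whether anything was actually removed.
    public static boolean removeStoredBlock(long blockId) {
        String stored = getStoredBlock(blockId);
        if (stored == null) {
            return false;   // block already gone; late report is a no-op
        }
        blocksMap.remove(blockId);
        return true;
    }

    public static void main(String[] args) {
        blocksMap.put(1L, "blk_1");
        blocksMap.clear();                      // file deletion removes all blocks
        System.out.println(removeStoredBlock(1L)); // false, instead of an NPE
    }
}
```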
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549901#comment-14549901 ] Leitao Guo commented on HDFS-7692: -- [~eddyxu], thanks for your comments, please have a check of the new patch. 1.In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException; 2.In TestDataStorage#testAddStorageDirectoreis, catch InterruptedException then let the test case fail; 3.The multithread in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is creating one thread pool for each namespace. Not change here. 4.Re-phrase the parameter successVolumes. [~szetszwo],thanks for your comments, please have a check of the new patch. 1. InterruptedException re-thrown as InterruptedIOException; 2. I think it's a good idea to log the upgrade progress for each dir, but so far, we can not get the progress easily from the current api. Do you think it's necessary to file a new jira to follow this? DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch {code:title=DataStorage#addStorageLocations(...)|borderStyle=solid} for (StorageLocation dataDir : dataDirs) { File root = dataDir.getFile(); ... ... bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt); addBlockPoolStorage(bpid, bpStorage); ... ... successVolumes.add(dataDir); } {code} In the above code the storage directories will be analyzed one by one, which is really time consuming when upgrading HDFS with datanodes have dozens of large volumes. 
MultiThread dataDirs analyzing should be supported here to speedup upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549898#comment-14549898 ] Leitao Guo commented on HDFS-7692: -- [~eddyxu], thanks for your comments, please have a check of the new patch. 1.In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException; 2.In TestDataStorage#testAddStorageDirectoreis, catch InterruptedException then let the test case fail; 3.The multithread in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is creating one thread pool for each namespace. Not change here. 4.Re-phrase the parameter successVolumes. [~szetszwo],thanks for your comments, please have a check of the new patch. 1. InterruptedException re-thrown as InterruptedIOException; 2. I think it's a good idea to log the upgrade progress for each dir, but so far, we can not get the progress easily from the current api. Do you think it's necessary to file a new jira to follow this? DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch {code:title=DataStorage#addStorageLocations(...)|borderStyle=solid} for (StorageLocation dataDir : dataDirs) { File root = dataDir.getFile(); ... ... bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt); addBlockPoolStorage(bpid, bpStorage); ... ... successVolumes.add(dataDir); } {code} In the above code the storage directories will be analyzed one by one, which is really time consuming when upgrading HDFS with datanodes have dozens of large volumes. 
MultiThread dataDirs analyzing should be supported here to speedup upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549907#comment-14549907 ] Leitao Guo commented on HDFS-7692: -- [~eddyxu], thanks for your comments, please have a check of the new patch. 1.In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException; 2.In TestDataStorage#testAddStorageDirectoreis, catch InterruptedException then let the test case fail; 3.The multithread in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is creating one thread pool for each namespace. Not change here. 4.Re-phrase the parameter successVolumes. [~szetszwo],thanks for your comments, please have a check of the new patch. 1. InterruptedException re-thrown as InterruptedIOException; 2. I think it's a good idea to log the upgrade progress for each dir, but so far, we can not get the progress easily from the current api. Do you think it's necessary to file a new jira to follow this? DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch {code:title=DataStorage#addStorageLocations(...)|borderStyle=solid} for (StorageLocation dataDir : dataDirs) { File root = dataDir.getFile(); ... ... bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt); addBlockPoolStorage(bpid, bpStorage); ... ... successVolumes.add(dataDir); } {code} In the above code the storage directories will be analyzed one by one, which is really time consuming when upgrading HDFS with datanodes have dozens of large volumes. 
MultiThread dataDirs analyzing should be supported here to speedup upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549903#comment-14549903 ] Leitao Guo commented on HDFS-7692: -- [~eddyxu], thanks for your comments, please have a check of the new patch. 1.In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException; 2.In TestDataStorage#testAddStorageDirectoreis, catch InterruptedException then let the test case fail; 3.The multithread in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is creating one thread pool for each namespace. Not change here. 4.Re-phrase the parameter successVolumes. [~szetszwo],thanks for your comments, please have a check of the new patch. 1. InterruptedException re-thrown as InterruptedIOException; 2. I think it's a good idea to log the upgrade progress for each dir, but so far, we can not get the progress easily from the current api. Do you think it's necessary to file a new jira to follow this? DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch {code:title=DataStorage#addStorageLocations(...)|borderStyle=solid} for (StorageLocation dataDir : dataDirs) { File root = dataDir.getFile(); ... ... bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt); addBlockPoolStorage(bpid, bpStorage); ... ... successVolumes.add(dataDir); } {code} In the above code the storage directories will be analyzed one by one, which is really time consuming when upgrading HDFS with datanodes have dozens of large volumes. 
Multi-threaded analysis of dataDirs should be supported here to speed up the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
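The proposed change can be sketched in a minimal, self-contained way: submit each directory's analysis to a fixed-size pool and collect the successes, mirroring the sequential loop above. The class name, the String stand-in for StorageLocation, the placeholder per-volume work, and the pool size of 8 are illustrative assumptions, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpgradeSketch {
    // Hypothetical stand-in for the per-directory work done inside
    // DataStorage#addStorageLocations (recoverTransitionRead etc.).
    static boolean analyzeStorageDir(String dataDir) {
        return !dataDir.isEmpty(); // placeholder for the real per-volume upgrade
    }

    // Analyze all directories concurrently instead of one by one;
    // a failed volume is skipped, matching the sequential loop's behavior.
    public static List<String> addStorageLocations(List<String> dataDirs)
            throws InterruptedException {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, Math.min(dataDirs.size(), 8)));
        List<Future<String>> futures = new ArrayList<>();
        for (String dir : dataDirs) {
            futures.add(pool.submit(() -> analyzeStorageDir(dir) ? dir : null));
        }
        List<String> successVolumes = new ArrayList<>();
        for (Future<String> f : futures) {
            try {
                String dir = f.get();
                if (dir != null) {
                    successVolumes.add(dir);
                }
            } catch (ExecutionException e) {
                // log and skip the failed volume
            }
        }
        pool.shutdown();
        return successVolumes;
    }
}
```

With dozens of independent volumes, the upgrade time drops from the sum of per-volume times toward the maximum of them, bounded by the pool size.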
[jira] [Updated] (HDFS-8428) Erasure Coding: Fix the NullPointerException when deleting file
[ https://issues.apache.org/jira/browse/HDFS-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8428: - Status: Patch Available (was: Open) Erasure Coding: Fix the NullPointerException when deleting file --- Key: HDFS-8428 URL: https://issues.apache.org/jira/browse/HDFS-8428 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8428-HDFS-7285.001.patch In HDFS, when removing a file, the NN also removes all its blocks from {{BlocksMap}} and sends {{DNA_INVALIDATE}} (invalidate blocks) commands to the datanodes. After the datanodes successfully delete the block replicas, they report {{DELETED_BLOCK}} to the NameNode. The relevant logic in {{BlockManager#processIncrementalBlockReport}} is as follows:
{code}
case DELETED_BLOCK:
  removeStoredBlock(storageInfo, getStoredBlock(rdbi.getBlock()), node);
  ...
{code}
{code}
private void removeStoredBlock(DatanodeStorageInfo storageInfo, Block block,
    DatanodeDescriptor node) {
  if (shouldPostponeBlocksFromFuture && namesystem.isGenStampInFuture(block)) {
    queueReportedBlock(storageInfo, block, null, QUEUE_REASON_FUTURE_GENSTAMP);
    return;
  }
  removeStoredBlock(getStoredBlock(block), node);
}
{code}
In the EC branch, we added {{getStoredBlock}}. A {{NullPointerException}} occurs when handling the {{DELETED_BLOCK}} of an incremental block report from a DataNode after deleting a file: since the block has already been removed, we need to check for null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
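The race the issue describes can be sketched in a self-contained way: by the time the DataNode's deletion report arrives, the block may already be gone from the map, so the lookup returns null and must be guarded. The map-backed "BlocksMap" and the method names below are hypothetical stand-ins, not the actual BlockManager API.

```java
import java.util.HashMap;
import java.util.Map;

public class DeletedBlockReportSketch {
    // Minimal stand-in for BlocksMap: block id -> stored block info.
    static final Map<Long, String> blocksMap = new HashMap<>();

    static String getStoredBlock(long blockId) {
        return blocksMap.get(blockId); // null once the file's blocks were removed
    }

    // Guarded handling of DELETED_BLOCK: file deletion already removed the
    // block from the map, so a null lookup is expected, not an error.
    static boolean processDeletedBlock(long blockId) {
        String stored = getStoredBlock(blockId);
        if (stored == null) {
            return false; // nothing to remove; avoids the NullPointerException
        }
        blocksMap.remove(blockId);
        return true;
    }
}
```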
[jira] [Commented] (HDFS-8378) Erasure Coding: Few improvements for the erasure coding worker
[ https://issues.apache.org/jira/browse/HDFS-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549938#comment-14549938 ] Walter Su commented on HDFS-8378: - Thanks [~rakeshr] for the contribution! It's committed in the branch. Erasure Coding: Few improvements for the erasure coding worker -- Key: HDFS-8378 URL: https://issues.apache.org/jira/browse/HDFS-8378 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Priority: Minor Fix For: HDFS-7285 Attachments: HDFS-8378-HDFS-7285.00.patch # The following log is confusing; make it tidy. A missing {{break;}} statement causes these unwanted logs:
{code}
2015-05-10 15:06:45,878 INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(728)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2015-05-10 15:06:45,879 WARN datanode.DataNode (BPOfferService.java:processCommandFromActive(732)) - Unknown DatanodeCommand action: 11
{code}
# Add the exception trace to the log; it would improve debuggability:
{code}
} catch (Throwable e) {
  LOG.warn("Failed to recover striped block: " + blockGroup);
}
{code}
# Make the member variables in ErasureCodingWorker, ReconstructAndTransferBlock, and StripedReader {{private}} and {{final}}. # Correct the spelling of the variable {{STRIPED_READ_TRHEAD_POOL}} to {{STRIPED_READ_THREAD_POOL}}. # It would be good to add a debug log printing the striped read pool size:
{code}
LOG.debug("Using striped reads; pool threads=" + num);
{code}
# Add a meaningful message to the precondition check:
{code}
Preconditions.checkArgument(liveIndices.length == sources.length);
{code}
# Remove the unused import:
{code}
import org.apache.hadoop.hdfs.server.common.HdfsServerConstants;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
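Item 1 (the missing {{break;}}) can be illustrated with a minimal, self-contained sketch: without the break, the recovery case falls through to the default branch, producing the spurious "Unknown" line quoted above. The class name, dispatch method, and returned strings are hypothetical stand-ins for BPOfferService's command dispatch, not the actual Hadoop code.

```java
public class CommandDispatchSketch {
    static final int DNA_ERASURE_CODING_RECOVERY = 11;

    // Dispatch a datanode command action code and return the log line
    // that would be emitted for it.
    static String dispatch(int action) {
        switch (action) {
            case DNA_ERASURE_CODING_RECOVERY:
                // Without this break, execution falls through to default
                // and a second, misleading "Unknown" log line appears.
                return "DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY";
            default:
                return "Unknown DatanodeCommand action: " + action;
        }
    }
}
```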
[jira] [Updated] (HDFS-8378) Erasure Coding: Few improvements for the erasure coding worker
[ https://issues.apache.org/jira/browse/HDFS-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8378: Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Erasure Coding: Few improvements for the erasure coding worker -- Key: HDFS-8378 URL: https://issues.apache.org/jira/browse/HDFS-8378 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Priority: Minor Fix For: HDFS-7285 Attachments: HDFS-8378-HDFS-7285.00.patch
# The following log is confusing; tidy it up. A {{break;}} statement is missing, which causes these unwanted logs:
{code}
2015-05-10 15:06:45,878 INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(728)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2015-05-10 15:06:45,879 WARN datanode.DataNode (BPOfferService.java:processCommandFromActive(732)) - Unknown DatanodeCommand action: 11
{code}
# Add the exception trace to the log; it would improve debuggability:
{code}
} catch (Throwable e) {
  LOG.warn("Failed to recover striped block: " + blockGroup);
}
{code}
# Make the member variables in ErasureCodingWorker, ReconstructAndTransferBlock and StripedReader {{private}} and {{final}}.
# Correct the spelling of the variable {{STRIPED_READ_TRHEAD_POOL}} to {{STRIPED_READ_THREAD_POOL}}.
# It would be good to add a debug log that prints the striped read pool size:
{code}
LOG.debug("Using striped reads; pool threads=" + num);
{code}
# Add a meaningful message to the precondition check:
{code}
Preconditions.checkArgument(liveIndices.length == sources.length);
{code}
# Remove the unused import:
{code}
import org.apache.hadoop.hdfs.server.common.HdfsServerConstants;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
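The first item above (the missing {{break;}}) is ordinary switch fall-through: the matching case runs and then execution continues into the default branch, which is exactly what produces the paired INFO and "Unknown DatanodeCommand" WARN lines. An illustrative sketch, not the actual BPOfferService code, with hypothetical names:

```java
// Demonstrates how a missing "break;" makes a recognized command also hit
// the "unknown command" branch.
class CommandDispatchSketch {
    static final int DNA_ERASURE_CODING_RECOVERY = 11;

    // Buggy version: no break, so the WARN line is appended too.
    static String buggy(int action) {
        StringBuilder log = new StringBuilder();
        switch (action) {
        case DNA_ERASURE_CODING_RECOVERY:
            log.append("INFO DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY\n");
            // fall-through: no break here
        default:
            log.append("WARN Unknown DatanodeCommand action: ").append(action);
        }
        return log.toString();
    }

    // Fixed version: the break stops execution after the matching case.
    static String fixed(int action) {
        StringBuilder log = new StringBuilder();
        switch (action) {
        case DNA_ERASURE_CODING_RECOVERY:
            log.append("INFO DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY");
            break;
        default:
            log.append("WARN Unknown DatanodeCommand action: ").append(action);
        }
        return log.toString();
    }
}
```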
[jira] [Created] (HDFS-8428) Erasure Coding: Fix the NullPointerException when deleting file
Yi Liu created HDFS-8428: Summary: Erasure Coding: Fix the NullPointerException when deleting file Key: HDFS-8428 URL: https://issues.apache.org/jira/browse/HDFS-8428 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu
In HDFS, when removing a file, the NN also removes all of its blocks from {{BlocksMap}} and sends {{DNA_INVALIDATE}} (invalidate blocks) commands to the datanodes. After the datanodes successfully delete the block replicas, they report {{DELETED_BLOCK}} to the NameNode. The relevant logic in {{BlockManager#processIncrementalBlockReport}} is as follows:
{code}
case DELETED_BLOCK:
  removeStoredBlock(storageInfo, getStoredBlock(rdbi.getBlock()), node);
  ...
{code}
{code}
private void removeStoredBlock(DatanodeStorageInfo storageInfo, Block block,
    DatanodeDescriptor node) {
  if (shouldPostponeBlocksFromFuture && namesystem.isGenStampInFuture(block)) {
    queueReportedBlock(storageInfo, block, null, QUEUE_REASON_FUTURE_GENSTAMP);
    return;
  }
  removeStoredBlock(getStoredBlock(block), node);
}
{code}
In the EC branch we added {{getStoredBlock}}. A {{NullPointerException}} occurs when handling the {{DELETED_BLOCK}} case of an incremental block report from a DataNode after a file is deleted: since the block has already been removed, the lookup returns null and needs to be checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8131) Implement a space balanced block placement policy
[ https://issues.apache.org/jira/browse/HDFS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549847#comment-14549847 ] Hadoop QA commented on HDFS-8131: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 8s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 18s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 167m 57s | Tests passed in hadoop-hdfs. 
| | | | 210m 38s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733696/HDFS-8131.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11042/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11042/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11042/console | This message was automatically generated. Implement a space balanced block placement policy - Key: HDFS-8131 URL: https://issues.apache.org/jira/browse/HDFS-8131 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Labels: BlockPlacementPolicy Attachments: HDFS-8131-v1.diff, HDFS-8131-v2.diff, HDFS-8131-v3.diff, HDFS-8131.004.patch, HDFS-8131.005.patch, HDFS-8131.006.patch, balanced.png
The default block placement policy chooses datanodes for new blocks randomly, which results in unbalanced space usage among datanodes after a cluster expansion: the old datanodes stay at a high used-space percentage while the newly added ones stay low. Though we can use the external balancer tool to even out the usage, it costs extra network IO and the balancing speed is not easy to control. An easy solution is to implement a space-balanced block placement policy that chooses datanodes with a low used-space percentage for new blocks with slightly higher probability. Before long, the used-space percentages of the datanodes will trend toward balance. Suggestions and discussion are welcome. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
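The core idea in HDFS-8131, choosing less-used datanodes "with slightly higher probability", can be realized by sampling nodes with weight proportional to remaining capacity. A minimal sketch under that assumption; the names are hypothetical, and the real policy would extend HDFS's block placement policy classes rather than stand alone:

```java
import java.util.List;
import java.util.Random;

// Weighted random choice: nodes with more remaining space are picked more
// often, so new blocks drift toward emptier datanodes over time.
class SpaceBalancedChooser {
    private final Random rand;

    SpaceBalancedChooser(long seed) {
        this.rand = new Random(seed);
    }

    /** Pick an index into remainingBytes, weighted by remaining capacity. */
    int choose(List<Long> remainingBytes) {
        long total = 0;
        for (long r : remainingBytes) {
            total += r;
        }
        long pick = (long) (rand.nextDouble() * total);
        for (int i = 0; i < remainingBytes.size(); i++) {
            pick -= remainingBytes.get(i);
            if (pick < 0) {
                return i;
            }
        }
        return remainingBytes.size() - 1; // fallback for rounding at the edge
    }
}
```

Over many placements a node with 90% of the free space receives roughly 90% of the new blocks, which is the "balanced in the not-too-long term" behavior the issue describes.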
[jira] [Updated] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated HDFS-7692: - Attachment: HDFS-7692.02.patch DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch
{code:title=DataStorage#addStorageLocations(...)|borderStyle=solid}
for (StorageLocation dataDir : dataDirs) {
  File root = dataDir.getFile();
  ... ...
  bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt);
  addBlockPoolStorage(bpid, bpStorage);
  ... ...
  successVolumes.add(dataDir);
}
{code}
In the above code the storage directories are analyzed one by one, which is really time consuming when upgrading HDFS on datanodes that have dozens of large volumes. Multithreaded analysis of dataDirs should be supported here to speed up the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
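The parallelization proposed here can be sketched with a fixed thread pool: submit one task per storage directory and collect the results, instead of analyzing directories sequentially. This is a hedged sketch, not the patch itself; the per-directory upgrade work ({{recoverTransitionRead}} etc.) is replaced by a placeholder task:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelUpgradeSketch {
    // Analyze all data directories concurrently; returns the directories
    // whose (placeholder) analysis completed successfully.
    static List<String> addStorageLocations(List<String> dataDirs)
            throws Exception {
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, Math.min(dataDirs.size(), 8)));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String dir : dataDirs) {
                futures.add(pool.submit(() -> {
                    // stands in for the per-directory upgrade/analysis work
                    return dir;
                }));
            }
            List<String> successVolumes = new ArrayList<>();
            for (Future<String> f : futures) {
                successVolumes.add(f.get()); // get() surfaces per-dir failures
            }
            return successVolumes;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting each `Future` with `get()` preserves the original ordering and rethrows any per-directory failure, which mirrors the error handling a real patch would need.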
[jira] [Updated] (HDFS-8375) Add cellSize as an XAttr to ECZone
[ https://issues.apache.org/jira/browse/HDFS-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-8375: Attachment: HDFS-8375-HDFS-7285-03.patch Attached the rebased patch. Please review Add cellSize as an XAttr to ECZone -- Key: HDFS-8375 URL: https://issues.apache.org/jira/browse/HDFS-8375 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-8375-HDFS-7285-01.patch, HDFS-8375-HDFS-7285-02.patch, HDFS-8375-HDFS-7285-03.patch Add {{cellSize}} as an Xattr for ECZone. as discussed [here|https://issues.apache.org/jira/browse/HDFS-8347?focusedCommentId=14539108page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14539108] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
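Storing {{cellSize}} as part of the EC zone xattr means packing it into the xattr's byte value alongside the existing zone information. A hedged sketch of one possible encoding (a 4-byte big-endian cell size followed by a UTF-8 schema name); this layout is purely illustrative, not the format the patch commits:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative encode/decode for an EC-zone xattr value carrying both the
// schema name and the cell size.
class ECZoneXAttrSketch {
    static byte[] encode(String schemaName, int cellSize) {
        byte[] name = schemaName.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + name.length)
            .putInt(cellSize)   // fixed-width cell size first
            .put(name)          // schema name takes the remaining bytes
            .array();
    }

    static int decodeCellSize(byte[] value) {
        return ByteBuffer.wrap(value).getInt();
    }

    static String decodeSchemaName(byte[] value) {
        return new String(value, 4, value.length - 4, StandardCharsets.UTF_8);
    }
}
```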
[jira] [Updated] (HDFS-8428) Erasure Coding: Fix the NullPointerException when deleting file
[ https://issues.apache.org/jira/browse/HDFS-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8428: - Attachment: HDFS-8428-HDFS-7285.001.patch We should not get the block group from the striped block replica right away when handling {{DELETED_BLOCK}}; we will convert it later, and we may also postpone it. I have checked that there is no exception in the test log after this patch. Erasure Coding: Fix the NullPointerException when deleting file --- Key: HDFS-8428 URL: https://issues.apache.org/jira/browse/HDFS-8428 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8428-HDFS-7285.001.patch
In HDFS, when removing a file, the NN also removes all of its blocks from {{BlocksMap}} and sends {{DNA_INVALIDATE}} (invalidate blocks) commands to the datanodes. After the datanodes successfully delete the block replicas, they report {{DELETED_BLOCK}} to the NameNode. The relevant logic in {{BlockManager#processIncrementalBlockReport}} is as follows:
{code}
case DELETED_BLOCK:
  removeStoredBlock(storageInfo, getStoredBlock(rdbi.getBlock()), node);
  ...
{code}
{code}
private void removeStoredBlock(DatanodeStorageInfo storageInfo, Block block,
    DatanodeDescriptor node) {
  if (shouldPostponeBlocksFromFuture && namesystem.isGenStampInFuture(block)) {
    queueReportedBlock(storageInfo, block, null, QUEUE_REASON_FUTURE_GENSTAMP);
    return;
  }
  removeStoredBlock(getStoredBlock(block), node);
}
{code}
In the EC branch we added {{getStoredBlock}}. A {{NullPointerException}} occurs when handling the {{DELETED_BLOCK}} case of an incremental block report from a DataNode after a file is deleted: since the block has already been removed, the lookup returns null and needs to be checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549899#comment-14549899 ] Leitao Guo commented on HDFS-7692: -- [~eddyxu], thanks for your comments; please take a look at the new patch.
1. In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException.
2. In TestDataStorage#testAddStorageDirectoreis, catch the InterruptedException and let the test case fail.
3. The multithreading in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is to create one thread pool for each namespace. No change here.
4. Re-phrased the parameter successVolumes.
[~szetszwo], thanks for your comments; please take a look at the new patch.
1. InterruptedException is re-thrown as InterruptedIOException.
2. I think it's a good idea to log the upgrade progress for each dir, but so far we cannot get the progress easily from the current API. Do you think it's necessary to file a new JIRA to follow this?
DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch
{code:title=DataStorage#addStorageLocations(...)|borderStyle=solid}
for (StorageLocation dataDir : dataDirs) {
  File root = dataDir.getFile();
  ... ...
  bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt);
  addBlockPoolStorage(bpid, bpStorage);
  ... ...
  successVolumes.add(dataDir);
}
{code}
In the above code the storage directories are analyzed one by one, which is really time consuming when upgrading HDFS on datanodes that have dozens of large volumes. Multithreaded analysis of dataDirs should be supported here to speed up the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
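Point 1 in the comment above, translating an InterruptedException into an InterruptedIOException, is a standard pattern when a worker-joining method must expose an IOException-compatible signature. A hedged sketch with a hypothetical method name; note that the interrupt status is restored before rethrowing so callers can still observe it:

```java
import java.io.InterruptedIOException;

class InterruptTranslationSketch {
    // Wait for an upgrade worker thread; if interrupted, rethrow as an
    // InterruptedIOException (an IOException subtype) with the cause chained.
    static void waitForUpgrade(Thread worker) throws InterruptedIOException {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            InterruptedIOException iioe =
                new InterruptedIOException("Upgrade interrupted");
            iioe.initCause(e);
            throw iioe;
        }
    }
}
```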
[jira] [Comment Edited] (HDFS-8428) Erasure Coding: Fix the NullPointerException when deleting file
[ https://issues.apache.org/jira/browse/HDFS-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549891#comment-14549891 ] Yi Liu edited comment on HDFS-8428 at 5/19/15 6:40 AM: --- We should not get block group from striped block replica right now when handling {{DELETED_BLOCK}} (same as we do for {{RECEIVED_BLOCK}} and {{RECEIVING_BLOCK}}), later we will convert it, and also we may postpone it. I have checked that there is no exception in the log of test after this patch. was (Author: hitliuyi): We should not convert get block group from striped block replica right now when handling {{DELETED_BLOCK}} (same as we do for {{RECEIVED_BLOCK}} and {{RECEIVING_BLOCK}}), later we will convert it, and also we may postpone it. I have checked that there is no exception in the log of test after this patch. Erasure Coding: Fix the NullPointerException when deleting file --- Key: HDFS-8428 URL: https://issues.apache.org/jira/browse/HDFS-8428 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8428-HDFS-7285.001.patch
In HDFS, when removing a file, the NN also removes all of its blocks from {{BlocksMap}} and sends {{DNA_INVALIDATE}} (invalidate blocks) commands to the datanodes. After the datanodes successfully delete the block replicas, they report {{DELETED_BLOCK}} to the NameNode. The relevant logic in {{BlockManager#processIncrementalBlockReport}} is as follows:
{code}
case DELETED_BLOCK:
  removeStoredBlock(storageInfo, getStoredBlock(rdbi.getBlock()), node);
  ...
{code}
{code}
private void removeStoredBlock(DatanodeStorageInfo storageInfo, Block block,
    DatanodeDescriptor node) {
  if (shouldPostponeBlocksFromFuture && namesystem.isGenStampInFuture(block)) {
    queueReportedBlock(storageInfo, block, null, QUEUE_REASON_FUTURE_GENSTAMP);
    return;
  }
  removeStoredBlock(getStoredBlock(block), node);
}
{code}
In the EC branch we added {{getStoredBlock}}. A {{NullPointerException}} occurs when handling the {{DELETED_BLOCK}} case of an incremental block report from a DataNode after a file is deleted: since the block has already been removed, the lookup returns null and needs to be checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549913#comment-14549913 ] Leitao Guo commented on HDFS-7692: -- [~eddyxu], thanks for your comments, please have a check of the new patch. 1.In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException; 2.In TestDataStorage#testAddStorageDirectoreis, catch InterruptedException then let the test case fail; 3.The multithread in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is creating one thread pool for each namespace. Not change here. 4.Re-phrase the parameter successVolumes. [~szetszwo],thanks for your comments, please have a check of the new patch. 1. InterruptedException re-thrown as InterruptedIOException; 2. I think it's a good idea to log the upgrade progress for each dir, but so far, we can not get the progress easily from the current api. Do you think it's necessary to file a new jira to follow this? DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch {code:title=DataStorage#addStorageLocations(...)|borderStyle=solid} for (StorageLocation dataDir : dataDirs) { File root = dataDir.getFile(); ... ... bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt); addBlockPoolStorage(bpid, bpStorage); ... ... successVolumes.add(dataDir); } {code} In the above code the storage directories will be analyzed one by one, which is really time consuming when upgrading HDFS with datanodes have dozens of large volumes. 
MultiThread dataDirs analyzing should be supported here to speedup upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549910#comment-14549910 ] Leitao Guo commented on HDFS-7692: -- [~eddyxu], thanks for your comments, please have a check of the new patch. 1.In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException; 2.In TestDataStorage#testAddStorageDirectoreis, catch InterruptedException then let the test case fail; 3.The multithread in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is creating one thread pool for each namespace. Not change here. 4.Re-phrase the parameter successVolumes. [~szetszwo],thanks for your comments, please have a check of the new patch. 1. InterruptedException re-thrown as InterruptedIOException; 2. I think it's a good idea to log the upgrade progress for each dir, but so far, we can not get the progress easily from the current api. Do you think it's necessary to file a new jira to follow this? DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch {code:title=DataStorage#addStorageLocations(...)|borderStyle=solid} for (StorageLocation dataDir : dataDirs) { File root = dataDir.getFile(); ... ... bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt); addBlockPoolStorage(bpid, bpStorage); ... ... successVolumes.add(dataDir); } {code} In the above code the storage directories will be analyzed one by one, which is really time consuming when upgrading HDFS with datanodes have dozens of large volumes. 
MultiThread dataDirs analyzing should be supported here to speedup upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549914#comment-14549914 ] Leitao Guo commented on HDFS-7692: -- [~eddyxu], thanks for your comments, please have a check of the new patch. 1.In DataStorage#recoverTransitionRead, log the InterruptedException and rethrow it as InterruptedIOException; 2.In TestDataStorage#testAddStorageDirectoreis, catch InterruptedException then let the test case fail; 3.The multithread in DataStorage#addStorageLocations() is for one specific namespace, so in TestDataStorage#testAddStorageDirectoreis my intention is creating one thread pool for each namespace. Not change here. 4.Re-phrase the parameter successVolumes. [~szetszwo],thanks for your comments, please have a check of the new patch. 1. InterruptedException re-thrown as InterruptedIOException; 2. I think it's a good idea to log the upgrade progress for each dir, but so far, we can not get the progress easily from the current api. Do you think it's necessary to file a new jira to follow this? DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories. -- Key: HDFS-7692 URL: https://issues.apache.org/jira/browse/HDFS-7692 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.2 Reporter: Leitao Guo Assignee: Leitao Guo Labels: BB2015-05-TBR Attachments: HDFS-7692.01.patch, HDFS-7692.02.patch {code:title=DataStorage#addStorageLocations(...)|borderStyle=solid} for (StorageLocation dataDir : dataDirs) { File root = dataDir.getFile(); ... ... bpStorage.recoverTransitionRead(datanode, nsInfo, bpDataDirs, startOpt); addBlockPoolStorage(bpid, bpStorage); ... ... successVolumes.add(dataDir); } {code} In the above code the storage directories will be analyzed one by one, which is really time consuming when upgrading HDFS with datanodes have dozens of large volumes. 
Analyzing the dataDirs in multiple threads should be supported here to speed up the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
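The sequential loop above can be parallelized with a thread pool. A minimal sketch of the idea follows; the class, method bodies, and pool size are illustrative stand-ins, not the patch's actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpgradeSketch {
    // Analyze every storage directory in its own task instead of one by
    // one; collect only the directories whose analysis succeeds.
    static List<String> addStorageLocations(List<String> dataDirs)
            throws InterruptedException {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.min(dataDirs.size(), 8));
        List<Future<String>> futures = new ArrayList<>();
        for (String dir : dataDirs) {
            // The task body stands in for bpStorage.recoverTransitionRead(...)
            // and addBlockPoolStorage(...) from the snippet above.
            futures.add(pool.submit(() -> dir));
        }
        List<String> successVolumes = new ArrayList<>();
        for (Future<String> f : futures) {
            try {
                successVolumes.add(f.get());
            } catch (ExecutionException e) {
                // a failed volume is skipped, as in the sequential loop
            }
        }
        pool.shutdown();
        return successVolumes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addStorageLocations(
                Arrays.asList("/data1", "/data2", "/data3")).size());
    }
}
```

Waiting on each Future preserves the original behavior of returning only after all directories are processed, while the analysis itself runs concurrently.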
[jira] [Comment Edited] (HDFS-8428) Erasure Coding: Fix the NullPointerException when deleting file
[ https://issues.apache.org/jira/browse/HDFS-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549891#comment-14549891 ] Yi Liu edited comment on HDFS-8428 at 5/19/15 6:39 AM: --- We should not convert the striped block replica to its block group right away when handling {{DELETED_BLOCK}} (same as we do for {{RECEIVED_BLOCK}} and {{RECEIVING_BLOCK}}); we will convert it later, and we may also postpone it. I have checked that there is no exception in the test log after this patch. was (Author: hitliuyi): We should not convert the striped block replica to its block group right away when handling {{DELETED_BLOCK}}; we will convert it later, and we may also postpone it. I have checked that there is no exception in the test log after this patch. Erasure Coding: Fix the NullPointerException when deleting file --- Key: HDFS-8428 URL: https://issues.apache.org/jira/browse/HDFS-8428 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8428-HDFS-7285.001.patch In HDFS, when removing a file, the NN will also remove all its blocks from {{BlocksMap}} and send {{DNA_INVALIDATE}} (invalidate blocks) commands to the datanodes. After the datanodes successfully delete the block replicas, they will report {{DELETED_BLOCK}} to the NameNode. The relevant logic in {{BlockManager#processIncrementalBlockReport}} is as follows: {code} case DELETED_BLOCK: removeStoredBlock(storageInfo, getStoredBlock(rdbi.getBlock()), node); ... {code} {code} private void removeStoredBlock(DatanodeStorageInfo storageInfo, Block block, DatanodeDescriptor node) { if (shouldPostponeBlocksFromFuture && namesystem.isGenStampInFuture(block)) { queueReportedBlock(storageInfo, block, null, QUEUE_REASON_FUTURE_GENSTAMP); return; } removeStoredBlock(getStoredBlock(block), node); } {code} In the EC branch, we add {{getStoredBlock}}. 
There is a {{NullPointerException}} when handling {{DELETED_BLOCK}} in an incremental block report from a DataNode after deleting a file: since the block is already removed, we need to check for null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
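The shape of the fix can be sketched with a plain map standing in for {{BlocksMap}}; all names here are hypothetical simplifications, not the patch's code. The point is simply that the stored block is looked up first and a null result (block already deleted) is handled instead of dereferenced:

```java
import java.util.HashMap;
import java.util.Map;

public class DeletedBlockSketch {
    // BlocksMap modeled as a plain map from block id to block.
    static Map<Long, String> blocksMap = new HashMap<>();

    static String processDeletedBlock(long blockId) {
        String stored = blocksMap.get(blockId); // getStoredBlock(...)
        if (stored == null) {
            // Block was already removed when the file was deleted;
            // skip instead of throwing a NullPointerException.
            return "ignored";
        }
        blocksMap.remove(blockId);              // removeStoredBlock(...)
        return "removed";
    }

    public static void main(String[] args) {
        blocksMap.put(1L, "blk_1");
        System.out.println(processDeletedBlock(1L)); // first report
        System.out.println(processDeletedBlock(1L)); // late duplicate report
    }
}
```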
[jira] [Commented] (HDFS-6348) SecondaryNameNode not terminating properly on runtime exceptions
[ https://issues.apache.org/jira/browse/HDFS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549925#comment-14549925 ] Vinayakumar B commented on HDFS-6348: - Patch LGTM, +1. Failures are unrelated. Going to commit shortly. SecondaryNameNode not terminating properly on runtime exceptions Key: HDFS-6348 URL: https://issues.apache.org/jira/browse/HDFS-6348 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-RFC Attachments: HDFS-6348-003.patch, HDFS-6348-004.patch, HDFS-6348.patch, HDFS-6348.patch, secondaryNN_threaddump_after_exit.log The Secondary NameNode is not exiting when a RuntimeException occurs during startup. Say I set a wrong configuration; because of that, validation failed and threw a RuntimeException as shown below. But when I checked the environment, the SecondaryNameNode process was still alive. On analysis, the RMI thread was still alive; since it is not a daemon thread, the JVM is not exiting. I'm attaching a thread dump to this JIRA for more details about the thread. 
{code} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1900) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:199) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.init(BlockManager.java:256) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:635) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:260) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.init(SecondaryNameNode.java:205) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:695) Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1868) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1892) ... 6 more Caused by: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1774) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1866) ... 7 more 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state 2014-05-07 14:31:04,926 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
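The failure mode described above can be reproduced in miniature: a non-daemon background thread (like the RMI thread in the report) keeps the JVM alive after main() dies. This is a hedged sketch, not the actual SecondaryNameNode code; the names are illustrative, and the real patch forces termination in the catch path rather than marking threads daemon:

```java
public class TerminateOnFailureSketch {
    public static void main(String[] args) {
        // Stand-in for the RMI thread from the thread dump.
        Thread rmiLike = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
            }
        });
        // Without this line the JVM would never exit after main() returns,
        // which is exactly the hang reported in this issue.
        rmiLike.setDaemon(true);
        rmiLike.start();
        try {
            throw new RuntimeException(
                    "ClassNotFoundException: MyBlockPlacementPolicy");
        } catch (RuntimeException e) {
            // The fix's idea: handle startup failures explicitly so stray
            // non-daemon threads cannot keep the process alive forever.
            System.out.println("terminating on: " + e.getMessage());
        }
    }
}
```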
[jira] [Updated] (HDFS-6348) SecondaryNameNode not terminating properly on runtime exceptions
[ https://issues.apache.org/jira/browse/HDFS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6348: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks [~rakeshr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6348) SecondaryNameNode not terminating properly on runtime exceptions
[ https://issues.apache.org/jira/browse/HDFS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549942#comment-14549942 ] Hudson commented on HDFS-6348: -- FAILURE: Integrated in Hadoop-trunk-Commit #7858 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7858/]) HDFS-6348. SecondaryNameNode not terminating properly on runtime exceptions (Contributed by Rakesh R) (vinayakumarb: rev 93972a332a9fc6390447fc5fc9785c98fb4c3344) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestStartup.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6348) SecondaryNameNode not terminating properly on runtime exceptions
[ https://issues.apache.org/jira/browse/HDFS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549988#comment-14549988 ] Rakesh R commented on HDFS-6348: Thank you [~vinayrpet] for the reviews and committing the patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8366) Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549992#comment-14549992 ] Tsz Wo Nicholas Sze commented on HDFS-8366: --- Why do we need the blocking queue timeouts? In what case the timeouts are useful? I think we should remove all the blocking queue timeouts since we already have network timeout. Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream --- Key: HDFS-8366 URL: https://issues.apache.org/jira/browse/HDFS-8366 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8366-001.patch, HDFS-8366-HDFS-7285-02.patch The timeout of getting striped or ended block in {{DFSStripedOutputStream#Coodinator}} should be configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8378) Erasure Coding: Few improvements for the erasure coding worker
[ https://issues.apache.org/jira/browse/HDFS-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549994#comment-14549994 ] Rakesh R commented on HDFS-8378: Thank you [~walter.k.su] for committing the changes. Erasure Coding: Few improvements for the erasure coding worker -- Key: HDFS-8378 URL: https://issues.apache.org/jira/browse/HDFS-8378 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Priority: Minor Fix For: HDFS-7285 Attachments: HDFS-8378-HDFS-7285.00.patch # The following log is confusing; make it tidy. It is missing a {{break;}} statement, which causes these unwanted logs. {code} 2015-05-10 15:06:45,878 INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(728)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY 2015-05-10 15:06:45,879 WARN datanode.DataNode (BPOfferService.java:processCommandFromActive(732)) - Unknown DatanodeCommand action: 11 {code} # Add the exception trace to the log; it would improve debuggability. {code} } catch (Throwable e) { LOG.warn("Failed to recover striped block: " + blockGroup); } {code} # Make the member variables present in ErasureCodingWorker, ReconstructAndTransferBlock, and StripedReader {{private}} {{final}}. # Correct the spelling of the variable {{STRIPED_READ_TRHEAD_POOL}} to {{STRIPED_READ_THREAD_POOL}}. # It would be good to add debug logs printing the striped read pool size: {code} LOG.debug("Using striped reads; pool threads=" + num); {code} # Add a meaningful message to the precondition check: {code} Preconditions.checkArgument(liveIndices.length == sources.length); {code} # Remove the unused import: {code} import org.apache.hadoop.hdfs.server.common.HdfsServerConstants; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
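The missing-{{break;}} symptom in item 1 is worth seeing concretely: without a break (modeled here with early returns), the EC recovery case falls through to the default branch and logs "Unknown DatanodeCommand". The command codes and method below are hypothetical stand-ins for the switch in BPOfferService:

```java
public class CommandDispatchSketch {
    // Illustrative command codes; 11 mirrors the "Unknown DatanodeCommand
    // action: 11" line from the confusing log above.
    static final int DNA_TRANSFER = 1;
    static final int DNA_ERASURE_CODING_RECOVERY = 11;

    static String process(int action) {
        switch (action) {
        case DNA_TRANSFER:
            return "transfer";
        case DNA_ERASURE_CODING_RECOVERY:
            // Without a break (here, a return) after handling this case,
            // control would fall through and hit the default branch too.
            return "ec-recovery";
        default:
            return "unknown:" + action;
        }
    }

    public static void main(String[] args) {
        System.out.println(process(11));
    }
}
```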
[jira] [Commented] (HDFS-8366) Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550006#comment-14550006 ] Li Bo commented on HDFS-8366: - Hi, Nicholas. Different streamers may have different write speeds; if one streamer is too slow, the leading streamer should not wait forever for its ended block. The same applies to retrieving striped blocks. This sub-task just replaces the two constants (30 and 90) with configurable properties in {{DFSStripedOutputStream#Coordinator}}. Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream --- Key: HDFS-8366 URL: https://issues.apache.org/jira/browse/HDFS-8366 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8366-001.patch, HDFS-8366-HDFS-7285-02.patch The timeout for getting a striped or ended block in {{DFSStripedOutputStream#Coordinator}} should be configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
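The change being discussed can be sketched as replacing a hard-coded {{BlockingQueue#poll}} timeout with a configured value. The property name and default below are hypothetical illustrations, not the patch's actual configuration keys:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ConfigurableTimeoutSketch {
    // Poll for an ended/striped block with a timeout that comes from
    // configuration instead of a compile-time constant (30 or 90 seconds
    // in the current Coordinator).
    static String pollEndedBlock(BlockingQueue<String> queue, long timeoutSec)
            throws InterruptedException {
        return queue.poll(timeoutSec, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("blk_group_1");
        // Hypothetical property name; a real patch would read it from
        // an hdfs-site.xml Configuration instead of a system property.
        long timeout = Long.getLong("dfs.striped.queue.poll.timeout.sec", 30L);
        System.out.println(pollEndedBlock(queue, timeout));
    }
}
```

`poll` returning null on timeout (rather than blocking forever) is what lets the leading streamer give up on a slow peer.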
[jira] [Updated] (HDFS-8366) Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8366: Resolution: Fixed Status: Resolved (was: Patch Available) Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream --- Key: HDFS-8366 URL: https://issues.apache.org/jira/browse/HDFS-8366 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8366-001.patch, HDFS-8366-HDFS-7285-02.patch The timeout of getting striped or ended block in {{DFSStripedOutputStream#Coodinator}} should be configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550079#comment-14550079 ] Takanobu Asanuma commented on HDFS-7687: Thank you for your help, Jing! Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma Attachments: HDFS-7687.1.patch, HDFS-7687.2.patch, HDFS-7687.3.patch We need to change fsck so that it can detect under replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8375) Add cellSize as an XAttr to ECZone
[ https://issues.apache.org/jira/browse/HDFS-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550083#comment-14550083 ] Hadoop QA commented on HDFS-8375: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 3s | Pre-patch HDFS-7285 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 20 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 57s | The applied patch generated 2 new checkstyle issues (total was 67, now 67). | | {color:red}-1{color} | whitespace | 0m 14s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 3s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 20s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 101m 51s | Tests failed in hadoop-hdfs. | | {color:red}-1{color} | hdfs tests | 0m 13s | Tests failed in hadoop-hdfs-client. 
| | | | 146m 48s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-client | | | org.apache.hadoop.hdfs.protocol.LocatedStripedBlock.getBlockIndices() may expose internal representation by returning LocatedStripedBlock.blockIndices At LocatedStripedBlock.java:[line 57] | | FindBugs | module:hadoop-hdfs | | | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time. Unsynchronized access at DFSOutputStream.java:[line 146] | | | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:[line 194] | | | Unread field; should the field be static? At ErasureCodingWorker.java:[line 252] | | | Should org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$StripedReader be a _static_ inner class? At ErasureCodingWorker.java:[lines 913-915] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 107] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.getStartOffsetsForInternalBlocks(ECSchema, int, LocatedStripedBlock, long) At StripedBlockUtil.java:[line 403] | | Failed unit tests | hadoop.hdfs.TestFileAppend4 | | | hadoop.hdfs.TestRead | | | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd | | | hadoop.hdfs.server.datanode.TestRefreshNamenodes | | | hadoop.hdfs.TestHdfsAdmin | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN | | | hadoop.hdfs.TestClientReportBadBlock | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy | | | hadoop.hdfs.server.blockmanagement.TestDatanodeManager | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles | | | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement | | | hadoop.hdfs.server.namenode.TestNameNodeRpcServer | | | hadoop.hdfs.TestFileAppendRestart | | | hadoop.cli.TestErasureCodingCLI | | | hadoop.hdfs.TestWriteReadStripedFile | | |
[jira] [Commented] (HDFS-7692) DataStorage#addStorageLocations(...) should support MultiThread to speedup the upgrade of block pool at multi storage directories.
[ https://issues.apache.org/jira/browse/HDFS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550131#comment-14550131 ] Hadoop QA commented on HDFS-7692: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 26s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 2s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 164m 3s | Tests failed in hadoop-hdfs. 
| | | | 205m 9s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM | | | hadoop.hdfs.server.namenode.ha.TestHASafeMode | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733723/HDFS-7692.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11047/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11047/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11047/console | This message was automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8320) Erasure coding: consolidate striping-related terminologies
[ https://issues.apache.org/jira/browse/HDFS-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550033#comment-14550033 ] Hadoop QA commented on HDFS-8320: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 14m 46s | Pre-patch HDFS-7285 JavaDoc compilation may be broken. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 40s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 15s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 22s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 116m 40s | Tests failed in hadoop-hdfs. 
| | | | 158m 50s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time. Unsynchronized access at DFSOutputStream.java:[line 146] | | | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:[line 194] | | | Unread field; should the field be static? At ErasureCodingWorker.java:[line 252] | | | Should org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$StripedReader be a _static_ inner class? At ErasureCodingWorker.java:[lines 913-915] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes() At ErasureCodingZoneManager.java:[line 117] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[]) At ErasureCodingZoneManager.java:[line 81] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 107] | | Failed unit tests | hadoop.hdfs.TestFileAppend4 
| | | hadoop.hdfs.TestRead | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshot | | | hadoop.hdfs.server.namenode.TestSaveNamespace | | | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd | | | hadoop.hdfs.server.namenode.ha.TestHAStateTransitions | | | hadoop.hdfs.TestHdfsAdmin | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | | | hadoop.hdfs.TestClientReportBadBlock | | | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles | | | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement | | | hadoop.hdfs.TestFileAppendRestart | | | hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade |
[jira] [Created] (HDFS-8429) Death of watcherThread making other local read blocked
zhouyingchao created HDFS-8429: -- Summary: Death of watcherThread making other local read blocked Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In our cluster, an application hung while doing a short-circuit read of a local HDFS block. By looking into the log, we found that the DataNode's DomainSocketWatcher.watcherThread had exited with the following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} Line 463 is the following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove("getAndClearReadableFds", entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which mallocs an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. 
The bad thing is that other threads then block with a stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that users can know that something went wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
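The reported crash pattern can be sketched in plain Java. This is a hypothetical stand-in, not the actual HDFS code: a for-each loop over the result of a call that can return null (here modeling a failed malloc in the native getAndClearReadableFds) throws NullPointerException before the loop body ever runs, which would kill the watcher thread exactly as in the log above.

```java
public class NullFdSetSketch {
    // Stand-in for the native method; returns null to model a failed malloc.
    static int[] getAndClearReadableFds() {
        return null;
    }

    public static void main(String[] args) {
        // Unguarded iteration: the enhanced for loop dereferences the array
        // reference immediately, so a null return throws NPE.
        boolean npeSeen = false;
        try {
            for (int fd : getAndClearReadableFds()) {
                System.out.println("fd " + fd);
            }
        } catch (NullPointerException e) {
            npeSeen = true;
        }
        System.out.println("NPE on null fd array: " + npeSeen);

        // Defensive variant: check for null and treat it as a fatal
        // condition (terminate or alert), in the spirit of the reporter's
        // suggestion to exit the DN rather than limp along.
        int[] fds = getAndClearReadableFds();
        if (fds == null) {
            System.out.println("fd set unavailable; would terminate/alert here");
        } else {
            for (int fd : fds) {
                System.out.println("fd " + fd);
            }
        }
    }
}
```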
[jira] [Commented] (HDFS-8333) Create EC zone should not need superuser privilege
[ https://issues.apache.org/jira/browse/HDFS-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550147#comment-14550147 ] Kai Zheng commented on HDFS-8333: - It looks good to me. As I expressed in HDFS-8112, the superuser privilege might be too restrictive for operations on both EC zones and schemas. I think [~zhangyongxyz] raised a reasonable case here. [~szetszwo], should we consider it for now? Thanks. Create EC zone should not need superuser privilege -- Key: HDFS-8333 URL: https://issues.apache.org/jira/browse/HDFS-8333 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yong Zhang Assignee: Yong Zhang Attachments: HDFS-8333-HDFS-7285.000.patch Creating an EC zone should not need superuser privilege; for example, in a multi-tenant scenario, common users manage only their own directories and subdirectories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8366) Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550188#comment-14550188 ] Tsz Wo Nicholas Sze commented on HDFS-8366: --- ... if one streamer is too slow, the leading stream should not wait for its ended block forever ... Would it be forever? There are network timeouts already. Why are the network timeouts not enough? Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream --- Key: HDFS-8366 URL: https://issues.apache.org/jira/browse/HDFS-8366 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8366-001.patch, HDFS-8366-HDFS-7285-02.patch The timeout of getting striped or ended blocks in {{DFSStripedOutputStream#Coordinator}} should be configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8366) Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550192#comment-14550192 ] Tsz Wo Nicholas Sze commented on HDFS-8366: --- ... This subtask just replaces the two constants (30 and 90) with configurable properties in DFSStripedOutputStream#Coordinator. I am not talking about this subtask. I am asking why we need these timeouts in the first place. Erasure Coding: Make the timeout parameter of polling blocking queue configurable in DFSStripedOutputStream --- Key: HDFS-8366 URL: https://issues.apache.org/jira/browse/HDFS-8366 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8366-001.patch, HDFS-8366-HDFS-7285-02.patch The timeout of getting striped or ended blocks in {{DFSStripedOutputStream#Coordinator}} should be configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
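The pattern under discussion, polling a blocking queue with a timeout read from configuration instead of a hard-coded constant, can be sketched as below. The property name is made up for illustration; it is not the key proposed in the patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ConfigurablePollSketch {
    public static void main(String[] args) throws InterruptedException {
        // Hard-coded style (what the subtask replaces):
        //   queue.poll(30, TimeUnit.SECONDS);
        // Configurable style: read the timeout from configuration, falling
        // back to the old constant as the default. "example.striped.queue
        // .poll.timeout.sec" is a hypothetical property name.
        long timeoutSec = Long.parseLong(
            System.getProperty("example.striped.queue.poll.timeout.sec", "30"));

        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.offer("ended-block-42");

        // poll returns the head immediately when one is available...
        String item = queue.poll(timeoutSec, TimeUnit.SECONDS);
        if (!"ended-block-42".equals(item)) throw new AssertionError(item);

        // ...and returns null once the timeout expires on an empty queue,
        // which is the bounded-wait behavior the Coordinator relies on.
        String none = queue.poll(1, TimeUnit.MILLISECONDS);
        if (none != null) throw new AssertionError("expected null");
        System.out.println("poll timeout sketch OK");
    }
}
```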
[jira] [Commented] (HDFS-8428) Erasure Coding: Fix the NullPointerException when deleting file
[ https://issues.apache.org/jira/browse/HDFS-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550203#comment-14550203 ] Hadoop QA commented on HDFS-8428: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 16s | Pre-patch HDFS-7285 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 14s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 41s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 22s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 22s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 175m 18s | Tests failed in hadoop-hdfs. 
| | | | 218m 16s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time. Unsynchronized access at DFSOutputStream.java:[line 146] | | | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long). Dereferenced at BlockInfoStripedUnderConstruction.java:[line 194] | | | Unread field; should the field be static? At ErasureCodingWorker.java:[line 252] | | | Should org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$StripedReader be a _static_ inner class? At ErasureCodingWorker.java:[lines 913-915] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes() At ErasureCodingZoneManager.java:[line 117] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[]) At ErasureCodingZoneManager.java:[line 81] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 107] | | Failed unit tests | 
hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.blockmanagement.TestBlockInfo | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733735/HDFS-8428-HDFS-7285.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 3cf3398 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11049/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11049/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11049/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11049/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11049/console | This message was automatically generated.
[jira] [Commented] (HDFS-4185) Add a metric for number of active leases
[ https://issues.apache.org/jira/browse/HDFS-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550279#comment-14550279 ] Hudson commented on HDFS-4185: -- FAILURE: Integrated in Hadoop-Yarn-trunk #932 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/932/]) HDFS-4185. Add a metric for number of active leases (Rakesh R via raviprak) (raviprak: rev cdfae446ad285db979a79bf55665363fd943702c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Add a metric for number of active leases Key: HDFS-4185 URL: https://issues.apache.org/jira/browse/HDFS-4185 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 0.23.4, 2.0.2-alpha Reporter: Kihwal Lee Assignee: Rakesh R Fix For: 2.8.0 Attachments: HDFS-4185-001.patch, HDFS-4185-002.patch, HDFS-4185-003.patch, HDFS-4185-004.patch, HDFS-4185-005.patch, HDFS-4185-006.patch, HDFS-4185-007.patch, HDFS-4185-008.patch, HDFS-4185-009.patch We have seen cases of systematic open file leaks, which could have been detected if we had a metric that showed the number of active leases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6348) SecondaryNameNode not terminating properly on runtime exceptions
[ https://issues.apache.org/jira/browse/HDFS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550278#comment-14550278 ] Hudson commented on HDFS-6348: -- FAILURE: Integrated in Hadoop-Yarn-trunk #932 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/932/]) HDFS-6348. SecondaryNameNode not terminating properly on runtime exceptions (Contributed by Rakesh R) (vinayakumarb: rev 93972a332a9fc6390447fc5fc9785c98fb4c3344) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestStartup.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java SecondaryNameNode not terminating properly on runtime exceptions Key: HDFS-6348 URL: https://issues.apache.org/jira/browse/HDFS-6348 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Rakesh R Assignee: Rakesh R Fix For: 2.8.0 Attachments: HDFS-6348-003.patch, HDFS-6348-004.patch, HDFS-6348.patch, HDFS-6348.patch, secondaryNN_threaddump_after_exit.log The SecondaryNameNode is not exiting when a RuntimeException occurs during startup. Say I set a wrong configuration; because of that, validation failed and threw a RuntimeException as shown below. But when I checked the environment, the SecondaryNameNode process was still alive. On analysis, the RMI thread was found still alive; since it is not a daemon thread, the JVM is not exiting. I'm attaching a threaddump to this JIRA for more details about the thread. 
{code} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1900) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:199) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.init(BlockManager.java:256) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:635) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:260) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.init(SecondaryNameNode.java:205) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:695) Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1868) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1892) ... 6 more Caused by: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1774) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1866) ... 7 more 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state 2014-05-07 14:31:04,926 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
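The mechanism behind this bug can be shown in a few lines of plain Java, independent of Hadoop: any live non-daemon thread keeps the JVM alive after main() returns, which is why the SecondaryNameNode process stayed up after the startup RuntimeException. The fix direction is either to make such threads daemons or to call System.exit (Hadoop uses ExitUtil.terminate for this) on a fatal startup error.

```java
public class DaemonThreadSketch {
    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            try {
                // Simulates a lingering background thread (like the RMI
                // thread in the report) that would outlive main().
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
            }
        });
        // With setDaemon(false) -- the default -- the JVM would wait up to
        // 60 s for this thread after main() returns. Marking it as a daemon
        // means it no longer blocks JVM shutdown.
        worker.setDaemon(true);
        worker.start();
        System.out.println("daemon=" + worker.isDaemon() + "; main exiting");
        // The JVM exits immediately here because only daemon threads remain.
    }
}
```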
[jira] [Commented] (HDFS-8345) Storage policy APIs must be exposed via the FileSystem interface
[ https://issues.apache.org/jira/browse/HDFS-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550274#comment-14550274 ] Hudson commented on HDFS-8345: -- FAILURE: Integrated in Hadoop-Yarn-trunk #932 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/932/]) HDFS-8345. Storage policy APIs must be exposed via the FileSystem interface. (Arpit Agarwal) (arp: rev a2190bf15d25e01fb4b220ba6401ce2f787a5c61) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/AbstractFileSystem.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BlockStoragePolicySpi.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/BlockStoragePolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/mover/Mover.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFs.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ChRootedFs.java Storage policy APIs must be exposed via the FileSystem interface Key: 
HDFS-8345 URL: https://issues.apache.org/jira/browse/HDFS-8345 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-8345.01.patch, HDFS-8345.02.patch, HDFS-8345.03.patch, HDFS-8345.04.patch, HDFS-8345.05.patch, HDFS-8345.06.patch, HDFS-8345.07.patch The storage policy APIs are not exposed via FileSystem. Since DistributedFileSystem is tagged as LimitedPrivate we should expose the APIs through FileSystem for use by other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8412) Fix the test failures in HTTPFS: In some tests setReplication called after fs close.
[ https://issues.apache.org/jira/browse/HDFS-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550277#comment-14550277 ] Hudson commented on HDFS-8412: -- FAILURE: Integrated in Hadoop-Yarn-trunk #932 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/932/]) HDFS-8412. Fix the test failures in HTTPFS: In some tests setReplication called after fs close. Contributed by Uma Maheswara Rao G. (umamahesh: rev a6af0248e9ec75e8e46ac96593070e0c9841a660) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java Fix the test failures in HTTPFS: In some tests setReplication called after fs close. Key: HDFS-8412 URL: https://issues.apache.org/jira/browse/HDFS-8412 Project: Hadoop HDFS Issue Type: Bug Components: fs Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 3.0.0 Attachments: HDFS-8412-0.patch Currently 2 HTTPFS test cases are failing due to the filesystem-open check in fs operations. This is the JIRA to fix these failures. The failure is that the test case closes the fs first and then performs an operation. Ideally such a test could pass earlier, as the dfsClient was just contacting the NN directly. But that closed client will not be useful for any other ops like read/write. So the usage should be corrected here, IMO. {code} fs.close(); fs.setReplication(path, (short) 2); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
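The behavior the fix relies on can be sketched generically: once a filesystem handle is closed, every subsequent operation should fail fast rather than silently bypass the closed state. FakeClient below is a hypothetical stand-in that only models the "check open before every operation" pattern; it is not the HTTPFS or DFS code.

```java
public class ClosedClientSketch {
    static class FakeClient implements AutoCloseable {
        private boolean open = true;

        void setReplication(String path, short replication) {
            // The guard every operation runs first: mirrors the
            // "filesystem closed" check mentioned in the issue.
            if (!open) throw new IllegalStateException("Filesystem closed");
            // ... a real client would contact the NameNode here ...
        }

        @Override
        public void close() {
            open = false;
        }
    }

    public static void main(String[] args) {
        FakeClient fs = new FakeClient();
        fs.setReplication("/tmp/foo", (short) 2); // fine while open
        fs.close();
        try {
            // The buggy test ordering: operation after close.
            fs.setReplication("/tmp/foo", (short) 2);
            throw new AssertionError("operation after close should fail");
        } catch (IllegalStateException expected) {
            System.out.println("closed check OK: " + expected.getMessage());
        }
    }
}
```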
[jira] [Commented] (HDFS-8405) Fix a typo in NamenodeFsck
[ https://issues.apache.org/jira/browse/HDFS-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550276#comment-14550276 ] Hudson commented on HDFS-8405: -- FAILURE: Integrated in Hadoop-Yarn-trunk #932 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/932/]) HDFS-8405. Fix a typo in NamenodeFsck. Contributed by Takanobu Asanuma (szetszwo: rev 0c590e1c097462979f7ee054ad9121345d58655b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FsckServlet.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java Fix a typo in NamenodeFsck -- Key: HDFS-8405 URL: https://issues.apache.org/jira/browse/HDFS-8405 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8405.1.patch DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY below should not be quoted. {code} res.append("\n ").append("DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY:\t") .append(minReplication); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
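The bug class in a nutshell: putting a constant's *name* inside a string literal prints the name, not its value. A small sketch with an illustrative local constant (the real code references DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY from Hadoop):

```java
public class QuotedConstantSketch {
    // Illustrative constant declared locally for the example.
    static final String DFS_NAMENODE_REPLICATION_MIN_KEY =
        "dfs.namenode.replication.min";

    public static void main(String[] args) {
        int minReplication = 1;

        // Buggy: the identifier is part of the literal, so the report shows
        // the constant's name instead of the configuration key.
        String bad = new StringBuilder()
            .append("\n DFS_NAMENODE_REPLICATION_MIN_KEY:\t")
            .append(minReplication).toString();

        // Fixed: concatenate the constant's value.
        String good = new StringBuilder()
            .append("\n ").append(DFS_NAMENODE_REPLICATION_MIN_KEY).append(":\t")
            .append(minReplication).toString();

        System.out.println("bad=" + bad.trim());
        System.out.println("good=" + good.trim());
    }
}
```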
[jira] [Commented] (HDFS-8429) Death of watcherThread making other local read blocked
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550293#comment-14550293 ] zhouyingchao commented on HDFS-8429: [~cmccabe] Should we stop the DN in this condition? Death of watcherThread making other local read blocked -- Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In our cluster, an application hung while doing a short-circuit read of a local HDFS block. By looking into the log, we found that the DataNode's DomainSocketWatcher.watcherThread had exited with the following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} Line 463 is the following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove("getAndClearReadableFds", entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which mallocs an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. 
The bad thing is that other threads then block with a stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that users can know that something went wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8405) Fix a typo in NamenodeFsck
[ https://issues.apache.org/jira/browse/HDFS-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550306#comment-14550306 ] Hudson commented on HDFS-8405: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/201/]) HDFS-8405. Fix a typo in NamenodeFsck. Contributed by Takanobu Asanuma (szetszwo: rev 0c590e1c097462979f7ee054ad9121345d58655b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FsckServlet.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Fix a typo in NamenodeFsck -- Key: HDFS-8405 URL: https://issues.apache.org/jira/browse/HDFS-8405 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8405.1.patch DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY below should not be quoted. {code} res.append("\n ").append("DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY:\t") .append(minReplication); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6348) SecondaryNameNode not terminating properly on runtime exceptions
[ https://issues.apache.org/jira/browse/HDFS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550308#comment-14550308 ] Hudson commented on HDFS-6348: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/201/]) HDFS-6348. SecondaryNameNode not terminating properly on runtime exceptions (Contributed by Rakesh R) (vinayakumarb: rev 93972a332a9fc6390447fc5fc9785c98fb4c3344) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestStartup.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt SecondaryNameNode not terminating properly on runtime exceptions Key: HDFS-6348 URL: https://issues.apache.org/jira/browse/HDFS-6348 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Rakesh R Assignee: Rakesh R Fix For: 2.8.0 Attachments: HDFS-6348-003.patch, HDFS-6348-004.patch, HDFS-6348.patch, HDFS-6348.patch, secondaryNN_threaddump_after_exit.log The SecondaryNameNode is not exiting when a RuntimeException occurs during startup. Say I set a wrong configuration; because of that, validation failed and threw a RuntimeException as shown below. But when I checked the environment, the SecondaryNameNode process was still alive. On analysis, the RMI thread was found still alive; since it is not a daemon thread, the JVM is not exiting. I'm attaching a threaddump to this JIRA for more details about the thread. 
{code} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1900) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:199) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.init(BlockManager.java:256) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:635) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:260) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.init(SecondaryNameNode.java:205) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:695) Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1868) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1892) ... 6 more Caused by: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1774) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1866) ... 7 more 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state 2014-05-07 14:31:04,926 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4185) Add a metric for number of active leases
[ https://issues.apache.org/jira/browse/HDFS-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550309#comment-14550309 ] Hudson commented on HDFS-4185: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/201/]) HDFS-4185. Add a metric for number of active leases (Rakesh R via raviprak) (raviprak: rev cdfae446ad285db979a79bf55665363fd943702c) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java Add a metric for number of active leases Key: HDFS-4185 URL: https://issues.apache.org/jira/browse/HDFS-4185 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 0.23.4, 2.0.2-alpha Reporter: Kihwal Lee Assignee: Rakesh R Fix For: 2.8.0 Attachments: HDFS-4185-001.patch, HDFS-4185-002.patch, HDFS-4185-003.patch, HDFS-4185-004.patch, HDFS-4185-005.patch, HDFS-4185-006.patch, HDFS-4185-007.patch, HDFS-4185-008.patch, HDFS-4185-009.patch We have seen cases of systematic open file leaks, which could have been detected if we had a metric that showed the number of active leases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8345) Storage policy APIs must be exposed via the FileSystem interface
[ https://issues.apache.org/jira/browse/HDFS-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550305#comment-14550305 ] Hudson commented on HDFS-8345: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/201/]) HDFS-8345. Storage policy APIs must be exposed via the FileSystem interface. (Arpit Agarwal) (arp: rev a2190bf15d25e01fb4b220ba6401ce2f787a5c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/mover/Mover.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFs.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/AbstractFileSystem.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BlockStoragePolicySpi.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/BlockStoragePolicy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ChRootedFs.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java Storage policy APIs must be exposed via the FileSystem interface 
Key: HDFS-8345 URL: https://issues.apache.org/jira/browse/HDFS-8345 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-8345.01.patch, HDFS-8345.02.patch, HDFS-8345.03.patch, HDFS-8345.04.patch, HDFS-8345.05.patch, HDFS-8345.06.patch, HDFS-8345.07.patch The storage policy APIs are not exposed via FileSystem. Since DistributedFileSystem is tagged as LimitedPrivate we should expose the APIs through FileSystem for use by other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8412) Fix the test failures in HTTPFS: In some tests setReplication called after fs close.
[ https://issues.apache.org/jira/browse/HDFS-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550307#comment-14550307 ] Hudson commented on HDFS-8412: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/201/]) HDFS-8412. Fix the test failures in HTTPFS: In some tests setReplication called after fs close. Contributed by Uma Maheswara Rao G. (umamahesh: rev a6af0248e9ec75e8e46ac96593070e0c9841a660) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java Fix the test failures in HTTPFS: In some tests setReplication called after fs close. Key: HDFS-8412 URL: https://issues.apache.org/jira/browse/HDFS-8412 Project: Hadoop HDFS Issue Type: Bug Components: fs Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 3.0.0 Attachments: HDFS-8412-0.patch Currently 2 HttpFS test cases are failing due to the filesystem open check in fs operations; this is the JIRA to fix those failures. The failure happens because the test closes the fs first and then performs an operation on it. Such a test could pass earlier because the dfsClient was just contacting the NN directly, but that closed client will not be useful for any other ops like read/write. So the usage should be corrected here IMO. {code} fs.close(); fs.setReplication(path, (short) 2); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
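The failure mode above can be reproduced with any handle that guards its operations with a closed check. A minimal standalone sketch (`ToyFileSystem` is a hypothetical stand-in, not Hadoop's `FileSystem` API):

```java
// Minimal standalone sketch of the failure mode: once a handle is
// closed, subsequent operations must fail fast. ToyFileSystem is a
// hypothetical stand-in, not Hadoop's FileSystem API.
public class ToyFileSystem {
    private boolean closed = false;

    public void close() {
        closed = true;
    }

    public boolean setReplication(String path, short replication) {
        if (closed) {
            throw new IllegalStateException("Filesystem closed");
        }
        return true; // pretend the namenode accepted the change
    }

    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem();
        // Correct order: operate first, close last.
        fs.setReplication("/tmp/foo", (short) 2);
        fs.close();
        // Reversed order (as in the failing tests) now throws.
        try {
            fs.setReplication("/tmp/foo", (short) 2);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The fix in the patch is simply to reorder the test so that `setReplication` runs before `close`, matching the first half of `main` above.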
[jira] [Updated] (HDFS-8439) Adding more slow action log in critical read path
[ https://issues.apache.org/jira/browse/HDFS-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated HDFS-8439: -- Attachment: HDFS-8439.001.patch Patch for trunk Adding more slow action log in critical read path - Key: HDFS-8439 URL: https://issues.apache.org/jira/browse/HDFS-8439 Project: Hadoop HDFS Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HDFS-8439.001.patch To debug an HBase read latency spike, we should add more slow pread/seek logging in the read path to identify the abnormal datanodes. A patch will be uploaded soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
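The logging the patch proposes follows the usual threshold pattern: time the I/O action and emit a warning only when it exceeds a limit. A standalone sketch of that pattern (the threshold name and default value are illustrative assumptions, not the actual HDFS configuration key):

```java
// Standalone sketch of threshold-based slow-operation logging, the
// pattern proposed for the read path. The threshold constant and its
// value are illustrative assumptions, not the real HDFS config key.
public class SlowActionLog {
    static final long SLOW_IO_WARN_THRESHOLD_MS = 300; // assumed default

    // Returns the warning line to log, or null if the action was fast enough.
    static String checkSlow(String action, String datanode, long elapsedMs) {
        if (elapsedMs > SLOW_IO_WARN_THRESHOLD_MS) {
            return "WARN Slow " + action + " from datanode " + datanode
                + " took " + elapsedMs + " ms";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkSlow("pread", "dn1:50010", 750)); // logged
        System.out.println(checkSlow("seek", "dn2:50010", 12));   // null, not logged
    }
}
```

Emitting the datanode address in the message is the point of the issue: it lets an operator correlate the slow reads with a specific abnormal node.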
[jira] [Updated] (HDFS-8441) Erasure Coding: make condition check earlier for setReplication
[ https://issues.apache.org/jira/browse/HDFS-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8441: Attachment: HDFS-8441-HDFS-7285.001.patch Erasure Coding: make condition check earlier for setReplication --- Key: HDFS-8441 URL: https://issues.apache.org/jira/browse/HDFS-8441 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Changes: 1. {{UnsupportedActionException}} is more user-friendly. 2. Check the condition before updating the quota count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8441) Erasure Coding: make condition check earlier for setReplication
[ https://issues.apache.org/jira/browse/HDFS-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8441: Attachment: (was: HDFS-8441-HDFS-7285.001.patch) Erasure Coding: make condition check earlier for setReplication --- Key: HDFS-8441 URL: https://issues.apache.org/jira/browse/HDFS-8441 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Changes: 1. {{UnsupportedActionException}} is more user-friendly. 2. Check the condition before updating the quota count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8440) Switch off checkstyle file length warnings
Arpit Agarwal created HDFS-8440: --- Summary: Switch off checkstyle file length warnings Key: HDFS-8440 URL: https://issues.apache.org/jira/browse/HDFS-8440 Project: Hadoop HDFS Issue Type: Bug Components: build Reporter: Arpit Agarwal We have many large files over 2000 lines. checkstyle warns every time there is a change to one of these files. Let's switch off this check or increase the limit to reduce the number of non-actionable -1's from Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
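Either remedy described in HDFS-8440 is a small change to the project's Checkstyle configuration, sketched below under the assumption that the file-length warning comes from Checkstyle's standard `FileLength` check (whose default `max` is 2000 lines); the exact file and limit chosen are up to the project:

```xml
<!-- Option 1: raise the limit above the default of 2000 lines. -->
<module name="FileLength">
  <property name="max" value="3000"/>
</module>

<!-- Option 2: keep the check in the config but silence it. -->
<module name="FileLength">
  <property name="severity" value="ignore"/>
</module>
```

`FileLength` is a `Checker`-level module, so it sits alongside (not inside) the `TreeWalker` module in the configuration file.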
[jira] [Created] (HDFS-8436) Changing the replication factor for a directory should apply to new files too
Mala Chikka Kempanna created HDFS-8436: -- Summary: Changing the replication factor for a directory should apply to new files too Key: HDFS-8436 URL: https://issues.apache.org/jira/browse/HDFS-8436 Project: Hadoop HDFS Issue Type: Improvement Reporter: Mala Chikka Kempanna Changing the replication factor for a directory will only affect the existing files and the new files under the directory will get created with the default replication factor (dfs.replication from hdfs-site.xml) of the cluster. I would expect new files written under a directory to have the same replication factor set for the directory itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8436) Changing the replication factor for a directory should apply to new files under the directory too
[ https://issues.apache.org/jira/browse/HDFS-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mala Chikka Kempanna updated HDFS-8436: --- Summary: Changing the replication factor for a directory should apply to new files under the directory too (was: Changing the replication factor for a directory should apply to new files too) Changing the replication factor for a directory should apply to new files under the directory too - Key: HDFS-8436 URL: https://issues.apache.org/jira/browse/HDFS-8436 Project: Hadoop HDFS Issue Type: Improvement Reporter: Mala Chikka Kempanna Changing the replication factor for a directory will only affect the existing files and the new files under the directory will get created with the default replication factor (dfs.replication from hdfs-site.xml) of the cluster. I would expect new files written under a directory to have the same replication factor set for the directory itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8186) Erasure coding: Make block placement policy for EC file configurable
[ https://issues.apache.org/jira/browse/HDFS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551617#comment-14551617 ] Walter Su commented on HDFS-8186: - Thanks Zhe Zhang. The Jenkins report didn't come out; I am trying to re-trigger it. Erasure coding: Make block placement policy for EC file configurable Key: HDFS-8186 URL: https://issues.apache.org/jira/browse/HDFS-8186 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8186-HDFS-7285.003.patch This includes: 1. Users can configure the block placement policy for EC files in the XML configuration file. 2. The EC policy works for EC files and the replication policy works for non-EC files; they coexist. Not included: 1. Details of the block placement policy for EC. Discussion and implementation go to HDFS-7613. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8441) Erasure Coding: make condition check earlier for setReplication
Walter Su created HDFS-8441: --- Summary: Erasure Coding: make condition check earlier for setReplication Key: HDFS-8441 URL: https://issues.apache.org/jira/browse/HDFS-8441 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Changes: 1. {{UnsupportedActionException}} is more user-friendly. 2. Check the condition before updating the quota count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8441) Erasure Coding: make condition check earlier for setReplication
[ https://issues.apache.org/jira/browse/HDFS-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8441: Attachment: HDFS-8441-HDFS-7285.001.patch Erasure Coding: make condition check earlier for setReplication --- Key: HDFS-8441 URL: https://issues.apache.org/jira/browse/HDFS-8441 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8441-HDFS-7285.001.patch Changes: 1. {{UnsupportedActionException}} is more user-friendly. 2. Check the condition before updating the quota count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8441) Erasure Coding: make condition check earlier for setReplication
[ https://issues.apache.org/jira/browse/HDFS-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8441: Status: Patch Available (was: Open) Erasure Coding: make condition check earlier for setReplication --- Key: HDFS-8441 URL: https://issues.apache.org/jira/browse/HDFS-8441 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8441-HDFS-7285.001.patch Changes: 1. {{UnsupportedActionException}} is more user-friendly. 2. Check the condition before updating the quota count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7984) webhdfs:// needs to support provided delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551787#comment-14551787 ] Allen Wittenauer edited comment on HDFS-7984 at 5/20/15 4:24 AM: - Yup, [~erwaman]. That's exactly what this issue is about... Keep in mind that one might need more than one token... :) was (Author: aw): Yup, [~erwaman]. That's exactly what this issue is about... Keep in mind that it one might need more than one token... :) webhdfs:// needs to support provided delegation tokens -- Key: HDFS-7984 URL: https://issues.apache.org/jira/browse/HDFS-7984 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Blocker When using the webhdfs:// filesystem (especially from distcp), we need the ability to inject a delegation token rather than webhdfs initialize its own. This would allow for cross-authentication-zone file system accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551787#comment-14551787 ] Allen Wittenauer commented on HDFS-7984: Yup, [~erwaman]. That's exactly what this issue is about... Keep in mind that one might need more than one token... :) webhdfs:// needs to support provided delegation tokens -- Key: HDFS-7984 URL: https://issues.apache.org/jira/browse/HDFS-7984 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Blocker When using the webhdfs:// filesystem (especially from distcp), we need the ability to inject a delegation token rather than have webhdfs initialize its own. This would allow for cross-authentication-zone file system accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8210) Ozone: Implement storage container manager
[ https://issues.apache.org/jira/browse/HDFS-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551861#comment-14551861 ] Hadoop QA commented on HDFS-8210: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch HDFS-7240 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 20s | The applied patch generated 11 new checkstyle issues (total was 305, now 312). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 5s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 167m 59s | Tests failed in hadoop-hdfs. 
| | | | 211m 3s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.tools.TestHdfsConfigFields | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734040/HDFS-8210-HDFS-7240.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7240 / 15ccd96 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11055/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11055/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11055/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11055/console | This message was automatically generated. Ozone: Implement storage container manager --- Key: HDFS-8210 URL: https://issues.apache.org/jira/browse/HDFS-8210 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HDFS-8210-HDFS-7240.1.patch, HDFS-8210-HDFS-7240.2.patch, HDFS-8210-HDFS-7240.3.patch The storage container manager collects datanode heartbeats, manages replication and exposes API to lookup containers. This jira implements storage container manager by re-using the block manager implementation in namenode. This jira provides initial implementation that works with datanodes. The additional protocols will be added in subsequent jiras. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8382) Remove chunkSize parameter from initialize method of raw erasure coder
[ https://issues.apache.org/jira/browse/HDFS-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551439#comment-14551439 ] Kai Zheng commented on HDFS-8382: - Thanks for the comment, Nicholas. While having the initialize method makes some things easy, I agree it's better to remove it. Previously I thought there was some heavy work to be done that was not appropriate in a constructor, but now that I have finished the native coders, I agree the initialization work can also be done well in the constructor. I will remove the initialize method as well in the following updated patch. Remove chunkSize parameter from initialize method of raw erasure coder -- Key: HDFS-8382 URL: https://issues.apache.org/jira/browse/HDFS-8382 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-8382-HDFS-7285-v1.patch, HDFS-8382-HDFS-7285-v2.patch Per discussion in HDFS-8347, we need to support encoding/decoding variable-width units of data instead of a predefined fixed width like {{chunkSize}}. Filing this issue to remove chunkSize from the general raw erasure coder API. A specific coder can support a fixed chunkSize in a hard-coded way or via schema customization if necessary, like the HitchHiker coder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8438) Erasure Coding: support concat files in same EC zone
Walter Su created HDFS-8438: --- Summary: Erasure Coding: support concat files in same EC zone Key: HDFS-8438 URL: https://issues.apache.org/jira/browse/HDFS-8438 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8294) Erasure Coding: Fix Findbug warnings present in erasure coding
[ https://issues.apache.org/jira/browse/HDFS-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551770#comment-14551770 ] Rakesh R commented on HDFS-8294: Now we have a way to trigger Jenkins on our EC branch. In that case, I think we could clean up all the Findbugs warnings reported as of now. Later, for each JIRA commit, we could trigger Jenkins and observe a +1 Findbugs QA report. IMHO it is not good to build new logic on top of code which has Findbugs warnings. Also, the patch rebasing effort will be less now. Does this make sense to you? Erasure Coding: Fix Findbug warnings present in erasure coding -- Key: HDFS-8294 URL: https://issues.apache.org/jira/browse/HDFS-8294 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-RFC Attachments: FindBugs Report in EC feature.html, HDFS-8294-HDFS-7285.00.patch, HDFS-8294-HDFS-7285.01.patch, HDFS-8294-HDFS-7285.02.patch, HDFS-8294-HDFS-7285.03.patch, HDFS-8294-HDFS-7285.04.patch, HDFS-8294-HDFS-7285.05.patch This jira is to address the findbug issues reported in the erasure coding feature. The attached sheet contains the details of the findbug issues reported in the erasure coding feature. I've taken this report from the jenkins build: https://builds.apache.org/job/PreCommit-HDFS-Build/10848/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8186) Erasure coding: Make block placement policy for EC file configurable
[ https://issues.apache.org/jira/browse/HDFS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551774#comment-14551774 ] Hadoop QA commented on HDFS-8186: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch HDFS-7285 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 11s | The patch appears to introduce 6 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 116m 58s | Tests failed in hadoop-hdfs. 
| | | | 158m 28s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time Unsynchronized access at DFSOutputStream.java:89% of time Unsynchronized access at DFSOutputStream.java:[line 146] | | | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:[line 194] | | | Unread field:field be static? At ErasureCodingWorker.java:[line 254] | | | Should org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$StripedReader be a _static_ inner class? At ErasureCodingWorker.java:inner class? At ErasureCodingWorker.java:[lines 907-914] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 107] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.getStartOffsetsForInternalBlocks(ECSchema, int, LocatedStripedBlock, long) At StripedBlockUtil.java:to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.getStartOffsetsForInternalBlocks(ECSchema, int, LocatedStripedBlock, long) At StripedBlockUtil.java:[line 403] | | Failed unit tests | hadoop.hdfs.server.blockmanagement.TestBlockInfo | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.namenode.TestAuditLogs | | Timed out tests | org.apache.hadoop.hdfs.server.mover.TestMover | \\ \\ || Subsystem || Report/Notes || | Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734028/HDFS-8186-HDFS-7285.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / bf3c28a | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11054/artifact/patchprocess/patchReleaseAuditProblems.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11054/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11054/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11054/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11054/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11054/console | This message was automatically generated. Erasure coding: Make block placement policy for EC file configurable
[jira] [Commented] (HDFS-8432) Introduce a minimum compatible layout version to allow downgrade in more rolling upgrade use cases.
[ https://issues.apache.org/jira/browse/HDFS-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551443#comment-14551443 ] Chris Nauroth commented on HDFS-8432: - The test failures look unrelated and do not repro locally. They look like the problem that was just reported in HDFS-8434. Introduce a minimum compatible layout version to allow downgrade in more rolling upgrade use cases. --- Key: HDFS-8432 URL: https://issues.apache.org/jira/browse/HDFS-8432 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, rolling upgrades Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-8432-HDFS-Downgrade-Extended-Support.pdf, HDFS-8432.001.patch Maintain the prior layout version during the upgrade window and reject attempts to use new features until after the upgrade has been finalized. This guarantees that the prior software version can read the fsimage and edit logs if the administrator decides to downgrade. This will make downgrade usable for the majority of NameNode layout version changes, which just involve introduction of new edit log operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (HDFS-8435) createNonRecursive support needed in WebHdfsFileSystem to support HBase
[ https://issues.apache.org/jira/browse/HDFS-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth moved HADOOP-12003 to HDFS-8435: -- Component/s: (was: fs) webhdfs Issue Type: Improvement (was: Bug) Key: HDFS-8435 (was: HADOOP-12003) Project: Hadoop HDFS (was: Hadoop Common) createNonRecursive support needed in WebHdfsFileSystem to support HBase --- Key: HDFS-8435 URL: https://issues.apache.org/jira/browse/HDFS-8435 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Reporter: Vinoth Sathappan The WebHdfsFileSystem implementation doesn't support createNonRecursive. HBase extensively depends on that for proper functioning. Currently, when the region servers are started over web hdfs, they crash with: createNonRecursive unsupported for this filesystem class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1137) at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1112) at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1088) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.init(ProtobufLogWriter.java:85) at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:198) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8437) Fail/warn if HDFS is setup with an even number of QJMs.
Tsz Wo Nicholas Sze created HDFS-8437: - Summary: Fail/warn if HDFS is setup with an even number of QJMs. Key: HDFS-8437 URL: https://issues.apache.org/jira/browse/HDFS-8437 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor When setting an even number (2n, n ≥ 1) of QJMs, the number of failures it can tolerate is the same as with one node less (2n-1). Therefore, it does not make sense to set up an even number of QJMs. We should either fail it or warn the users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
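The arithmetic behind this issue is quick to verify: a majority quorum of n nodes needs floor(n/2) + 1 acknowledgements, so it tolerates n - (floor(n/2) + 1) failures, and an even-sized ensemble tolerates exactly as many failures as one with a single node fewer. A standalone check:

```java
// Standalone check of the quorum arithmetic behind HDFS-8437: a
// majority quorum of n nodes needs floor(n/2) + 1 acknowledgements,
// so it tolerates n - (floor(n/2) + 1) failures. An even-sized
// ensemble tolerates no more failures than one node fewer would.
public class QuorumMath {
    static int tolerated(int n) {
        int majority = n / 2 + 1; // integer division = floor(n/2), then +1
        return n - majority;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 6; n++) {
            System.out.println(n + " journal nodes tolerate "
                + tolerated(n) + " failure(s)");
        }
        // 3 and 4 nodes both tolerate 1 failure; 5 and 6 both tolerate 2.
    }
}
```

So a fourth journal node adds hardware and write latency without adding fault tolerance, which is why the issue proposes failing or warning on even-sized ensembles.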
[jira] [Commented] (HDFS-8439) Adding more slow action log in critical read path
[ https://issues.apache.org/jira/browse/HDFS-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551763#comment-14551763 ] Liu Shaohui commented on HDFS-8439: --- The same work is needed in the write path. Adding more slow action log in critical read path - Key: HDFS-8439 URL: https://issues.apache.org/jira/browse/HDFS-8439 Project: Hadoop HDFS Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor To debug an HBase read latency spike, we should add more slow pread/seek logging in the read path to identify the abnormal datanodes. A patch will be uploaded soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8439) Adding more slow action log in critical read path
Liu Shaohui created HDFS-8439: - Summary: Adding more slow action log in critical read path Key: HDFS-8439 URL: https://issues.apache.org/jira/browse/HDFS-8439 Project: Hadoop HDFS Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor To debug an HBase read latency spike, we should add more slow pread/seek logging in the read path to identify the abnormal datanodes. A patch will be uploaded soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551580#comment-14551580 ] Nate Edel commented on HDFS-8078: - [~cmccabe] - I'm strongly in agreement that it would be preferable to refactor these to pass around parsed objects rather than strings, and based on conversation with [~eclark] he seemed to agree. On the other hand, that's a good deal more of an invasive change than this point fix to unblock testing on the basic client functionality over IPv6. This certainly wasn't intended as complete IPv6 support, just a very small first step towards it that would let us start looking for more subtle bugs. And yes, there's a lot of suspicious spots in the code like the other one you identified; what's surprising to me is that we can deploy to a small cluster and run the HBase IntegrationTestBigLinkedList successfully on an IPv6-only cluster with just this minimal patch (and manually removing the preferIPv4Stack from the various environment variable settings.) Other folks would have a better sense of at what point IPv6 support will need a feature branch. 
HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: BB2015-05-TBR, ipv6 Attachments: HDFS-8078.9.patch 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + : + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) 
Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() could be used. --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) This also surfaces as a client error: -get: 2401 is not an IP string literal. The existing parsing logic here needs to split at the last colon rather than the first; using lastIndexOf rather than split should also be a tiny bit faster. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
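Both failure modes above come down to IPv6 literals carrying colons. A minimal sketch of the bracket requirement and the last-colon split, assuming nothing beyond java.net.URI (class and method names are illustrative, not taken from the HDFS patch):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of both IPv6 pitfalls described above. Names here are
// illustrative only; this is not the actual DatanodeID/NetUtils code.
public class Ipv6HostPortDemo {

    // Pitfall 1: java.net.URI only yields a host/port when an IPv6
    // literal in the authority is bracketed (RFC 3986). An unbracketed
    // literal falls back to a registry-based authority with no
    // extractable host/port, which is why createSocketAddr() rejects
    // the plain "ipaddr + : + port" concatenation.
    static void uriDemo() throws URISyntaxException {
        URI bad = new URI("hdfs://2401:db00:1010:70ba:face:0:8:0:50010");
        System.out.println(bad.getHost() + " / " + bad.getPort());   // no host, no port

        URI good = new URI("hdfs://[2401:db00:1010:70ba:face:0:8:0]:50010");
        System.out.println(good.getHost() + " / " + good.getPort()); // bracketed host / 50010
    }

    // Pitfall 2: splitting host:port at the FIRST colon truncates an
    // IPv6 literal to its leading group ("2401"); split at the LAST
    // colon instead. lastIndexOf also avoids the array split() builds.
    static String hostOf(String hostPort) {
        String host = hostPort.substring(0, hostPort.lastIndexOf(':'));
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1); // unwrap brackets
        }
        return host;
    }

    static int portOf(String hostPort) {
        return Integer.parseInt(hostPort.substring(hostPort.lastIndexOf(':') + 1));
    }

    public static void main(String[] args) throws URISyntaxException {
        uriDemo();
        System.out.println(hostOf("2401:db00:20:7013:face:0:7:0:54152")); // 2401:db00:20:7013:face:0:7:0
        System.out.println(portOf("2401:db00:20:7013:face:0:7:0:54152")); // 54152
    }
}
```

Note the same hostOf/portOf pair works unchanged for IPv4 ("10.0.0.1:50010") and bracketed IPv6 ("[::1]:50010"), which is what makes the last-colon approach a drop-in fix for the existing parsing.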
[jira] [Resolved] (HDFS-8436) Changing the replication factor for a directory should apply to new files under the directory too
[ https://issues.apache.org/jira/browse/HDFS-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-8436. Resolution: Won't Fix Closing as won't fix; this is working as designed. Directories don't have a replication factor of their own, so when you set the replication factor on a directory, you are actually setting it on the files in that directory. Changing the replication factor for a directory should apply to new files under the directory too - Key: HDFS-8436 URL: https://issues.apache.org/jira/browse/HDFS-8436 Project: Hadoop HDFS Issue Type: Improvement Reporter: Mala Chikka Kempanna Changing the replication factor for a directory only affects the existing files; new files under the directory are created with the cluster's default replication factor (dfs.replication from hdfs-site.xml). I would expect new files written under a directory to inherit the replication factor set on the directory itself. 
[jira] [Commented] (HDFS-8135) Remove the deprecated FSConstants class
[ https://issues.apache.org/jira/browse/HDFS-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551454#comment-14551454 ] Andrew Wang commented on HDFS-8135: --- Seems like option 2 is good? Removing deprecated code outside of a new major release is rather unfriendly. Remove the deprecated FSConstants class --- Key: HDFS-8135 URL: https://issues.apache.org/jira/browse/HDFS-8135 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Fix For: 2.8.0 Attachments: HDFS-8135-041315.patch The {{FSConstants}} class has been marked as deprecated since 0.23. There are no uses of this class in the current code base, so it can be removed. 
[jira] [Updated] (HDFS-8210) Ozone: Implement storage container manager
[ https://issues.apache.org/jira/browse/HDFS-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-8210: --- Attachment: HDFS-8210-HDFS-7240.3.patch Ozone: Implement storage container manager --- Key: HDFS-8210 URL: https://issues.apache.org/jira/browse/HDFS-8210 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HDFS-8210-HDFS-7240.1.patch, HDFS-8210-HDFS-7240.2.patch, HDFS-8210-HDFS-7240.3.patch The storage container manager collects datanode heartbeats, manages replication, and exposes an API to look up containers. This jira implements the storage container manager by reusing the block manager implementation in the namenode. It provides an initial implementation that works with datanodes; additional protocols will be added in subsequent jiras. 