[jira] [Updated] (HDFS-8325) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/HDFS-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8325: --- Summary: Misspelling of threshold in log4j.properties for tests (was: Misspelling of threshold in log4j.properties for tests in hadoop-hdfs) Misspelling of threshold in log4j.properties for tests --- Key: HDFS-8325 URL: https://issues.apache.org/jira/browse/HDFS-8325 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula The log4j.properties file used for tests contains the misspelled key log4j.threshhold. We should use the correct spelling, log4j.threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
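For reference, a minimal sketch of the fix in the test log4j.properties (the ALL value shown is an assumption; the point is the key spelling, which log4j otherwise silently ignores):
{code}
# before: misspelled key, silently ignored by log4j
log4j.threshhold=ALL
# after: the correctly spelled repository-wide threshold
log4j.threshold=ALL
{code}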
[jira] [Updated] (HDFS-8219) setStoragePolicy with folder behavior is different after cluster restart
[ https://issues.apache.org/jira/browse/HDFS-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] surendra singh lilhore updated HDFS-8219: - Labels: BB2015-05-TBR (was: ) setStoragePolicy with folder behavior is different after cluster restart Key: HDFS-8219 URL: https://issues.apache.org/jira/browse/HDFS-8219 Project: Hadoop HDFS Issue Type: Bug Reporter: Peter Shi Assignee: surendra singh lilhore Labels: BB2015-05-TBR Attachments: HDFS-8219.patch, HDFS-8219.unittest-norepro.patch Reproduce steps:
1) mkdir /temp
2) put one file A under /temp
3) change the /temp storage policy to COLD
4) use -getStoragePolicy to query file A's storage policy; it is the same as /temp's
5) change the /temp folder storage policy again; file A's storage policy still follows its parent folder.
Then restart the cluster and repeat steps 3) and 4): file A's storage policy no longer changes when the parent folder's storage policy does. The behavior is different. Debugging led to this code in INodeFile.getStoragePolicyID:
{code}
public byte getStoragePolicyID() {
  byte id = getLocalStoragePolicyID();
  if (id == BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) {
    return this.getParent() != null ? this.getParent().getStoragePolicyID() : id;
  }
  return id;
}
{code}
If the file does not have its own storage policy, it uses its parent's. But after a cluster restart, the file ends up with its own storage policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
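The reproduce steps map onto the storage-policy CLI roughly as follows (a sketch; file and path names are taken from the description):
{code}
hdfs dfs -mkdir /temp
hdfs dfs -put A /temp/A
hdfs storagepolicies -setStoragePolicy -path /temp -policy COLD
hdfs storagepolicies -getStoragePolicy -path /temp/A   # follows /temp before the restart
# after a NameNode restart, file A reports its own policy and stops following /temp
{code}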
[jira] [Commented] (HDFS-2484) checkLease should throw FileNotFoundException when file does not exist
[ https://issues.apache.org/jira/browse/HDFS-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528351#comment-14528351 ] Rakesh R commented on HDFS-2484: It seems the test case failure {{hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate}} is not related to my patch. checkLease should throw FileNotFoundException when file does not exist -- Key: HDFS-2484 URL: https://issues.apache.org/jira/browse/HDFS-2484 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.22.0, 2.0.0-alpha Reporter: Konstantin Shvachko Assignee: Rakesh R Labels: BB2015-05-RFC Attachments: HDFS-2484.00.patch, HDFS-2484.01.patch When a file is deleted during its creation, {{FSNamesystem.checkLease(String src, String holder)}} throws {{LeaseExpiredException}}. It would be more informative if it threw {{FileNotFoundException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
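A minimal sketch of the proposed behavior (the INode accessors follow the HDFS code base; the surrounding method body is an assumption, not the attached patch):
{code}
private INodeFile checkLease(String src, String holder) throws IOException {
  final INode inode = dir.getINode(src);
  if (inode == null || !inode.isFile()) {
    // the file was deleted while being created: say what actually happened
    throw new FileNotFoundException("File does not exist: " + src);
  }
  final INodeFile file = inode.asFile();
  if (!file.isUnderConstruction()) {
    throw new LeaseExpiredException("No lease on " + src
        + ": file is not open for writing");
  }
  return file;
}
{code}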
[jira] [Updated] (HDFS-8325) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/HDFS-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8325: --- Component/s: test Misspelling of threshold in log4j.properties for tests --- Key: HDFS-8325 URL: https://issues.apache.org/jira/browse/HDFS-8325 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula The log4j.properties file used for tests contains the misspelled key log4j.threshhold. We should use the correct spelling, log4j.threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.
[ https://issues.apache.org/jira/browse/HDFS-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528325#comment-14528325 ] Hudson commented on HDFS-8290: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) HDFS-8290. WebHDFS calls before namesystem initialization can cause NullPointerException. Contributed by Chris Nauroth. (cnauroth: rev c4578760b67d5b5169949a1b059f4472a268ff1b) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/web/resources/TestWebHdfsDataLocality.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt WebHDFS calls before namesystem initialization can cause NullPointerException. -- Key: HDFS-8290 URL: https://issues.apache.org/jira/browse/HDFS-8290 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8290.001.patch The NameNode has a brief window of time when the HTTP server has been initialized, but the namesystem has not been initialized. During this window, a WebHDFS call can cause a {{NullPointerException}}. We can catch this condition and return a more meaningful error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8237) Move all protocol classes used by ClientProtocol to hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-8237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528327#comment-14528327 ] Hudson commented on HDFS-8237: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) HDFS-8237. Move all protocol classes used by ClientProtocol to hdfs-client. Contributed by Haohui Mai. (wheat9: rev 0d6aa5d60948a7966da0ca1c3344a37c1d32f2e9) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NotReplicatedYetException.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveInfo.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolStats.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorageReport.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolInfo.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSelector.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshottableDirectoryStatus.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveInfo.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveStats.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshottableDirectoryStatus.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolEntry.java * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolEntry.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorageReport.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveEntry.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSelector.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReport.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReport.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveEntry.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReport.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveStats.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolStats.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/NotReplicatedYetException.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReport.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java Move all protocol classes used by ClientProtocol to hdfs-client --- Key: HDFS-8237 URL: https://issues.apache.org/jira/browse/HDFS-8237 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.8.0 Attachments: HDFS-8237.000.patch, HDFS-8237.001.patch, HDFS-8237.002.patch This jira proposes to move the classes in the hdfs project referred to by ClientProtocol into hdfs-client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528324#comment-14528324 ] Hudson commented on HDFS-7916: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev 318081ccd7af1ec02ec18f35ea95c579326be728) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReportBadBlockAction.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7916-01.patch If any bad block is found, the BPServiceActor for the standby NameNode will retry reporting it indefinitely.
{noformat}
2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010
org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
	at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
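The committed change is in ReportBadBlockAction#reportTo; a minimal sketch of the fix shape (the exception-handling details here are assumptions): log and drop errors that will never succeed on retry, such as a standby NN rejecting the call, instead of requeueing the action forever.
{code}
public void reportTo(DatanodeProtocolClientSideTranslatorPB bpNamenode,
    DatanodeRegistration bpRegistration) throws BPServiceActorActionException {
  // build the LocatedBlock[] to report from this action's fields (elided)
  try {
    bpNamenode.reportBadBlocks(locatedBlocks);
  } catch (RemoteException re) {
    // e.g. the standby NN rejects the call; retrying cannot help, so log and drop
    DataNode.LOG.info("reportBadBlocks encountered RemoteException for block "
        + block + ", dropping the action", re);
  } catch (IOException e) {
    // transient connectivity problem: requeue so it is retried later
    throw new BPServiceActorActionException("Failed to report bad block "
        + block + " to namenode");
  }
}
{code}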
[jira] [Commented] (HDFS-7397) Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528331#comment-14528331 ] Hudson commented on HDFS-7397: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) HDFS-7397. Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size (Brahma Reddy Battula via Colin P. McCabe) (cmccabe: rev 3fe79e1db84391cb17dbed6b579fe9c803b3d1c2) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size --- Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB; it is the number of short-circuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
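The clarified entry in hdfs-default.xml would look roughly like this (the default of 256 matches the client code; the description wording is a sketch, not a verbatim quote of the committed text):
{code}
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>256</value>
  <description>
    The maximum number of short-circuit read streams (file descriptors)
    the HDFS client caches, not a size in KB or MB. Each cached entry
    pins open file descriptors on the client.
  </description>
</property>
{code}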
[jira] [Commented] (HDFS-6300) Shouldn't allow running multiple balancers simultaneously
[ https://issues.apache.org/jira/browse/HDFS-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528346#comment-14528346 ] Rakesh R commented on HDFS-6300: It looks like the {{hadoop.hdfs.server.namenode.TestFileTruncate}} failure is not related to my patch. Shouldn't allow running multiple balancers simultaneously Key: HDFS-6300 URL: https://issues.apache.org/jira/browse/HDFS-6300 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-RFC Attachments: HDFS-6300-001.patch, HDFS-6300.patch The javadoc of Balancer.java says it will not allow a second balancer to run if the first one is in progress, but I've noticed that multiple balancers can run together, and the balancer.id implementation does not safeguard against this.
{code}
 * <li>Another balancer is running. Exiting...
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
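A minimal sketch of one way to make balancer.id an actual guard (the real implementation lives in NameNodeConnector; everything beyond the FileSystem API here is an assumption):
{code}
// Create balancer.id exclusively; a second balancer fails here instead of
// silently running concurrently with the first.
private FSDataOutputStream checkAndMarkRunning(FileSystem fs, Path idPath)
    throws IOException {
  try {
    // overwrite=false gives an atomic "create if absent" on the NameNode
    FSDataOutputStream out = fs.create(idPath, false);
    out.writeBytes(InetAddress.getLocalHost().getHostName() + "\n");
    out.hflush();  // hold the file open so the lease guards the whole run
    return out;
  } catch (FileAlreadyExistsException e) {
    throw new IOException("Another balancer is running. Exiting...", e);
  }
}
{code}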
[jira] [Commented] (HDFS-8324) Add trace info to DFSClient#getErasureCodingZoneInfo(..)
[ https://issues.apache.org/jira/browse/HDFS-8324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528446#comment-14528446 ] Uma Maheswara Rao G commented on HDFS-8324: --- +1 Add trace info to DFSClient#getErasureCodingZoneInfo(..) Key: HDFS-8324 URL: https://issues.apache.org/jira/browse/HDFS-8324 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-8324-HDFS-7285.01.patch Add trace spans to DFSClient#getErasureCodingZoneInfo(..) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528290#comment-14528290 ] Hudson commented on HDFS-7916: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #184 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/184/]) HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev 318081ccd7af1ec02ec18f35ea95c579326be728) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReportBadBlockAction.java 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7916-01.patch If any bad block is found, the BPServiceActor for the standby NameNode will retry reporting it indefinitely.
{noformat}
2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010
org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
	at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528403#comment-14528403 ] Hadoop QA commented on HDFS-7847: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 5m 13s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. |
| {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 15s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 0s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native | 1m 20s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 166m 19s | Tests failed in hadoop-hdfs. |
| | | 188m 5s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestAppendSnapshotTruncate |
| | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12730467/HDFS-7847.005.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 318081c |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10809/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10809/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10809/console |
This message was automatically generated. Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, HDFS-7847.005.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NameNode that is not in-process. A follow-on JIRA will modify it further to allow quantifying native and Java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
[ https://issues.apache.org/jira/browse/HDFS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528406#comment-14528406 ] Yi Liu commented on HDFS-8203: -- Thanks Jing for the comment. You are right, it will be simpler; I will update the patch with a test later. Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8203.001.patch In HDFS-7782 and HDFS-8033 we handle pread and stateful read for {{DFSStripedInputStream}}; we also need to handle other operations, such as {{seek}} and zerocopy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
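For context, the offset arithmetic a striped seek has to do, as a sketch (RS-6-3 with 64 KB cells is an assumed example schema; the helper name is illustrative, not the patch's API):
{code}
// Map a logical file offset to (data block index, offset within that block)
// for a striped block group with dataBlkNum data blocks and cellSize-byte cells.
static long[] logicalToStriped(long pos, int cellSize, int dataBlkNum) {
  long cellIdx = pos / cellSize;               // which cell in the logical stream
  int blkIdx = (int) (cellIdx % dataBlkNum);   // cells are laid out round-robin
  long stripe = cellIdx / dataBlkNum;          // which stripe (row of cells)
  long offsetInBlk = stripe * cellSize + pos % cellSize;
  return new long[] { blkIdx, offsetInBlk };
}
{code}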
[jira] [Created] (HDFS-8325) Misspelling of threshold in log4j.properties for tests in hadoop-hdfs
Brahma Reddy Battula created HDFS-8325: -- Summary: Misspelling of threshold in log4j.properties for tests in hadoop-hdfs Key: HDFS-8325 URL: https://issues.apache.org/jira/browse/HDFS-8325 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula The log4j.properties file used for tests contains the misspelled key log4j.threshhold. We should use the correct spelling, log4j.threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8314) Move HdfsServerConstants#IO_FILE_BUFFER_SIZE and SMALL_BUFFER_SIZE to the users
[ https://issues.apache.org/jira/browse/HDFS-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529503#comment-14529503 ] Hudson commented on HDFS-8314: -- FAILURE: Integrated in Hadoop-trunk-Commit #7742 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7742/]) HDFS-8314. Move HdfsServerConstants#IO_FILE_BUFFER_SIZE and SMALL_BUFFER_SIZE to the users. Contributed by Li Lu. (wheat9: rev 4da8490b512a33a255ed27309860859388d7c168) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskAsyncLazyPersistService.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java Move HdfsServerConstants#IO_FILE_BUFFER_SIZE and SMALL_BUFFER_SIZE to the users --- Key: HDFS-8314 URL: https://issues.apache.org/jira/browse/HDFS-8314 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Haohui Mai Assignee: Li Lu Fix For: 2.8.0 Attachments: HDFS-8314-trunk.001.patch, HDFS-8314-trunk.002.patch, HDFS-8314-trunk.003.patch, HDFS-8314-trunk.004.patch Currently HdfsServerConstants reads the configuration to set the values of IO_FILE_BUFFER_SIZE and SMALL_BUFFER_SIZE, so they are configurable rather than true constants. This jira proposes to move these two variables to their users in the upper level so that HdfsServerConstants only stores constant values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
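The resulting pattern, sketched: each user resolves the value from its own Configuration instead of reading a mutable "constant" (the key and default below are the standard ones from CommonConfigurationKeysPublic; the call site is illustrative):
{code}
// before: HdfsServerConstants.IO_FILE_BUFFER_SIZE, initialized from a
// Configuration at class-load time
// after: callers resolve it themselves, e.g.
int ioFileBufferSize = conf.getInt(
    CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_KEY,       // "io.file.buffer.size"
    CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_DEFAULT);  // 4096
{code}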
[jira] [Commented] (HDFS-8286) Scaling out the namespace using KV store
[ https://issues.apache.org/jira/browse/HDFS-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529524#comment-14529524 ] Konstantin Shvachko commented on HDFS-8286: --- Hey guys, I read the design doc, and am wondering _what is the exact goal of this jira?_ From the design and the descriptions it is not quite clear whether you propose to rebase the single NameNode on LevelDB, by replacing say {{FSDirectory}} with the KV store, or target building a distributed namespace service. I am asking because I've always been interested in evolving HDFS towards distributing its namespace in general, and using KV stores for it in particular. [The Giraffa project|https://github.com/GiraffaFS/giraffa] has been dedicated to this goal for a few years now, [as most of you are probably well aware of|http://www.slideshare.net/Hadoop_Summit/dynamic-namespace-partitioning-with-giraffa-file-system]. Notes on the design document:
# You probably want _support for a more generic notion of a {{Key}}_. Your definition of {{key = parentId, fileName}} is well understood, and was probably first introduced around 1995 in treeFS, the predecessor of reiserFS, the predecessor of Btrfs, with the latter mentioned in your design. It keeps files of the same directory close to each other (locality). But in larger storage systems more flexibility in defining locality may be needed, e.g. two-level keys {{ppid, pid, file}} (which include the locality of adjacent directories), three-level keys, or full-path keys as in Ceph. Giraffa, for example, introduces a generic Key interface, which allows different implementations including the one you describe. Your design of the KV implementation of snapshots seems to go along these lines.
# _What motivates the choice of LevelDB?_ It is a well recognized KV storage library, but it is not a distributed KV store. So what is the plan here? In Giraffa the KV store is designed to be pluggable and we currently use an HBase implementation. We also considered: levelDB, [mapDB|http://www.mapdb.org], [Redis|https://github.com/GiraffaFS/giraffa/wiki/Redis:-applicability-to-Giraffa], GemFire aka [Apache incubator Geode|https://wiki.apache.org/incubator/GeodeProposal], [Apache incubator Ignite|http://ignite.incubator.apache.org], [Prevayler|http://prevayler.org/], among a few others.
# The HA support paragraph talks about a single active NN and a standby NN. It is not clear _what is proposed for a distributed namespace, if anything?_
So, back to the starting question: what is the main goal of this issue? We may find some forms of collaboration between the projects. Scaling out the namespace using KV store Key: HDFS-8286 URL: https://issues.apache.org/jira/browse/HDFS-8286 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: hdfs-kv-design.pdf Currently the NN keeps the namespace in memory. To improve the scalability of the namespace, users can scale up by using more RAM or scale out using Federation (i.e., statically partitioning the namespace). We would like to remove the limitation of scaling the global namespace. Our vision is that HDFS should adopt a scalable underlying architecture that allows the global namespace to scale linearly. We propose to implement the HDFS namespace on top of a key-value (KV) store. Adopting the KV store interfaces allows HDFS to leverage the capability of modern KV stores and become much easier to scale.
Going forward, the architecture allows distributing the namespace across multiple machines, or storing only the working set in memory (HDFS-5389), both of which allow HDFS to manage billions of files using the commodity hardware available today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
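To make the key-locality point in the comment above concrete, a sketch of how a (parentId, fileName) key could be encoded so that lexicographic KV ordering groups a directory's children together (the encoding details are assumptions, not the design doc's actual format):
{code}
// Encode (parentId, fileName) so that a plain byte-wise comparison sorts
// all children of the same directory contiguously in the KV store.
static byte[] encodeKey(long parentId, String fileName) {
  byte[] name = fileName.getBytes(java.nio.charset.StandardCharsets.UTF_8);
  java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(8 + name.length);
  buf.putLong(parentId);  // big-endian: directory id is the primary sort key
  buf.put(name);          // children then sort lexicographically by name
  return buf.array();
}
{code}
A two- or three-level key (ppid, pid, file) would simply prepend more fixed-width id fields, trading key size for locality of adjacent directories.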
[jira] [Commented] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529714#comment-14529714 ] Hadoop QA commented on HDFS-8284: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 40s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 26s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | site | 2m 56s | Site still builds. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | common tests | 22m 47s | Tests passed in hadoop-common. |
| {color:green}+1{color} | hdfs tests | 168m 41s | Tests passed in hadoop-hdfs. |
| | | 231m 44s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12730612/HDFS-8284.003.patch |
| Optional Tests | javadoc javac unit site |
| git revision | trunk / 0100b15 |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10821/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10821/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10821/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10821/console |
This message was automatically generated. Add usage of tracing originated in DFSClient to doc --- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch, HDFS-8284.003.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7348) Erasure Coding: DataNode reconstruct striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529745#comment-14529745 ] Yi Liu commented on HDFS-7348: -- Thanks Zhe for the review and commit! Erasure Coding: DataNode reconstruct striped blocks --- Key: HDFS-7348 URL: https://issues.apache.org/jira/browse/HDFS-7348 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Kai Zheng Assignee: Yi Liu Fix For: HDFS-7285 Attachments: ECWorker.java, HDFS-7348.001.patch, HDFS-7348.002.patch, HDFS-7348.003.patch This JIRA is to recover one or more missing striped blocks in a striped block group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8328) Follow-on to update decode for DataNode striped blocks reconstruction
Yi Liu created HDFS-8328: Summary: Follow-on to update decode for DataNode striped blocks reconstruction Key: HDFS-8328 URL: https://issues.apache.org/jira/browse/HDFS-8328 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Currently the decode step for DataNode striped block reconstruction is a workaround; we need to update it after the decode fix in HADOOP-11847. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8321) CacheDirectives and CachePool operations should throw RetriableException in safemode
[ https://issues.apache.org/jira/browse/HDFS-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529817#comment-14529817 ] Hadoop QA commented on HDFS-8321: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 1s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 20s | The applied patch generated 1 new checkstyle issues (total was 275, now 275). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 3m 18s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native | 3m 18s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 166m 45s | Tests failed in hadoop-hdfs. |
| | | 210m 54s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| | Redundant nullcheck of StringBuilder.toString(), which is known to be non-null in org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCachePool(CachePoolInfo, boolean); redundant null check at FSNamesystem.java:[line 7771] |
| Failed unit tests | hadoop.hdfs.server.namenode.TestCacheDirectives |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12730654/HDFS-8321.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4da8490 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10823/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/10823/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/10823/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10823/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10823/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10823/console |
This message was automatically generated.
CacheDirectives and CachePool operations should throw RetriableException in safemode Key: HDFS-8321 URL: https://issues.apache.org/jira/browse/HDFS-8321 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8321.000.patch, HDFS-8321.001.patch Operations such as {{addCacheDirectives()}} throw {{SafeModeException}} when the NN is in safemode:
{code}
if (isInSafeMode()) {
  throw new SafeModeException("Cannot add cache directive", safeMode);
}
{code}
while other NN operations throw {{RetriableException}} when HA is enabled:
{code}
void checkNameNodeSafeMode(String errorMsg)
    throws RetriableException, SafeModeException {
  if (isInSafeMode()) {
    SafeModeException se = new SafeModeException(errorMsg, safeMode);
    if (haEnabled && haContext != null
        && haContext.getState().getServiceState() == HAServiceState.ACTIVE
        && shouldRetrySafeMode(this.safeMode)) {
      throw new RetriableException(se);
    } else {
      throw se;
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
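The fix shape, sketched: route the cache operations through the same checkNameNodeSafeMode() helper quoted above so they inherit the RetriableException behavior in HA (the error string follows the existing code; the exact call site is an assumption):
{code}
// before:
//   if (isInSafeMode()) { throw new SafeModeException("Cannot add cache directive", safeMode); }
// after, inside addCacheDirective():
checkOperation(OperationCategory.WRITE);
checkNameNodeSafeMode("Cannot add cache directive");
{code}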
[jira] [Updated] (HDFS-2531) TestDFSClientExcludedNodes and TestBlocksScheduledCounter can cause random failures in Eclipse.
[ https://issues.apache.org/jira/browse/HDFS-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-2531: --- Labels: BB2015-05-TBR (was: ) TestDFSClientExcludedNodes and TestBlocksScheduledCounter can cause random failures in Eclipse. --- Key: HDFS-2531 URL: https://issues.apache.org/jira/browse/HDFS-2531 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Labels: BB2015-05-TBR Attachments: HDFS-2531.patch
FAILED: org.apache.hadoop.hdfs.TestDFSClientExcludedNodes.testExcludedNodes
Error Message: Cannot lock storage /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1. The directory is already locked.
Stack Trace:
java.io.IOException: Cannot lock storage /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1. The directory is already locked.
	at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:586)
	at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:435)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:253)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:169)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:371)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:314)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:298)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:332)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7390) Provide JMX metrics per storage type
[ https://issues.apache.org/jira/browse/HDFS-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7390: --- Labels: BB2015-05-TBR (was: ) Provide JMX metrics per storage type Key: HDFS-7390 URL: https://issues.apache.org/jira/browse/HDFS-7390 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.5.2 Reporter: Benoy Antony Assignee: Benoy Antony Labels: BB2015-05-TBR Attachments: HDFS-7390.patch, HDFS-7390.patch HDFS-2832 added heterogeneous storage support. In a cluster with different storage types, it is useful to have metrics per storage type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7592) A bug in BlocksMap that causes a NameNode memory leak.
[ https://issues.apache.org/jira/browse/HDFS-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7592: --- Labels: BB2015-05-TBR BlocksMap leak memory (was: BlocksMap leak memory) A bug in BlocksMap that causes a NameNode memory leak. - Key: HDFS-7592 URL: https://issues.apache.org/jira/browse/HDFS-7592 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.21.0 Environment: HDFS-0.21.0 Reporter: JichengSong Assignee: JichengSong Labels: BB2015-05-TBR, BlocksMap, leak, memory Attachments: HDFS-7592.patch In our HDFS production environment, the NameNode ran into frequent full GC after running for 2 months, and we had to restart it manually. We dumped the NameNode's heap for object statistics. Before restarting the NameNode:
{noformat}
 num   #instances   #bytes      class name
--------------------------------------------
   1:  59262275     3613989480  [Ljava.lang.Object;
 ...
  10:  8549361      615553992   org.apache.hadoop.hdfs.server.namenode.BlockInfoUnderConstruction
  11:  5941511      427788792   org.apache.hadoop.hdfs.server.namenode.INodeFileUnderConstruction
{noformat}
After restarting the NameNode:
{noformat}
 num   #instances   #bytes      class name
--------------------------------------------
   1:  44188391     2934099616  [Ljava.lang.Object;
 ...
  23:  721763       51966936    org.apache.hadoop.hdfs.server.namenode.BlockInfoUnderConstruction
  24:  620028       44642016    org.apache.hadoop.hdfs.server.namenode.INodeFileUnderConstruction
{noformat}
We find the number of BlockInfoUnderConstruction abnormally large before restarting the NameNode. As we know, BlockInfoUnderConstruction keeps block state while a file is being written, but the write pressure of our cluster is far less than a million per second. We think there is a memory leak in the NameNode. We fixed the bug with the following patch.
{code}
diff --git a/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java b/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
index 7a40522..857d340 100644
--- a/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
+++ b/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
@@ -205,6 +205,8 @@ class BlocksMap {
       DatanodeDescriptor dn = currentBlock.getDatanode(idx);
       dn.replaceBlock(currentBlock, newBlock);
     }
+    // fix for the NameNode memory leak: drop the stale entry first
+    map.remove(newBlock);
     // replace block in the map itself
     map.put(newBlock, newBlock);
     return newBlock;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-2847) NamenodeProtocol#getBlocks() should use DatanodeID as an argument instead of DatanodeInfo
[ https://issues.apache.org/jira/browse/HDFS-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-2847: --- Labels: BB2015-05-TBR (was: ) NamenodeProtocol#getBlocks() should use DatanodeID as an argument instead of DatanodeInfo - Key: HDFS-2847 URL: https://issues.apache.org/jira/browse/HDFS-2847 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Suresh Srinivas Labels: BB2015-05-TBR Attachments: HDFS-2847.txt, HDFS-2847.txt, HDFS-2847.txt DatanodeID is sufficient for identifying a Datanode. DatanodeInfo has a lot of information that is not required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-2712) setTimes should support only for files and move the atime field down to iNodeFile.
[ https://issues.apache.org/jira/browse/HDFS-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-2712: --- Labels: BB2015-05-TBR (was: ) setTimes should support only for files and move the atime field down to iNodeFile. -- Key: HDFS-2712 URL: https://issues.apache.org/jira/browse/HDFS-2712 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0, 2.0.0-alpha Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Labels: BB2015-05-TBR Attachments: HDFS-2712.patch After the discussion in HDFS-2436, the unsupported behaviour for setTimes was intentional (HADOOP-1869). But the current INode structure hierarchy seems to support atime for directories as well, while as per HADOOP-1869 we support it only for files. To avoid confusion, we can move the atime field to INodeFile, as we planned to support setTimes only for files, and also restrict the support for setTimes on directories (which is implemented with HDFS-2436). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4114) Remove the BackupNode and CheckpointNode from trunk
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4114: --- Labels: BB2015-05-TBR (was: ) Remove the BackupNode and CheckpointNode from trunk --- Key: HDFS-4114 URL: https://issues.apache.org/jira/browse/HDFS-4114 Project: Hadoop HDFS Issue Type: Bug Reporter: Eli Collins Assignee: Tsz Wo Nicholas Sze Labels: BB2015-05-TBR Attachments: HDFS-4114.000.patch, HDFS-4114.001.patch, HDFS-4114.patch, h4114_20150210.patch Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the BackupNode and CheckpointNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4929) [NNBench mark] Lease mismatch error when running with multiple mappers
[ https://issues.apache.org/jira/browse/HDFS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4929: --- Labels: BB2015-05-TBR (was: ) [NNBench mark] Lease mismatch error when running with multiple mappers -- Key: HDFS-4929 URL: https://issues.apache.org/jira/browse/HDFS-4929 Project: Hadoop HDFS Issue Type: Bug Components: benchmarks Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Critical Labels: BB2015-05-TBR Attachments: HDFS4929.patch Command:
{noformat}
./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456 -bytesToWrite 102400 -baseDir /benchmarks/NNBench`hostname -s` -replicationFactorPerFile 3 -maps 100 -reduces 10
{noformat}
Trace:
{noformat}
2013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.105.214:36320: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7784) load fsimage in parallel
[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7784: --- Labels: BB2015-05-TBR (was: ) load fsimage in parallel Key: HDFS-7784 URL: https://issues.apache.org/jira/browse/HDFS-7784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Walter Su Assignee: Walter Su Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-7784.001.patch, test-20150213.pdf When a single NameNode has a huge number of files (without using federation), the startup/restart speed is slow, and the fsimage loading step takes most of the time. fsimage loading can be separated into two parts: deserialization and object construction (mostly map insertion). Deserialization takes most of the CPU time, so we can do the deserialization in parallel and add to the hashmap serially. This will significantly reduce the NN start time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
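A minimal sketch of the proposed split (all names here are illustrative, not the patch's API): decode inode sub-sections on a thread pool, then do the map insertions on a single thread.
{code}
ExecutorService pool = Executors.newFixedThreadPool(numThreads);
List<Future<List<INode>>> parts = new ArrayList<>();
for (InputStream section : inodeSubSections) {            // hypothetical pre-split inode sections
  parts.add(pool.submit(() -> deserializeINodes(section))); // CPU-bound decoding, in parallel
}
for (Future<List<INode>> part : parts) {
  for (INode inode : part.get()) {
    inodeMap.put(inode);  // serial phase: insertion is cheap relative to decoding
  }
}
pool.shutdown();
{code}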
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7868: --- Labels: BB2015-05-TBR (was: ) Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Labels: BB2015-05-TBR Attachments: HDFS-7868-001.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for this purpose: (a) the passed-in block size is just the size of the last block of a file, which might be very small (e.g., when called from BlockManager.ReplicationWork.chooseTargets); (b) the file might have been created with a smaller blocksize. In these conditions the calculated scheduledSize might be smaller than the actual value, which can ultimately lead to failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
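The gist of the fix, sketched (getPreferredBlockSize and getBlocksScheduled follow the HDFS code base; wiring it into isGoodTarget this way is an assumption about the attached patch):
{code}
// Size the space check by the file's configured block size, not by the
// possibly tiny size of the particular block being replicated:
final long blockSize = bc.getPreferredBlockSize();           // bc: the owning BlockCollection
final long scheduledSize = blockSize * node.getBlocksScheduled();
{code}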
[jira] [Updated] (HDFS-5263) Delegation token is not created in generateNodeDataHeader method of NamenodeJspHelper$NodeListJsp
[ https://issues.apache.org/jira/browse/HDFS-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5263: --- Labels: BB2015-05-TBR (was: ) Delegation token is not created in generateNodeDataHeader method of NamenodeJspHelper$NodeListJsp -- Key: HDFS-5263 URL: https://issues.apache.org/jira/browse/HDFS-5263 Project: Hadoop HDFS Issue Type: Bug Components: namenode, webhdfs Reporter: Vasu Mariyala Labels: BB2015-05-TBR Attachments: HDFS-5263-rev1.patch, HDFS-5263.patch When Kerberos authentication is enabled, we are unable to browse to the data nodes using (Name node web page -- Live Nodes -- select any of the data nodes). The reason behind this is that the delegation token is not provided as part of the url built in the generateNodeDataHeader method of NodeListJsp:
{code}
String url = HttpConfig.getSchemePrefix() + d.getHostName() + ":"
    + d.getInfoPort() + "/browseDirectory.jsp?namenodeInfoPort="
    + nnHttpPort + "&dir=" + URLEncoder.encode("/", "UTF-8")
    + JspHelper.getUrlParam(JspHelper.NAMENODE_ADDRESS, nnaddr);
{code}
But browsing the file system using the name node web page -- Browse the file system -- any directory works fine, because the redirectToRandomDataNode method of NamenodeJspHelper creates the delegation token:
{code}
redirectLocation = HttpConfig.getSchemePrefix() + fqdn + ":" + redirectPort
    + "/browseDirectory.jsp?namenodeInfoPort="
    + nn.getHttpAddress().getPort() + "&dir=/"
    + (tokenString == null ? "" :
        JspHelper.getDelegationTokenUrlParam(tokenString))
    + JspHelper.getUrlParam(JspHelper.NAMENODE_ADDRESS, addr);
{code}
I will work on providing a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4387) libhdfs doesn't work with jamVM
[ https://issues.apache.org/jira/browse/HDFS-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4387: --- Labels: BB2015-05-TBR (was: ) libhdfs doesn't work with jamVM --- Key: HDFS-4387 URL: https://issues.apache.org/jira/browse/HDFS-4387 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 3.0.0 Reporter: Andy Isaacson Priority: Minor Labels: BB2015-05-TBR Attachments: 01.patch Building and running tests on OpenJDK 7 on Ubuntu 12.10 fails with {{mvn test -Pnative}}. The output is hard to decipher but the underlying issue is that {{test_libhdfs_native}} segfaults at startup.
{noformat}
(gdb) run
Starting program: /mnt/trunk/hadoop-hdfs-project/hadoop-hdfs/target/native/test_libhdfs_threaded
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x7739a897 in attachJNIThread (name=0x0, is_daemon=is_daemon@entry=0 '\000', group=0x0) at thread.c:768
768	thread.c: No such file or directory.
(gdb) where
#0  0x7739a897 in attachJNIThread (name=0x0, is_daemon=is_daemon@entry=0 '\000', group=0x0) at thread.c:768
#1  0x77395020 in attachCurrentThread (is_daemon=0, args=0x0, penv=0x7fffddb8) at jni.c:1454
#2  Jam_AttachCurrentThread (vm=<optimized out>, penv=0x7fffddb8, args=0x0) at jni.c:1466
#3  0x77bcf979 in getGlobalJNIEnv () at /mnt/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:527
#4  getJNIEnv () at /mnt/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:585
#5  0x00402512 in nmdCreate (conf=conf@entry=0x7fffdeb0) at /mnt/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c:49
#6  0x004016e1 in main () at /mnt/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c:283
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4977) Change Checkpoint Size of web ui of SecondaryNameNode
[ https://issues.apache.org/jira/browse/HDFS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4977: --- Labels: BB2015-05-TBR newbie (was: newbie) Change Checkpoint Size of web ui of SecondaryNameNode --- Key: HDFS-4977 URL: https://issues.apache.org/jira/browse/HDFS-4977 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.0.4-alpha Reporter: Shinichi Yamashita Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: HDFS-4977-2.patch, HDFS-4977.patch, HDFS-4977.patch The checkpoint of the SecondaryNameNode after 2.0 is driven by dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns. Because Checkpoint Size is still displayed in status.jsp of the SecondaryNameNode, it should be modified accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4812) add hdfsReadFully, hdfsWriteFully
[ https://issues.apache.org/jira/browse/HDFS-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4812: --- Labels: BB2015-05-TBR (was: ) add hdfsReadFully, hdfsWriteFully - Key: HDFS-4812 URL: https://issues.apache.org/jira/browse/HDFS-4812 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-4812.001.patch It would be nice to have {{hdfsReadFully}} and {{hdfsWriteFully}} in libhdfs. The current APIs don't guarantee that we read or write as much as we're told to do. We have readFully and writeFully in Java, but not in libhdfs at the moment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
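The semantics the proposed C wrappers would mirror already exist on the Java side; as a sketch, readFully is just a loop that refuses to return short (the helper below is illustrative, not the libhdfs patch):
{code}
// Loop until exactly len bytes are read, or fail if EOF makes that impossible.
static void readFully(InputStream in, byte[] buf, int off, int len)
    throws IOException {
  while (len > 0) {
    int n = in.read(buf, off, len);  // a plain read() may return fewer bytes
    if (n < 0) {
      throw new EOFException("Premature EOF: " + len + " bytes still expected");
    }
    off += n;
    len -= n;
  }
}
{code}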
[jira] [Updated] (HDFS-5361) Change the unit of StartupProgress 'PercentComplete' to percentage
[ https://issues.apache.org/jira/browse/HDFS-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5361: --- Labels: BB2015-05-TBR metrics newbie (was: metrics newbie) Change the unit of StartupProgress 'PercentComplete' to percentage -- Key: HDFS-5361 URL: https://issues.apache.org/jira/browse/HDFS-5361 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: BB2015-05-TBR, metrics, newbie Attachments: HDFS-5361.2.patch, HDFS-5361.3.patch, HDFS-5361.patch Now the unit of the 'PercentComplete' metric is a ratio (maximum is 1.0). It's confusing for users because its name includes percent. The metric should be multiplied by 100. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4160) libhdfs / fuse-dfs should implement O_CREAT | O_EXCL
[ https://issues.apache.org/jira/browse/HDFS-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4160: --- Labels: BB2015-05-TBR (was: ) libhdfs / fuse-dfs should implement O_CREAT | O_EXCL Key: HDFS-4160 URL: https://issues.apache.org/jira/browse/HDFS-4160 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-4160.001.patch {{hdfsOpenFile}} contains this code: {code} if ((flags & O_CREAT) && (flags & O_EXCL)) { fprintf(stderr, "WARN: hdfs does not truly support O_CREATE && O_EXCL\n"); } {code} But {{hdfsOpenFile}} could easily support *O_CREAT* | *O_EXCL* by calling {{FileSystem#create}} with {{overwrite = false}}; a sketch of that call follows. We should do this. It would also benefit {{fuse-dfs}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
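For reference, a hedged sketch of the Java-side call the issue proposes to map O_CREAT | O_EXCL onto; the path is made up and error handling is simplified:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExclusiveCreateDemo {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/excl-demo");  // hypothetical path
    // overwrite=false: the create fails if p already exists,
    // which is exactly the O_CREAT | O_EXCL contract.
    try (FSDataOutputStream out = fs.create(p, false)) {
      out.writeBytes("created exclusively\n");
    } catch (IOException e) {
      // HDFS raises FileAlreadyExistsException here; libhdfs could map it to EEXIST
    }
  }
}
{code}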
[jira] [Updated] (HDFS-5384) Add a new TransitionState to indicate NN is in transition from standby state to active state
[ https://issues.apache.org/jira/browse/HDFS-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5384: --- Labels: BB2015-05-TBR (was: ) Add a new TransitionState to indicate NN is in transition from standby state to active state Key: HDFS-5384 URL: https://issues.apache.org/jira/browse/HDFS-5384 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Labels: BB2015-05-TBR Attachments: HDFS-5384.000.patch, HDFS-5384.001.patch Currently in HA setup, when a NameNode is transitioning from standby to active, the current code first sets the state of the NN to Active, then starts the active service, during which the NN still needs to tail the remaining editlog and may not be able to serve certain requests as expected (such as HDFS-5322). So it may be necessary to define a transition state to indicate that NN has left the previous state and is in transitioning to the next state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3512) Delay in scanning blocks at DN side when there are huge number of blocks
[ https://issues.apache.org/jira/browse/HDFS-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-3512: --- Labels: BB2015-05-TBR (was: ) Delay in scanning blocks at DN side when there are huge number of blocks Key: HDFS-3512 URL: https://issues.apache.org/jira/browse/HDFS-3512 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.0-alpha Reporter: suja s Assignee: amith Labels: BB2015-05-TBR Attachments: HDFS-3512.patch The block scanner maintains the full list of blocks at the DN side in a map, with no differentiation between blocks that have already been scanned and those that have not. For every check (i.e. every 5 secs) it picks one block and scans it. There is a chance it picks a block that has already been scanned, which further delays the scanning of blocks that are yet to be scanned; a sketch of one way to separate the two sets follows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
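A minimal sketch, assuming the scanner tracks block ids as longs, of one way to keep a pass from re-picking a block (our illustration, not the attached patch):
{code}
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;

public class ScanQueue {
  private final Deque<Long> unscanned = new ArrayDeque<>();
  private final Deque<Long> scanned = new ArrayDeque<>();

  public ScanQueue(Collection<Long> allBlockIds) {
    unscanned.addAll(allBlockIds);
  }

  /** Called every scan period (e.g. every 5 secs) to pick the next block. */
  public Long pickNext() {
    if (unscanned.isEmpty()) {   // full pass finished: start a new one
      unscanned.addAll(scanned);
      scanned.clear();
    }
    Long id = unscanned.pollFirst();
    if (id != null) {
      scanned.addLast(id);       // never re-picked within this pass
    }
    return id;
  }
}
{code}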
[jira] [Updated] (HDFS-5066) Inode tree with snapshot information visualization
[ https://issues.apache.org/jira/browse/HDFS-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5066: --- Labels: BB2015-05-TBR (was: ) Inode tree with snapshot information visualization --- Key: HDFS-5066 URL: https://issues.apache.org/jira/browse/HDFS-5066 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-5066.v1.patch, HDFS-5066.v2.patch, HDFS-5066.v3.patch, visnap.png It would be nice to be able to visualize snapshot information, in order to ease the understanding of the related data structures. We can generate a graph from the in-memory inode links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3627) OfflineImageViewer oiv Indented processor prints out the Java class name in the DELEGATION_KEY field
[ https://issues.apache.org/jira/browse/HDFS-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-3627: --- Labels: BB2015-05-TBR (was: ) OfflineImageViewer oiv Indented processor prints out the Java class name in the DELEGATION_KEY field Key: HDFS-3627 URL: https://issues.apache.org/jira/browse/HDFS-3627 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Ravi Prakash Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-3627.patch, HDFS-3627.patch, HDFS-3627.patch, HDFS-3627.patch, HDFS-3627.patch, HDFS-3627.patch Instead of the contents of the delegation key, this is printed out: DELEGATION_KEY = org.apache.hadoop.security.token.delegation.DelegationKey@1e2ca7 DELEGATION_KEY = org.apache.hadoop.security.token.delegation.DelegationKey@105bd58 DELEGATION_KEY = org.apache.hadoop.security.token.delegation.DelegationKey@1d1e730 DELEGATION_KEY = org.apache.hadoop.security.token.delegation.DelegationKey@1a116c9 DELEGATION_KEY = org.apache.hadoop.security.token.delegation.DelegationKey@df1832 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5040) Audit log for admin commands/ logging output of all DFS admin commands
[ https://issues.apache.org/jira/browse/HDFS-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5040: --- Labels: BB2015-05-TBR (was: ) Audit log for admin commands/ logging output of all DFS admin commands -- Key: HDFS-5040 URL: https://issues.apache.org/jira/browse/HDFS-5040 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0 Reporter: Raghu C Doppalapudi Assignee: Shinichi Yamashita Labels: BB2015-05-TBR Attachments: HDFS-5040.patch, HDFS-5040.patch, HDFS-5040.patch Enable the audit log for all admin commands, and also provide the ability to log all admin commands to a separate log file; at this point all the logging is displayed on the console. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4924) Show NameNode state on dfsclusterhealth page
[ https://issues.apache.org/jira/browse/HDFS-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4924: --- Labels: BB2015-05-TBR (was: ) Show NameNode state on dfsclusterhealth page Key: HDFS-4924 URL: https://issues.apache.org/jira/browse/HDFS-4924 Project: Hadoop HDFS Issue Type: Improvement Components: federation Affects Versions: 2.1.0-beta Reporter: Lohit Vijayarenu Assignee: Lohit Vijayarenu Labels: BB2015-05-TBR Attachments: HDFS-4924.trunk.1.patch dfsclusterhealth.jsp shows a summary of the multiple namenodes in a cluster. With federation combined with HA it becomes difficult to quickly know the state of the NameNodes in the cluster. It would be good to show whether a NameNode is Active/Standby on the summary page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5357) TestFileSystemAccessService failures in JDK7
[ https://issues.apache.org/jira/browse/HDFS-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5357: --- Labels: BB2015-05-TBR (was: ) TestFileSystemAccessService failures in JDK7 Key: HDFS-5357 URL: https://issues.apache.org/jira/browse/HDFS-5357 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Labels: BB2015-05-TBR Attachments: HDFS-5357v1.patch junit.framework.AssertionFailedError: Expected Exception: ServiceException got: ExceptionInInitializerError at junit.framework.Assert.fail(Assert.java:47) at org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:56) at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.java:208) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:159) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:87) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:95) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4870) periodically re-resolve hostnames in included and excluded datanodes list
[ https://issues.apache.org/jira/browse/HDFS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4870: --- Labels: BB2015-05-TBR (was: ) periodically re-resolve hostnames in included and excluded datanodes list - Key: HDFS-4870 URL: https://issues.apache.org/jira/browse/HDFS-4870 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-4870.001.patch We currently only resolve the hostnames in the included and excluded datanodes list once-- when the list is read. The rationale for this is that in big clusters, DNS resolution for thousands of nodes can take a long time (when generating a datanode list in getDatanodeListForReport, for example). However, if the DNS information changes for one of these hosts, we should reflect that. A background thread could do these DNS resolutions every few minutes without blocking any foreground operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
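A hedged sketch of the background re-resolution described above; the host collection and cache map are assumed names, not actual Hadoop fields, and the JVM's own DNS caching (networkaddress.cache.ttl) would also need to be considered:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HostResolver {
  public static ScheduledExecutorService start(
      Collection<String> includeExcludeHosts,   // assumed: hosts-file entries
      Map<String, InetAddress> cache) {         // assumed: shared, concurrent cache
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleWithFixedDelay(() -> {
      for (String host : includeExcludeHosts) {
        try {
          cache.put(host, InetAddress.getByName(host));  // re-resolve off the hot path
        } catch (UnknownHostException e) {
          // keep the stale mapping and retry on the next cycle
        }
      }
    }, 5, 5, TimeUnit.MINUTES);
    return ses;
  }
}
{code}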
[jira] [Updated] (HDFS-4937) ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom()
[ https://issues.apache.org/jira/browse/HDFS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4937: --- Labels: BB2015-05-TBR (was: ) ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom() -- Key: HDFS-4937 URL: https://issues.apache.org/jira/browse/HDFS-4937 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha, 0.23.8 Reporter: Kihwal Lee Assignee: Kihwal Lee Labels: BB2015-05-TBR Attachments: HDFS-4937.patch When a large number of nodes are removed by refreshing the node lists, the network topology is updated. If the refresh happens at the right moment, the replication monitor thread may get stuck in the while loop of {{chooseRandom()}}, because the cached cluster size is used in the terminal condition check of the loop. This usually happens when a block with a high replication factor is being processed. Since replicas/rack is also calculated beforehand, no node choice may satisfy the goodness criteria if the refresh removed racks. All nodes end up in the excluded list, but its size is still less than the cached cluster size, so the loop runs forever; a toy reproduction follows. This was observed in a production environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
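A self-contained toy (ours, not the HDFS code) reproducing the termination bug described above: the loop is bounded by a cluster size cached before the refresh, so once every live node has been drawn it can never exit:
{code}
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class StaleBoundLoop {
  /** Spins forever whenever cachedClusterSize > liveNodes.size(). */
  static String chooseRandom(List<String> liveNodes, int cachedClusterSize) {
    Set<String> excluded = new HashSet<>();
    Random r = new Random();
    while (excluded.size() < cachedClusterSize) {  // stale terminal condition
      String candidate = liveNodes.get(r.nextInt(liveNodes.size()));
      // assume no candidate passes the goodness checks after racks were removed
      excluded.add(candidate);
      // excluded.size() plateaus at liveNodes.size(), below the stale bound
    }
    return null;
  }
}
{code}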
[jira] [Updated] (HDFS-4777) File creation with overwrite flag set to true results in logSync holding namesystem lock
[ https://issues.apache.org/jira/browse/HDFS-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4777: --- Labels: BB2015-05-TBR (was: ) File creation with overwrite flag set to true results in logSync holding namesystem lock Key: HDFS-4777 URL: https://issues.apache.org/jira/browse/HDFS-4777 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0, 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Suresh Srinivas Labels: BB2015-05-TBR Attachments: HDFS-4777.patch FSNamesystem#startFileInternal calls delete. The delete method releases the write lock, causing parts of the startFileInternal code to be unintentionally executed without the write lock being held. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4730) KeyManagerFactory.getInstance supports SunX509 ibmX509 in HsftpFileSystem.java
[ https://issues.apache.org/jira/browse/HDFS-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4730: --- Labels: BB2015-05-TBR patch (was: patch) KeyManagerFactory.getInstance supports SunX509 & ibmX509 in HsftpFileSystem.java Key: HDFS-4730 URL: https://issues.apache.org/jira/browse/HDFS-4730 Project: Hadoop HDFS Issue Type: Bug Reporter: Tian Hong Wang Assignee: Tian Hong Wang Labels: BB2015-05-TBR, patch Attachments: HDFS-4730-v1.patch, HDFS-4730_trunk.patch, HDFS-4730_trunk.patch In IBM Java, SunX509 should be ibmX509, so use SSLFactory.SSLCERTIFICATE to load the algorithm dynamically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system
[ https://issues.apache.org/jira/browse/HDFS-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4837: --- Labels: BB2015-05-TBR (was: ) Allow DFSAdmin to run when HDFS is not the default file system -- Key: HDFS-4837 URL: https://issues.apache.org/jira/browse/HDFS-4837 Project: Hadoop HDFS Issue Type: New Feature Reporter: Mostafa Elhemali Assignee: Mostafa Elhemali Labels: BB2015-05-TBR Attachments: HDFS-4837.patch When Hadoop is running with a default file system other than HDFS but still has an HDFS namenode running, we are unable to run dfsadmin commands. I suggest that DFSAdmin use the same mechanism as the NameNode does today to get its address: look at dfs.namenode.rpc-address, and if it is not set, fall back to getting it from the default file system; a sketch of that lookup order follows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
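A sketch of the proposed lookup order, assuming the standard Configuration and NetUtils APIs; this illustrates the suggestion above rather than the attached patch:
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.net.NetUtils;

public class NameNodeAddr {
  static InetSocketAddress resolve(Configuration conf) {
    // 1. prefer the explicit NameNode RPC address, as the NameNode itself does
    String rpcAddr = conf.getTrimmed("dfs.namenode.rpc-address");
    if (rpcAddr != null && !rpcAddr.isEmpty()) {
      return NetUtils.createSocketAddr(rpcAddr);
    }
    // 2. otherwise fall back to the default file system URI (fs.defaultFS)
    return NetUtils.createSocketAddr(
        FileSystem.getDefaultUri(conf).getAuthority(), 8020 /* default RPC port */);
  }
}
{code}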
[jira] [Updated] (HDFS-4660) Duplicated checksum on DN in a recovered pipeline
[ https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4660: --- Labels: BB2015-05-TBR (was: ) Duplicated checksum on DN in a recovered pipeline - Key: HDFS-4660 URL: https://issues.apache.org/jira/browse/HDFS-4660 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Peng Zhang Priority: Critical Labels: BB2015-05-TBR Attachments: HDFS-4660.patch Pipeline: DN1, DN2, DN3. Stop DN2; the pipeline adds node DN4 at the 2nd position: DN1, DN4, DN3. Recover RBW. DN4 after recover rbw: 2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004 2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW getNumBytes() = 134144 getBytesOnDisk() = 134144 getVisibleLength()= 134144 end at chunk (134144/512=262) DN3 after recover rbw: 2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_10042013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW getNumBytes() = 134028 getBytesOnDisk() = 134028 getVisibleLength()= 134028 The client sends a packet after pipeline recovery: offset=133632 len=1008 DN4 after flush: 2013-04-01 21:02:31,779 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1063 // meta end position should be floor(134640/512)*4 + 7 == 1059, but now it is 1063. DN3 after flush: 2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, lastPacketInBlock=false, offsetInBlock=134640, ackEnqueueNanoTime=8817026136871545) 2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing meta file offset of block BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from 1055 to 1051 2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1059 After checking the meta file on DN4, I found the checksum of chunk 262 is duplicated, but the data is not. Later, after the block was finalized, DN4's scanner detected the bad block and reported it to the NN. The NN sent a command to delete this block and re-replicate it from another DN in the pipeline to satisfy the replication factor. I think this is because BlockReceiver skips data bytes already written, but does not skip checksum bytes already written. Also, the function adjustCrcFilePosition is only used for the last non-completed chunk, not for this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-2843) Rename protobuf message StorageInfoProto to NodeInfoProto
[ https://issues.apache.org/jira/browse/HDFS-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-2843: --- Labels: BB2015-05-TBR (was: ) Rename protobuf message StorageInfoProto to NodeInfoProto - Key: HDFS-2843 URL: https://issues.apache.org/jira/browse/HDFS-2843 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Suresh Srinivas Assignee: Suresh Srinivas Labels: BB2015-05-TBR Attachments: HDFS-2843.patch StorageInfoProto has cTime, layoutVersion, namespaceID and clusterID. This is really information about a node that is part of the cluster, such as the Namenode, Standby/Secondary/Backup/Checkpointer nodes, and datanodes. To reflect this, I want to rename it from StorageInfoProto to NodeInfoProto. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4325) ClientProtocol.createSymlink parameter dirPerm invalid
[ https://issues.apache.org/jira/browse/HDFS-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4325: --- Labels: BB2015-05-TBR (was: ) ClientProtocol.createSymlink parameter dirPerm invalid -- Key: HDFS-4325 URL: https://issues.apache.org/jira/browse/HDFS-4325 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.4-alpha Reporter: Binglin Chang Assignee: Binglin Chang Labels: BB2015-05-TBR Attachments: HDFS-4325.v1.patch {code} * @param link The path of the link being created. * @param dirPerm permissions to use when creating parent directories * @param createParent - if true then missing parent dirs are created * if false then parent must exist {code} According to the javadoc, auto-created parent directories' permissions will be dirPerm, but in fact directory permissions always inherit from the parent directory plus u+wx. IMHO, createSymlink behavior should be the same as create, which also inherits the parent dir permission, so the current behavior makes sense, but the related dirPerm parameter should be removed because it is invalid and confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-3618: --- Labels: BB2015-05-TBR (was: ) SSH fencing option may incorrectly succeed if nc (netcat) command not present - Key: HDFS-3618 URL: https://issues.apache.org/jira/browse/HDFS-3618 Project: Hadoop HDFS Issue Type: Bug Components: auto-failover Affects Versions: 2.0.0-alpha Reporter: Brahma Reddy Battula Assignee: Vinayakumar B Labels: BB2015-05-TBR Attachments: HDFS-3618.patch, HDFS-3618.patch, HDFS-3618.patch, zkfc.txt, zkfc_threaddump.out Started the NNs and zkfc's on Suse11. Suse11 has a netcat installation, and netcat -z works (but nc -z won't work). While executing the following command we got "command not found", hence rc is non-zero and the code assumes the server is down. Here we end up without actually checking whether the service is down or not; a sketch of the missing distinction follows. {code} LOG.info("Indeterminate response from trying to kill service. " + "Verifying whether it is running using nc..."); rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + " " + serviceAddr.getPort()); if (rc == 0) { // the service is still listening - we are unable to fence LOG.warn("Unable to fence - it is running but we cannot kill it"); return false; } else { LOG.info("Verified that the service is down."); return true; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
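A hedged sketch of the distinction the fencer misses: POSIX shells return 127 for "command not found", which must not be read as "service is down". The names mirror the quoted snippet; this is illustrative, not the attached patch:
{code}
// inside the SSH fencer, after the indeterminate kill attempt
int rc = execCommand(session,
    "nc -z " + serviceAddr.getHostName() + " " + serviceAddr.getPort());
if (rc == 0) {
  LOG.warn("Unable to fence - it is running but we cannot kill it");
  return false;                 // still listening: fencing failed
} else if (rc == 127) {
  // nc itself is missing on the target host; we verified nothing
  LOG.warn("nc not found on remote host; cannot verify service state");
  return false;
} else {
  LOG.info("Verified that the service is down.");
  return true;
}
{code}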
[jira] [Updated] (HDFS-5639) rpc scheduler abstraction
[ https://issues.apache.org/jira/browse/HDFS-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5639: --- Labels: BB2015-05-TBR (was: ) rpc scheduler abstraction - Key: HDFS-5639 URL: https://issues.apache.org/jira/browse/HDFS-5639 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Labels: BB2015-05-TBR Attachments: HDFS-5639-2.patch, HDFS-5639.patch We have run into various issues in the namenode and hbase w.r.t. rpc handling in multi-tenant clusters. Examples are https://issues.apache.org/jira/i#browse/HADOOP-9640 and https://issues.apache.org/jira/i#browse/HBASE-8836 There are different ideas on how to prioritize rpc requests. It could be based on user id, or on whether it is a read request or a write request, or it could use a specific rule such as "datanode RPCs are more important than client RPCs". We want to enable people to implement and experiment with different rpc schedulers; one possible minimal interface is sketched below. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
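One possible minimal shape for such an abstraction (our sketch; the attached patches may define it differently):
{code}
/** Decides how to prioritize incoming RPC calls; implementations are pluggable. */
public interface RpcScheduler {
  /**
   * Returns a priority level (0 = highest) for a call, based on whatever the
   * implementation cares about: user id, read vs. write, caller type.
   */
  int getPriorityLevel(String userName, String methodName, boolean isWrite);
}

/** Example policy: service RPCs outrank client writes, which outrank reads. */
class SimpleScheduler implements RpcScheduler {
  @Override
  public int getPriorityLevel(String userName, String methodName, boolean isWrite) {
    if ("hdfs".equals(userName)) {  // assumed service principal for datanodes
      return 0;
    }
    return isWrite ? 1 : 2;
  }
}
{code}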
[jira] [Updated] (HDFS-5517) Lower the default maximum number of blocks per file
[ https://issues.apache.org/jira/browse/HDFS-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5517: --- Labels: BB2015-05-TBR (was: ) Lower the default maximum number of blocks per file --- Key: HDFS-5517 URL: https://issues.apache.org/jira/browse/HDFS-5517 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Labels: BB2015-05-TBR Attachments: HDFS-5517.patch We introduced the maximum number of blocks per file in HDFS-4305, but we set the default to 1MM. In practice this limit is so high as to never be hit, whereas we know that an individual file with 10s of thousands of blocks can cause problems. We should lower the default value, in my opinion to 10k. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
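Until the default changes, the limit can be lowered per cluster with the key introduced by HDFS-4305, set here to the proposed 10k:
{code}
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
// The 1MM default is effectively "unlimited"; 10k still allows large files,
// e.g. 10,000 blocks * 128 MB/block is roughly 1.2 TB per file.
conf.setInt("dfs.namenode.fs-limits.max-blocks-per-file", 10000);
{code}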
[jira] [Updated] (HDFS-5549) Support for implementing custom FsDatasetSpi from outside the project
[ https://issues.apache.org/jira/browse/HDFS-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5549: --- Labels: BB2015-05-TBR (was: ) Support for implementing custom FsDatasetSpi from outside the project - Key: HDFS-5549 URL: https://issues.apache.org/jira/browse/HDFS-5549 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0 Reporter: Ignacio Corderi Labels: BB2015-05-TBR Attachments: HDFS-5549.patch The visibility of multiple methods and a few classes was changed to public to allow FsDatasetSpi<T> and all the related classes that need subtyping to be fully implemented from outside the HDFS project. Block transfers were abstracted into a factory, given that the behavior will be changed for DataNodes using Kinetic drives. The existing DataNode-to-DataNode block transfer functionality was moved to LegacyBlockTransferer; no new configuration is needed to use this class and keep the behavior currently present. DataNodes have an additional configuration key DFS_DATANODE_BLOCKTRANSFERER_FACTORY_KEY to override the default block transfer behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8321) CacheDirectives and CachePool operations should throw RetriableException in safemode
[ https://issues.apache.org/jira/browse/HDFS-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8321: --- Labels: BB2015-05-TBR (was: ) CacheDirectives and CachePool operations should throw RetriableException in safemode Key: HDFS-8321 URL: https://issues.apache.org/jira/browse/HDFS-8321 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Labels: BB2015-05-TBR Attachments: HDFS-8321.000.patch, HDFS-8321.001.patch Operations such as {{addCacheDirectives()}} throw {{SafeModeException}} when the NN is in safemode: {code} if (isInSafeMode()) { throw new SafeModeException("Cannot add cache directive", safeMode); } {code} While other NN operations throw {{RetriableException}} when HA is enabled: {code} void checkNameNodeSafeMode(String errorMsg) throws RetriableException, SafeModeException { if (isInSafeMode()) { SafeModeException se = new SafeModeException(errorMsg, safeMode); if (haEnabled && haContext != null && haContext.getState().getServiceState() == HAServiceState.ACTIVE && shouldRetrySafeMode(this.safeMode)) { throw new RetriableException(se); } else { throw se; } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7980: --- Labels: BB2015-05-TBR (was: ) Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Labels: BB2015-05-TBR Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch In the current implementation the datanode calls the reportReceivedDeletedBlocks() method, which is an incremental block report, before calling the bpNamenode.blockReport() method. So in a large (several thousands of datanodes) and busy cluster it will slow down the startup of the namenode by more than one hour. {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. reportReceivedDeletedBlocks(); lastDeletedReport = startTime; ... // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7859: --- Labels: BB2015-05-TBR (was: ) Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Labels: BB2015-05-TBR Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas in NameNode centrally and reliably, so that EC zones can reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8291) Modify NN WebUI to display correct unit
[ https://issues.apache.org/jira/browse/HDFS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8291: --- Labels: BB2015-05-TBR (was: ) Modify NN WebUI to display correct unit Key: HDFS-8291 URL: https://issues.apache.org/jira/browse/HDFS-8291 Project: Hadoop HDFS Issue Type: Improvement Reporter: Zhongyi Xie Assignee: Zhongyi Xie Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8291.001.patch, HDFS-8291.002.patch NN Web UI displays its capacity and usage in TB, but it is actually TiB. We should either change the unit name or the calculation to ensure it follows standards. http://en.wikipedia.org/wiki/Tebibyte http://en.wikipedia.org/wiki/Terabyte -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8284: --- Labels: BB2015-05-TBR (was: ) Add usage of tracing originated in DFSClient to doc --- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Labels: BB2015-05-TBR Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch, HDFS-8284.003.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8321) CacheDirectives and CachePool operations should throw RetriableException in safemode
[ https://issues.apache.org/jira/browse/HDFS-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8321: - Attachment: HDFS-8321.001.patch CacheDirectives and CachePool operations should throw RetriableException in safemode Key: HDFS-8321 URL: https://issues.apache.org/jira/browse/HDFS-8321 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8321.000.patch, HDFS-8321.001.patch Operations such as {{addCacheDirectives()}} throw {{SafeModeException}} when the NN is in safemode: {code} if (isInSafeMode()) { throw new SafeModeException("Cannot add cache directive", safeMode); } {code} While other NN operations throw {{RetriableException}} when HA is enabled: {code} void checkNameNodeSafeMode(String errorMsg) throws RetriableException, SafeModeException { if (isInSafeMode()) { SafeModeException se = new SafeModeException(errorMsg, safeMode); if (haEnabled && haContext != null && haContext.getState().getServiceState() == HAServiceState.ACTIVE && shouldRetrySafeMode(this.safeMode)) { throw new RetriableException(se); } else { throw se; } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8327) Compute storage type quotas in INodeFile.computeQuotaDeltaForTruncate()
[ https://issues.apache.org/jira/browse/HDFS-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8327: - Attachment: HDFS-8327.001.patch Compute storage type quotas in INodeFile.computeQuotaDeltaForTruncate() --- Key: HDFS-8327 URL: https://issues.apache.org/jira/browse/HDFS-8327 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8327.000.patch, HDFS-8327.001.patch To simplify the code {{INodeFile.computeQuotaDeltaForTruncate()}} can compute the storage type quotas as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8327) Compute storage type quotas in INodeFile.computeQuotaDeltaForTruncate()
[ https://issues.apache.org/jira/browse/HDFS-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8327: - Status: Patch Available (was: Open) Compute storage type quotas in INodeFile.computeQuotaDeltaForTruncate() --- Key: HDFS-8327 URL: https://issues.apache.org/jira/browse/HDFS-8327 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8327.000.patch, HDFS-8327.001.patch To simplify the code {{INodeFile.computeQuotaDeltaForTruncate()}} can compute the storage type quotas as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
[ https://issues.apache.org/jira/browse/HDFS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8203: - Attachment: HDFS-8203.002.patch Updated the patch. Now that we have the {{curStripedBuf}}, {{seek}} can be simpler. I also updated the logic of {{readOneStripe}}. In {{readOneStripe}}, we try to read striped group cells to fill the buffer, but if a seek has happened, the target pos can be in the middle of the striped group cells: # We don't need to fetch all the striped group cells, just start from the cell containing the target pos. The tests passed in my local env. Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8203.001.patch, HDFS-8203.002.patch In HDFS-7782 and HDFS-8033, we handled pread and stateful read for {{DFSStripedInputStream}}; we also need to handle other operations, such as {{seek}}, zero-copy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8329) Erasure coding: Rename Striped block recovery to reconstruction to eliminate confusion.
Yi Liu created HDFS-8329: Summary: Erasure coding: Rename Striped block recovery to reconstruction to eliminate confusion. Key: HDFS-8329 URL: https://issues.apache.org/jira/browse/HDFS-8329 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Both in the NN and the DN, we use "striped block recovery" and sometimes "reconstruction". "Striped block recovery" is easily confused with block recovery; we should unify the terminology to "striped block reconstruction". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6136) PacketReceiver#doRead calls setFieldsFromData with wrong argument
[ https://issues.apache.org/jira/browse/HDFS-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6136: --- Labels: BB2015-05-TBR patch (was: patch) PacketReceiver#doRead calls setFieldsFromData with wrong argument - Key: HDFS-6136 URL: https://issues.apache.org/jira/browse/HDFS-6136 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.3.0 Reporter: Nick White Labels: BB2015-05-TBR, patch Attachments: HDFS-6136.patch, HDFS-6136.txt PacketHeader#setFieldsFromData takes the packet length as the first argument, but PacketReceiver#doRead passes the dataPlusChecksumLen (which is 4 less). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8174) Update replication count to live rep count in fsck report
[ https://issues.apache.org/jira/browse/HDFS-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8174: --- Labels: BB2015-05-TBR (was: ) Update replication count to live rep count in fsck report - Key: HDFS-8174 URL: https://issues.apache.org/jira/browse/HDFS-8174 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8174.1.patch When one of the replicas is decommissioned, the fsck report shows a repl count that is one less than the number of replicas listed: {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} Update the description from "repl" to "Live_rep". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8113: --- Labels: BB2015-05-TBR (was: ) NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null. {code} protected BlockInfoContiguous(BlockInfoContiguous from) { this(from, from.bc.getBlockReplication()); this.bc = from.bc; } {code} We have observed that some DataNodes keep failing to send block reports to the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists. {quote} 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
[ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-1950: --- Labels: BB2015-05-TBR (was: ) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly. Key: HDFS-1950 URL: https://issues.apache.org/jira/browse/HDFS-1950 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 0.20.205.0 Reporter: ramkrishna.s.vasudevan Assignee: Uma Maheswara Rao G Priority: Blocker Labels: BB2015-05-TBR Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, hdfs-1950-trunk-test.txt Before going to the root cause, let's see the read behavior for a file having more than 10 blocks in the append case. Logic: There is a prefetch size dfs.read.prefetch.size for the DFSInputStream, which has a default value of 10. This prefetch size is the number of blocks that the client will fetch from the namenode for reading a file. For example, assume that a file X having 22 blocks resides in HDFS. The reader first fetches the first 10 blocks from the namenode and starts reading. After the above step, the reader fetches the next 10 blocks from the NN and continues reading. Then the reader fetches the remaining 2 blocks from the NN and completes the read. Cause: === Let's see the cause for this issue now... The scenario that fails is: the writer wrote 10+ blocks plus a partial block and called sync. A reader trying to read the file will not get the last partial block. The client first gets the 10 block locations from the NN. It then checks whether the file is under construction, and if so it gets the size of the last partial block from the datanode and reads the full file. However, when the number of blocks is more than 10, the last block will not be in the first fetch; it will be in the second or a later fetch (the last block will be in the (num of blocks / 10)th fetch). The problem is that, in the DFSClient, there is no logic to get the size of the last partial block (as in point 1 above) for fetches other than the first, so the reader will not be able to read all of the synced data. Also, the InputStream.available api uses the first fetched block size to iterate; ideally this size has to be increased. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7995) Implement chmod in the HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7995: --- Labels: BB2015-05-TBR (was: ) Implement chmod in the HDFS Web UI -- Key: HDFS-7995 URL: https://issues.apache.org/jira/browse/HDFS-7995 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ravi Prakash Assignee: Ravi Prakash Labels: BB2015-05-TBR Attachments: HDFS-7995.01.patch, HDFS-7995.02.patch We should let users change the permissions of files and directories using the HDFS Web UI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8037) WebHDFS: CheckAccess silently accepts certain malformed FsActions
[ https://issues.apache.org/jira/browse/HDFS-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8037: --- Labels: BB2015-05-TBR easyfix newbie (was: easyfix newbie) WebHDFS: CheckAccess silently accepts certain malformed FsActions - Key: HDFS-8037 URL: https://issues.apache.org/jira/browse/HDFS-8037 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jake Low Assignee: Walter Su Priority: Minor Labels: BB2015-05-TBR, easyfix, newbie Attachments: HDFS-8037.001.patch, HDFS-8037.002.patch WebHDFS's {{CHECKACCESS}} operation accepts a parameter called {{fsaction}}, which represents the type(s) of access to check for. According to the documentation, and also the source code, the domain of {{fsaction}} is the set of strings matched by the regex {{[rwx-]{3}}}. This domain is wider than the set of valid {{FsAction}} objects, because it doesn't guarantee sensible ordering of access types. For example, the strings {{rxw}} and {{--r}} are valid {{fsaction}} parameter values, but don't correspond to valid {{FsAction}} instances. The result is that WebHDFS silently accepts {{fsaction}} parameter values which don't match any valid {{FsAction}} instance, but doesn't actually perform any permissions checking in this case. For example, here's a {{CHECKACCESS}} call where we request {{rw-}} access on a file which we only have permission to read and execute. It raises an exception, as it should. {code:none} curl -i -X GET "http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=rw-" HTTP/1.1 403 Forbidden Content-Type: application/json { "RemoteException": { "exception": "AccessControlException", "javaClassName": "org.apache.hadoop.security.AccessControlException", "message": "Permission denied: user=nobody, access=READ_WRITE, inode=\"/myfile\":root:supergroup:drwxr-xr-x" } } {code} But if we instead request {{r-w}} access, the call appears to succeed: {code:none} curl -i -X GET "http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=r-w" HTTP/1.1 200 OK Content-Length: 0 {code} As I see it, the fix would be to change the regex pattern in {{FsActionParam}} to something like {{[r-][w-][x-]}}; a small demo of the difference follows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
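A small demo of the difference between the two patterns, using the loose pattern from the current code and the strict one suggested above:
{code}
import java.util.regex.Pattern;

public class FsActionRegexDemo {
  public static void main(String[] args) {
    Pattern loose  = Pattern.compile("[rwx-]{3}");    // current FsActionParam domain
    Pattern strict = Pattern.compile("[r-][w-][x-]"); // proposed: fixed positions
    System.out.println(loose.matcher("r-w").matches());   // true  -> silently accepted
    System.out.println(strict.matcher("r-w").matches());  // false -> rejected
    System.out.println(strict.matcher("r-x").matches());  // true  -> still valid
  }
}
{code}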
[jira] [Updated] (HDFS-8303) QJM should purge old logs in the current directory through FJM
[ https://issues.apache.org/jira/browse/HDFS-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8303: --- Labels: BB2015-05-TBR (was: ) QJM should purge old logs in the current directory through FJM -- Key: HDFS-8303 URL: https://issues.apache.org/jira/browse/HDFS-8303 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Labels: BB2015-05-TBR Attachments: HDFS-8303.0.patch, HDFS-8303.1.patch As the first step of the consolidation effort, QJM should call its FJM to purge the current directory. The current QJM logic for purging the current dir is very similar to the FJM purging logic. QJM: {code} private static final List<Pattern> CURRENT_DIR_PURGE_REGEXES = ImmutableList.of( Pattern.compile("edits_\\d+-(\\d+)"), Pattern.compile("edits_inprogress_(\\d+)(?:\\..*)?")); ... long txid = Long.parseLong(matcher.group(1)); if (txid < minTxIdToKeep) { LOG.info("Purging no-longer needed file " + txid); if (!f.delete()) { ... {code} FJM: {code} private static final Pattern EDITS_REGEX = Pattern.compile( NameNodeFile.EDITS.getName() + "_(\\d+)-(\\d+)"); private static final Pattern EDITS_INPROGRESS_REGEX = Pattern.compile( NameNodeFile.EDITS_INPROGRESS.getName() + "_(\\d+)"); private static final Pattern EDITS_INPROGRESS_STALE_REGEX = Pattern.compile( NameNodeFile.EDITS_INPROGRESS.getName() + "_(\\d+).*(\\S+)"); ... List<EditLogFile> editLogs = matchEditLogs(files, true); for (EditLogFile log : editLogs) { if (log.getFirstTxId() < minTxIdToKeep && log.getLastTxId() < minTxIdToKeep) { purger.purgeLog(log); } } {code} I can see 2 differences: # Different regexes for matching empty/corrupt in-progress files. The FJM pattern makes more sense to me. # FJM verifies that both the start and end txID of a finalized edit file are old enough. This doesn't make sense, because the end txID is always larger than the start txID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8150) Make getFileChecksum fail for blocks under construction
[ https://issues.apache.org/jira/browse/HDFS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8150: --- Labels: BB2015-05-TBR (was: ) Make getFileChecksum fail for blocks under construction --- Key: HDFS-8150 URL: https://issues.apache.org/jira/browse/HDFS-8150 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: J.Andreina Priority: Critical Labels: BB2015-05-TBR Attachments: HDFS-8150.1.patch, HDFS-8150.2.patch We have seen cases where a data copy was validated using checksums and then the content of the target changed. It turned out the target wasn't closed successfully, so it was still under construction; one hour later, a lease recovery kicked in and truncated the block. Although this can be prevented in many ways, if there is no valid use case for getting a file checksum from under-construction blocks, can it be disabled? E.g. the Datanode can throw an exception if the replica is not in the finalized state; a sketch of such a guard follows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
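A hedged sketch of the DataNode-side guard suggested above, using type names from the 2.x codebase (ours, not the attached patches):
{code}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.server.common.HdfsServerConstants.ReplicaState;
import org.apache.hadoop.hdfs.server.datanode.Replica;

public class ChecksumGuard {
  /** Refuse checksum computation for replicas that are not finalized. */
  static void checkFinalized(Replica replica, ExtendedBlock block) throws IOException {
    if (replica.getState() != ReplicaState.FINALIZED) {
      throw new IOException("Cannot compute checksum for " + block
          + ": replica state is " + replica.getState() + ", not FINALIZED");
    }
  }
}
{code}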
[jira] [Updated] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7471: --- Labels: BB2015-05-TBR (was: ) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang Labels: BB2015-05-TBR Attachments: HDFS-7471.001.patch, PreCommit-HDFS-Build #9898 test - testNumVersionsReportedCorrect [Jenkins].html From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1> Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8245) Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log.
[ https://issues.apache.org/jira/browse/HDFS-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8245: --- Labels: BB2015-05-TBR (was: ) Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log. --- Key: HDFS-8245 URL: https://issues.apache.org/jira/browse/HDFS-8245 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Labels: BB2015-05-TBR Attachments: HDFS-8245.patch The following series of events happened on the Standby namenode: 2015-04-09 07:47:21,735 \[Edit log tailer] INFO ha.EditLogTailer: Triggering log roll on remote NameNode Active Namenode (ANN) 2015-04-09 07:58:01,858 \[Edit log tailer] INFO ha.EditLogTailer: Triggering log roll on remote NameNode ANN The following series of events happened on the Active Namenode: 2015-04-09 07:47:21,747 \[IPC Server handler 99 on 8020] INFO namenode.FSNamesystem: Roll Edit Log from Standby NN (SNN) 2015-04-09 07:58:01,868 \[IPC Server handler 18 on 8020] INFO namenode.FSNamesystem: Roll Edit Log from SNN The following series of events happened on the datanode ( {color:red} datanodeA {color}): 2015-04-09 07:52:15,817 \[DataXceiver for client DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1 at /:51078 \[Receiving block BP-595383232--1360869396230:blk_1570321882_1102029183867]] INFO datanode.DataNode: Receiving BP-595383232--1360869396230:blk_1570321882_1102029183867 src: /client:51078 dest: /{color:red}datanodeA:1004{color} 2015-04-09 07:52:15,969 \[PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE] INFO DataNode.clienttrace: src: /client:51078, dest: /{color:red}datanodeA:1004{color}, bytes: 20, op: HDFS_WRITE, cliID: DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1, offset: 0, srvID: 356a8a98-826f-446d-8f4c-ce288c1f0a75, blockid: BP-595383232--1360869396230:blk_1570321882_1102029183867, duration: 148948385 2015-04-09 07:52:15,969 \[PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE] INFO datanode.DataNode: PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE terminating 2015-04-09 07:52:25,970 \[DataXceiver for client /{color:red}datanodeB {color}:52827 \[Copying block BP-595383232--1360869396230:blk_1570321882_1102029183867]] INFO datanode.DataNode: Copied BP-595383232--1360869396230:blk_1570321882_1102029183867 to {color:red}datanodeB{color}:52827 2015-04-09 07:52:28,187 \[DataNode: heartbeating to ANN:8020] INFO impl.FsDatasetAsyncDiskService: Scheduling blk_1570321882_1102029183867 file path/blk_1570321882 for deletion 2015-04-09 07:52:28,188 \[Async disk worker #1482 for volume ] INFO impl.FsDatasetAsyncDiskService: Deleted BP-595383232--1360869396230 blk_1570321882_1102029183867 file path/blk_1570321882 Then we failed over for an upgrade, and the standby became active. When we ran an ls command on this file, we got the following exception: 15/04/09 22:07:39 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader. 
java.io.IOException: Got error for OP_READ_BLOCK, self=/client:32947, remote={color:red}datanodeA:1004{color}, for file filename, for pool BP-595383232--1360869396230 block 1570321882_1102029183867 at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:445) at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:410) at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:815) at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693) at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:351) at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112) at org.apache.hadoop.fs.shell.CopyCommands$Merge.processArguments(CopyCommands.java:97) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190) at
[jira] [Updated] (HDFS-7060) Avoid taking locks when sending heartbeats from the DataNode
[ https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7060: --- Labels: BB2015-05-TBR (was: ) Avoid taking locks when sending heartbeats from the DataNode Key: HDFS-7060 URL: https://issues.apache.org/jira/browse/HDFS-7060 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Xinwei Qin Labels: BB2015-05-TBR Attachments: HDFS-7060-002.patch, HDFS-7060.000.patch, HDFS-7060.001.patch We're seeing the heartbeat is blocked by the monitor of {{FsDatasetImpl}} when the DN is under heavy load of writes: {noformat} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:115) - waiting to lock 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:91) - locked 0x000780612fd8 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:563) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:668) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:827) at java.lang.Thread.run(Thread.java:744) java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:743) - waiting to lock 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:169) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232) at java.lang.Thread.run(Thread.java:744) java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.createNewFile(File.java:1006) at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:59) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:244) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:195) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:753) - locked 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:169) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232) at java.lang.Thread.run(Thread.java:744) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8056) Decommissioned dead nodes should continue to be counted as dead after NN restart
[ https://issues.apache.org/jira/browse/HDFS-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8056: --- Labels: BB2015-05-TBR (was: ) Decommissioned dead nodes should continue to be counted as dead after NN restart Key: HDFS-8056 URL: https://issues.apache.org/jira/browse/HDFS-8056 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Labels: BB2015-05-TBR Attachments: HDFS-8056-2.patch, HDFS-8056.patch We had some offline discussion with [~andrew.wang] and [~cmccabe] about this. Bringing it up here for more input and to get the patch in place. Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However, after the NN restarts, nodes that were dead before the restart won't be in {{datanodeMap}}. {{DatanodeManager}}'s {{getDatanodeListForReport}} will add those dead nodes, but not if they are in the exclude file.
{noformat}
    if (listDeadNodes) {
      for (InetSocketAddress addr : includedNodes) {
        if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
          continue;
        }
        // The remaining nodes are ones that are referenced by the hosts
        // files but that we do not know about, ie that we have never
        // heard from. Eg. an entry that is no longer part of the cluster
        // or a bogus entry was given in the hosts files
        //
        // If the host file entry specified the xferPort, we use that.
        // Otherwise, we guess that it is the default xfer port.
        // We can't ask the DataNode what it had configured, because it's
        // dead.
        DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
            .getAddress().getHostAddress(), addr.getHostName(), "",
            addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
            defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
        setDatanodeDead(dn);
        nodes.add(dn);
      }
    }
{noformat}
The issue here is that the decommissioned dead node JMX will be different after an NN restart. It might be better to make it consistent across restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8248) Store INodeId instead of the INodeFile object in BlockInfoContiguous
[ https://issues.apache.org/jira/browse/HDFS-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8248: --- Labels: BB2015-05-TBR (was: ) Store INodeId instead of the INodeFile object in BlockInfoContiguous Key: HDFS-8248 URL: https://issues.apache.org/jira/browse/HDFS-8248 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Labels: BB2015-05-TBR Attachments: HDFS-8248.000.patch, HDFS-8248.001.patch, HDFS-8248.002.patch, HDFS-8248.003.patch Currently the namespace and the block manager are tightly coupled together. There are two couplings in terms of implementation: 1. The {{BlockInfoContiguous}} stores a reference to the {{INodeFile}} that owns the block, so that the block manager can look up the corresponding file when replicating blocks, recovering from pipeline failures, etc. 2. The {{INodeFile}} stores the {{BlockInfoContiguous}} objects that the file owns. Decoupling the namespace and the block manager would allow the BM to be moved off the Java heap or even run as a standalone process. This jira proposes to remove the first coupling by storing the id of the inode instead of the object reference of {{INodeFile}} in the {{BlockInfoContiguous}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
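A rough sketch of the shape of that change (hypothetical names, not the actual HDFS-8248 patch): the block carries only a primitive inode id, and the block manager resolves the owner through a lookup interface provided by the namespace.
{code:java}
// Hypothetical sketch: the block info keeps a primitive long id
// instead of an INodeFile reference, removing the object-graph coupling.
class BlockInfoSketch {
  private long inodeId; // replaces: private INodeFile inode;

  long getInodeId() { return inodeId; }
  void setInodeId(long id) { this.inodeId = id; }
}

// The namespace layer supplies the resolution; the block manager no
// longer holds namespace objects directly.
interface NamespaceLookup<F> {
  F getFileById(long inodeId); // returns null if the file no longer exists
}
{code}
The indirection costs a lookup on block-manager paths that need the file, but it is what lets the two subsystems live in different heaps or even different processes.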
[jira] [Updated] (HDFS-7897) Shutdown metrics when stopping JournalNode
[ https://issues.apache.org/jira/browse/HDFS-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7897: --- Labels: BB2015-05-TBR (was: ) Shutdown metrics when stopping JournalNode -- Key: HDFS-7897 URL: https://issues.apache.org/jira/browse/HDFS-7897 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Labels: BB2015-05-TBR Attachments: HDFS-7897-001.patch JournalNode.stop() forgets to shut down the metrics system. The issue was found while reading the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
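The fix would presumably be a one-liner in JournalNode.stop(); a hedged sketch, assuming the JournalNode registers its metrics with the default metrics system (this is not the attached patch):
{code:java}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

// Sketch of the presumed fix: release the metrics system along with the
// other resources already released in JournalNode.stop().
class JournalNodeStopSketch {
  void stop() {
    // ... stop the RPC server, HTTP server, and journals as before ...
    DefaultMetricsSystem.shutdown(); // assumption: metrics were registered here
  }
}
{code}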
[jira] [Updated] (HDFS-4273) Fix some issues in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4273: --- Labels: BB2015-05-TBR (was: ) Fix some issues in DFSInputStream Key: HDFS-4273 URL: https://issues.apache.org/jira/browse/HDFS-4273 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, HDFS-4273.v7.patch, HDFS-4273.v8.patch, TestDFSInputStream.java The following issues in DFSInputStream are addressed in this jira: 1. read may not retry enough in some cases, causing early failure. Assume the following call logic:
{noformat}
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() add currentNode to deadnode, wish to get a different datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing, clear deadNodes and pick the currentNode again
        seekToNewSource() return false
     readBuffer() re-throw the exception
     quit loop
readWithStrategy() got the exception, and may fail the read call before having tried maxBlockAcquireFailures times.
{noformat}
2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race condition: it is cleared to 0 while still in use by another thread, so it is possible that some read threads never quit. Changing failures to a local variable solves this issue. 3. If the local datanode is added to deadNodes, it will not be removed even when the DN comes back alive. We need a way to remove the local datanode from deadNodes when it becomes live again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
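For issue 2, the fix described above amounts to keeping the retry counter on the stack instead of in a shared field; a minimal sketch under that assumption (method and helper names are hypothetical, not the actual patch):
{code:java}
import java.io.IOException;

// Sketch for issue 2: keep the retry counter on the stack so concurrent
// readers sharing one DFSInputStream cannot reset each other's count.
class RetryCountSketch {
  private static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

  int readWithRetries() throws IOException {
    int failures = 0; // local variable, not a shared instance field
    while (true) {
      try {
        return doRead();
      } catch (IOException e) {
        if (++failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
          throw e; // give up only after this thread's own retries
        }
        // otherwise pick a different datanode and retry
      }
    }
  }

  private int doRead() throws IOException { return -1; } // stub
}
{code}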
[jira] [Updated] (HDFS-4861) BlockPlacementPolicyDefault does not consider decommissioning racks
[ https://issues.apache.org/jira/browse/HDFS-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4861: --- Labels: BB2015-05-TBR (was: ) BlockPlacementPolicyDefault does not consider decommissioning racks --- Key: HDFS-4861 URL: https://issues.apache.org/jira/browse/HDFS-4861 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.7, 2.1.0-beta Reporter: Kihwal Lee Assignee: Rushabh S Shah Labels: BB2015-05-TBR Attachments: HDFS-4861-v2.patch, HDFS-4861.patch getMaxNodesPerRack() calculates the max replicas/rack like this: {code} int maxNodesPerRack = (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2; {code} Since this does not consider the racks that are being decommissioned and the decommissioning state is only checked later in isGoodTarget(), certain blocks are not replicated even when there are many racks and nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
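To make the arithmetic above concrete, a worked example with illustrative numbers (an assumed cluster shape, not taken from the report):
{code:java}
// Worked example: the cap is computed from the total rack count, which
// still includes racks that are decommissioning.
class MaxNodesPerRackExample {
  static int cap(int totalNumOfReplicas, int numOfRacks) {
    return (totalNumOfReplicas - 1) / numOfRacks + 2;
  }

  public static void main(String[] args) {
    // 10 racks total, but suppose 9 of them are entirely decommissioning.
    int cap = cap(3, 10); // (3-1)/10 + 2 = 2
    System.out.println(cap);
    // With only 1 usable rack left and a cap of 2 nodes per rack, at most
    // 2 of the 3 replicas can be placed, so the block stays
    // under-replicated even though the remaining rack has plenty of nodes.
  }
}
{code}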
[jira] [Updated] (HDFS-6525) FsShell supports HDFS TTL
[ https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6525: --- Labels: BB2015-05-TBR (was: ) FsShell supports HDFS TTL - Key: HDFS-6525 URL: https://issues.apache.org/jira/browse/HDFS-6525 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, tools Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Labels: BB2015-05-TBR Attachments: HDFS-6525.1.patch, HDFS-6525.2.patch This issue is used to track development of supporting HDFS TTL for FsShell, for details see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager
[ https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6526: --- Labels: BB2015-05-TBR (was: ) Implement HDFS TtlManager - Key: HDFS-6526 URL: https://issues.apache.org/jira/browse/HDFS-6526 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Labels: BB2015-05-TBR Attachments: HDFS-6526.1.patch This issue is used to track development of HDFS TtlManager, for details see HDFS-6382. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6588: --- Labels: BB2015-05-TBR (was: ) Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Labels: BB2015-05-TBR Attachments: HDFS-6588.001.patch, HDFS-6588.001.patch, HDFS-6588.001.patch, HDFS-6588.001.patch When addressing Daryn Sharp's comment for HDFS-6475, quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not a InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} there were multiple test failures after making the suggested changes. Filing this jira to investigate removing the getTrueCause method. More detail will be put in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4631) Support customized callback methods during automatic failover
[ https://issues.apache.org/jira/browse/HDFS-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4631: --- Labels: BB2015-05-TBR features ha hadoop (was: features ha hadoop) Support customized callback methods during automatic failover -- Key: HDFS-4631 URL: https://issues.apache.org/jira/browse/HDFS-4631 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 3.0.0 Reporter: Fengdong Yu Labels: BB2015-05-TBR, features, ha, hadoop Attachments: HDFS-4631.patch Original Estimate: 0.5m Remaining Estimate: 0.5m ZKFC adds HealthCallbacks by default, which can at least do quitElection. But we often want to be alerted when a failover occurs (for example, by sending email or short messages), especially on a production cluster. There is a configurable fence script, and we could put all of this logic there, but reasonably a fence script should do only one thing: fence :) So I added this patch: we can configure a customized HM callback method; if there is no configuration, only HealthCallbacks is added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6436) WebHdfsFileSystem get, renew and cancel delegation token operations should use SPNEGO to authenticate
[ https://issues.apache.org/jira/browse/HDFS-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6436: --- Labels: BB2015-05-TBR (was: ) WebHdfsFileSystem get, renew and cancel delegation token operations should use SPNEGO to authenticate --- Key: HDFS-6436 URL: https://issues.apache.org/jira/browse/HDFS-6436 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 2.4.0 Environment: Kerberos Reporter: Bangtao Zhou Labels: BB2015-05-TBR Attachments: HDFS-6436.patch In Kerberos secure mode, accessing HDFS through WebHdfsFileSystem always gets an *org.apache.hadoop.security.authentication.client.AuthenticationException: Unauthorized*. For example, calling WebHdfsFileSystem.listStatus executes a LISTSTATUS op, which should authenticate via a *delegation token*; to obtain one it executes a GETDELEGATIONTOKEN op (which should itself authenticate via *SPNEGO*), but that op still tries to authenticate with a delegation token, so it always gets an Unauthorized exception. The exception looks like this:
{code:java}
19:05:11.758 [main] DEBUG o.a.h.hdfs.web.URLConnectionFactory - open URL connection
java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Unauthorized
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:287)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:82)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:538)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:406)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:434)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:430)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1058)
19:05:11.766 [main] DEBUG o.a.h.security.UserGroupInformation - PrivilegedActionException as:bang...@cyhadoop.com (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Unauthorized
	at org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:134)
19:05:11.767 [main] DEBUG o.a.h.security.UserGroupInformation - PrivilegedActionException as:bang...@cyhadoop.com (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Unauthorized
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:213)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:371)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:392)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:602)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:533)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:406)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:434)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:430)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1037)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1483)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1523)
	at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1679)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1678)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1661)
	at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:1723)
	at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1720)
	at com.cyou.marketing.hop.filesystem.App$1.run(App.java:34)
	at
{code}
[jira] [Updated] (HDFS-5887) Add suffix to generated protobuf class
[ https://issues.apache.org/jira/browse/HDFS-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5887: --- Labels: BB2015-05-TBR (was: ) Add suffix to generated protobuf class -- Key: HDFS-5887 URL: https://issues.apache.org/jira/browse/HDFS-5887 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Tassapol Athiapinya Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-5887.000.patch, HDFS-5887.000.proto_files-only.patch, HDFS-5887.001.patch As suggested by [~tlipcon], the code is more readable if we give each class generated by protobuf the suffix {{Proto}}. This jira proposes to rename the classes without introducing any functionality changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6066) logGenerationStamp is not needed to reduce editlog size
[ https://issues.apache.org/jira/browse/HDFS-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6066: --- Labels: BB2015-05-TBR (was: ) logGenerationStamp is not needed to reduce editlog size --- Key: HDFS-6066 URL: https://issues.apache.org/jira/browse/HDFS-6066 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: chenping Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-6066-trunk.1.patch, HDFS-6066-trunk.2.patch, HDFS-6066-trunk.3.patch Almost every logGenerationStamp is followed by a logAddBlock, so we can get the newest generation stamp indirectly from the logAddBlock operation. This will reduce the edit log size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6813) WebHdfsFileSystem#OffsetUrlInputStream should implement PositionedReadable in a thread-safe way
[ https://issues.apache.org/jira/browse/HDFS-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6813: --- Labels: BB2015-05-TBR (was: ) WebHdfsFileSystem#OffsetUrlInputStream should implement PositionedReadable in a thread-safe way --- Key: HDFS-6813 URL: https://issues.apache.org/jira/browse/HDFS-6813 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Yi Liu Assignee: Yi Liu Labels: BB2015-05-TBR Attachments: HDFS-6813.001.patch The {{PositionedReadable}} definition requires that implementations of its interfaces be thread-safe. OffsetUrlInputStream (the WebHdfsFileSystem input stream) doesn't implement these interfaces in a thread-safe way; this JIRA is to fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
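A common way to satisfy the {{PositionedReadable}} contract on top of a seek-based stream is to serialize the seek/read/restore sequence; below is a minimal sketch under that assumption (stubbed helpers, not the attached patch):
{code:java}
import java.io.IOException;

// Sketch: a positioned read that never perturbs the stream offset as
// seen by other threads, by holding a lock across seek/read/seek-back.
class PositionedReadSketch {
  private final Object lock = new Object();

  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException {
    synchronized (lock) {
      long oldPos = getPos();
      try {
        seek(position);
        return readBytes(buffer, offset, length);
      } finally {
        seek(oldPos); // restore the offset for sequential readers
      }
    }
  }

  // Stubs standing in for the underlying seekable stream.
  private long getPos() throws IOException { return 0; }
  private void seek(long pos) throws IOException { }
  private int readBytes(byte[] b, int off, int len) throws IOException { return -1; }
}
{code}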
[jira] [Updated] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6658: --- Labels: BB2015-05-TBR (was: ) Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Labels: BB2015-05-TBR Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, Old triplets.jpg Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled), and in our new design the list overhead will be per DatanodeStorageInfo rather than per block replica. See the attached design doc for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
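The flavor of the proposal, as a hedged sketch with made-up names (the real data structures are in the attached design doc): thread the per-storage replica list through a primitive array of successor indexes instead of object references.
{code:java}
// Sketch: a singly linked list of block replicas per storage, threaded
// through an int[] of "next" indexes instead of per-block object pointers.
class ReplicaListSketch {
  private final int[] next; // next[i] = index of the next block, or -1
  private int head = -1;    // head of this storage's replica list

  ReplicaListSketch(int capacity) {
    next = new int[capacity];
    java.util.Arrays.fill(next, -1);
  }

  void addReplica(int blockIndex) {
    next[blockIndex] = head;
    head = blockIndex;
  }

  // Traversal: for (int i = head; i != -1; i = next[i]) { ... }
}
{code}
An int index costs 4 bytes regardless of JVM pointer size, which is where the savings over object references come from when compressed oops is disabled.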
[jira] [Updated] (HDFS-4916) DataTransfer may mask the IOException during block transferring
[ https://issues.apache.org/jira/browse/HDFS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4916: --- Labels: BB2015-05-TBR (was: ) DataTransfer may mask the IOException during block transferring -- Key: HDFS-4916 URL: https://issues.apache.org/jira/browse/HDFS-4916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.4-alpha, 2.0.5-alpha Reporter: Zesheng Wu Priority: Critical Labels: BB2015-05-TBR Attachments: 4916.v0.patch When a new datanode is added to the pipeline, the client triggers the block transfer process. In the current implementation, the src datanode calls the run() method of DataTransfer to transfer the block; this method masks any IOException thrown during the transfer, so the client does not realize the transfer failed and mistakes a failing transfer for a successful one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5745) Unnecessary disk check triggered when socket operation has a problem
[ https://issues.apache.org/jira/browse/HDFS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5745: --- Labels: BB2015-05-TBR (was: ) Unnecessary disk check triggered when socket operation has a problem --- Key: HDFS-5745 URL: https://issues.apache.org/jira/browse/HDFS-5745 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 1.2.1 Reporter: MaoYuan Xian Labels: BB2015-05-TBR Attachments: HDFS-5745.patch When BlockReceiver fails to transfer data, SocketOutputStream translates the exception into an IOException with the message "The stream is closed":
{noformat}
2014-01-06 11:48:04,716 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
java.io.IOException: The stream is closed
	at org.apache.hadoop.net.SocketOutputStream.write
	at java.io.BufferedOutputStream.flushBuffer
	at java.io.BufferedOutputStream.flush
	at java.io.DataOutputStream.flush
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run
	at java.lang.Thread.run
{noformat}
This causes DataNode's checkDiskError method to be called, which triggers a disk scan. Can we make a modification like the one below in checkDiskError to avoid this unnecessary disk scan?
{code}
--- a/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
+++ b/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
@@ -938,7 +938,8 @@ public class DataNode extends Configured
         || e.getMessage().startsWith("An established connection was aborted")
         || e.getMessage().startsWith("Broken pipe")
         || e.getMessage().startsWith("Connection reset")
-        || e.getMessage().contains("java.nio.channels.SocketChannel")) {
+        || e.getMessage().contains("java.nio.channels.SocketChannel")
+        || e.getMessage().startsWith("The stream is closed")) {
       LOG.info("Not checking disk as checkDiskError was called on a network" +
           " related exception");
       return;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7485) Avoid string operations in FSPermissionChecker
[ https://issues.apache.org/jira/browse/HDFS-7485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7485: --- Labels: BB2015-05-TBR (was: ) Avoid string operations in FSPermissionChecker -- Key: HDFS-7485 URL: https://issues.apache.org/jira/browse/HDFS-7485 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Labels: BB2015-05-TBR Attachments: HDFS-7485.000.patch Currently {{FSPermissionChecker}} compares strings when testing users and groups. It should instead compare the ids assigned by SerialNumberManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
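The intended shape of the optimization, sketched with hypothetical names (the real id plumbing lives in SerialNumberManager): intern each user and group name to an int once, then compare ints on the hot path.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: intern strings to dense ids once, then compare ints afterwards.
class SerialNumbersSketch {
  private final ConcurrentHashMap<String, Integer> ids = new ConcurrentHashMap<>();
  private final AtomicInteger nextId = new AtomicInteger();

  int getId(String name) {
    return ids.computeIfAbsent(name, k -> nextId.getAndIncrement());
  }
}

// The permission check then becomes an int comparison instead of
// String.equals, e.g.: if (callerUserId == inodeOwnerId) { ... }
{code}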
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to a LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4309: --- Labels: BB2015-05-TBR patch (was: patch) Multithreaded get through the Cache FileSystem object leads to a LeaseChecker memory leak -- Key: HDFS-4309 URL: https://issues.apache.org/jira/browse/HDFS-4309 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha Reporter: WenJin Ma Labels: BB2015-05-TBR, patch Attachments: HDFS-4309.patch, jmap2.log Original Estimate: 204h Remaining Estimate: 204h If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) may be called more than once, creating multiple DFSClients and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close all of them, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  // this is where concurrent callers may each create a new file system
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6262) HDFS doesn't raise FileNotFoundException if the source of a rename() is missing
[ https://issues.apache.org/jira/browse/HDFS-6262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6262: --- Labels: BB2015-05-TBR (was: ) HDFS doesn't raise FileNotFoundException if the source of a rename() is missing --- Key: HDFS-6262 URL: https://issues.apache.org/jira/browse/HDFS-6262 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Akira AJISAKA Labels: BB2015-05-TBR Attachments: HDFS-6262.2.patch, HDFS-6262.patch HDFS's {{rename(src, dest)}} returns false if src does not exist - all the other filesystems raise {{FileNotFoundException}}. This behaviour is defined in {{FSDirectory.unprotectedRenameTo()}} - the attempt is logged, but the operation then just returns false. I propose changing the behaviour of {{DistributedFileSystem}} to be the same as that of the others - and of {{FileContext}}, which does reject renames with nonexistent sources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
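The behavior change itself would be small; a hedged sketch of the check the proposal implies (hypothetical placement and names, not a committed patch):
{code:java}
import java.io.FileNotFoundException;

// Sketch: fail fast with FileNotFoundException instead of returning false
// when the rename source does not exist.
class RenameCheckSketch {
  static void checkRenameSource(boolean srcExists, String src)
      throws FileNotFoundException {
    if (!srcExists) {
      throw new FileNotFoundException("rename source " + src + " does not exist");
    }
  }
}
{code}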
[jira] [Updated] (HDFS-6861) Separate Balancer specific logic from Dispatcher
[ https://issues.apache.org/jira/browse/HDFS-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6861: --- Labels: BB2015-05-TBR (was: ) Separate Balancer specific logic from Dispatcher Key: HDFS-6861 URL: https://issues.apache.org/jira/browse/HDFS-6861 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Labels: BB2015-05-TBR Attachments: h6861_20140818.patch, h6861_20140819.patch In order to balance datanode storage utilization of a cluster, Balancer (1) classifies datanodes into different groups (overUtilized, aboveAvgUtilized, belowAvgUtilized and underUtilized), (2) chooses source and target datanode pairs and (3) chooses blocks to move. Some of this logic currently lives in Dispatcher; it is better to separate it out. This JIRA is follow-up work to HDFS-6828. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5274: --- Labels: BB2015-05-TBR (was: ) Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Labels: BB2015-05-TBR Attachments: 3node_get_200mb.png, 3node_put_200mb.png, 3node_put_200mb.png, HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-10.patch, HDFS-5274-11.txt, HDFS-5274-12.patch, HDFS-5274-13.patch, HDFS-5274-14.patch, HDFS-5274-15.patch, HDFS-5274-16.patch, HDFS-5274-17.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, HDFS-5274-8.patch, HDFS-5274-8.patch, HDFS-5274-9.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, ss-5274v8-put.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
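For context, the HTrace client API that HBase uses looks roughly like this (a sketch against the pre-4.0 org.apache.htrace API; the span name and sampler choice are illustrative, and this is not one of the attached patches):
{code:java}
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

// Sketch: wrap an operation in a trace span so downstream RPCs can be
// stitched into one end-to-end trace (e.g. viewable in Zipkin).
class TracingSketch {
  void tracedOperation() {
    TraceScope scope = Trace.startSpan("DFSClient#someOperation", Sampler.ALWAYS);
    try {
      // ... the work to be traced ...
    } finally {
      scope.close(); // closes the span and reports it
    }
  }
}
{code}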
[jira] [Updated] (HDFS-6980) TestWebHdfsFileSystemContract fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6980: --- Labels: BB2015-05-TBR (was: ) TestWebHdfsFileSystemContract fails in trunk Key: HDFS-6980 URL: https://issues.apache.org/jira/browse/HDFS-6980 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Akira AJISAKA Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: HDFS-6980.1-2.patch, HDFS-6980.1.patch Many tests in TestWebHdfsFileSystemContract fail with a "too many open files" error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3704) In the DFSClient, add the node to the dead list when the ipc.Client call fails
[ https://issues.apache.org/jira/browse/HDFS-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-3704: --- Labels: BB2015-05-TBR (was: ) In the DFSClient, add the node to the dead list when the ipc.Client call fails --- Key: HDFS-3704 URL: https://issues.apache.org/jira/browse/HDFS-3704 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1.0.3, 2.0.0-alpha Reporter: Nicolas Liochon Priority: Minor Labels: BB2015-05-TBR Attachments: HADOOP-3704.patch The DFSClient maintains a list of dead nodes per input stream. When creating a DFSInputStream, it may connect to one of the nodes to check the final block size. If this call fails, the datanode should be put in the dead nodes list to save time; otherwise it will be retried for the block transfer during the read, and we're likely to get a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7123) Run legacy fsimage checkpoint in parallel with PB fsimage checkpoint
[ https://issues.apache.org/jira/browse/HDFS-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7123: --- Labels: BB2015-05-TBR (was: ) Run legacy fsimage checkpoint in parallel with PB fsimage checkpoint Key: HDFS-7123 URL: https://issues.apache.org/jira/browse/HDFS-7123 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Labels: BB2015-05-TBR Attachments: HDFS-7123.patch HDFS-7097 will address the checkpoint and BR issue. In addition, it might still be useful to reduce the overall checkpoint duration, given that checkpointing blocks edit log replay. If there is a large volume of edit log to catch up on and the NN fails over, availability will be impacted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
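The idea, roughly (a hedged sketch with hypothetical method names, not the attached patch): write the legacy and protobuf images concurrently so the checkpoint wall-clock time approaches the slower of the two writers rather than their sum.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: run both image writers in parallel and wait for both.
class ParallelCheckpointSketch {
  void saveBothImages(Runnable savePbImage, Runnable saveLegacyImage)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<?> pb = pool.submit(savePbImage);
      Future<?> legacy = pool.submit(saveLegacyImage);
      pb.get();     // propagate failures from either writer
      legacy.get();
    } finally {
      pool.shutdown();
    }
  }
}
{code}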
[jira] [Updated] (HDFS-7401) Add block info to DFSInputStream's WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7401: --- Labels: BB2015-05-TBR (was: ) Add block info to DFSInputStream's WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Keith Pak Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-7401.patch Block info is missing in the below message:
{noformat}
2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK
{noformat}
The code (DFSInputStream.java):
{noformat}
DFSClient.LOG.warn("Failed to connect to " + targetAddr + " for block"
    + ", add to deadNodes and continue. " + ex, ex);
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
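The fix presumably just threads the block into the log call; a sketch with hypothetical variable names (not necessarily the attached patch):
{code:java}
// Sketch: include the block in the warning so the failing replica can be
// correlated with NameNode and DataNode logs. "block" is hypothetical here,
// e.g. the ExtendedBlock of the LocatedBlock being read.
DFSClient.LOG.warn("Failed to connect to " + targetAddr
    + " for block " + block
    + ", add to deadNodes and continue. " + ex, ex);
{code}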