[jira] [Commented] (HDFS-8625) count with -h option displays namespace quota in human readable format
[ https://issues.apache.org/jira/browse/HDFS-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596074#comment-14596074 ] Allen Wittenauer commented on HDFS-8625: Right, counts should be in billions, byte sizes should be in gigabytes. So that's pretty much the only bug here. But it's: a) relatively minor b) causes a cascade of other, incompatible changes (e.g., need to honor both b and not g for setting quotas) count with -h option displays namespace quota in human readable format -- Key: HDFS-8625 URL: https://issues.apache.org/jira/browse/HDFS-8625 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Archana T Assignee: Surendra Singh Lilhore Attachments: HDFS-8625.patch When the 'count' command is executed with the '-h' option, the namespace quota is displayed in human-readable format -- Example : hdfs dfsadmin -setQuota {color:red}1048576{color} /test hdfs dfs -count -q -h -v /test {color:red}QUOTA REM_QUOTA{color} SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME {color:red}1 M 1.0 M{color} none inf 1 0 0 /test QUOTA and REM_QUOTA show 1 M (human-readable format) when they should show the count value 1048576 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
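To make Allen's distinction concrete, here is a minimal, self-contained Java sketch of formatting that treats the name-space quota as a plain object count while still humanizing byte quantities. It is illustrative only, not Hadoop's actual formatter; the class and method names are made up for the example.
{code}
public class QuotaFormat {
  // Byte quantities get binary prefixes (K, M, G, ...), as -h does today.
  static String formatBytes(long v) {
    final String[] units = {"", " K", " M", " G", " T"};
    double d = v;
    int i = 0;
    while (d >= 1024 && i < units.length - 1) { d /= 1024; i++; }
    return i == 0 ? Long.toString(v) : String.format("%.1f%s", d, units[i]);
  }

  // Object counts (name-space quota) stay plain numbers, never "1 M".
  static String formatCount(long v) {
    return Long.toString(v);
  }

  public static void main(String[] args) {
    long nsQuota = 1048576;  // set via: hdfs dfsadmin -setQuota 1048576 /test
    System.out.println(formatCount(nsQuota));  // 1048576, what QUOTA should show
    System.out.println(formatBytes(nsQuota));  // 1.0 M, appropriate for SPACE_QUOTA
  }
}
{code}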
[jira] [Commented] (HDFS-8637) OzoneHandler : Add Error Table
[ https://issues.apache.org/jira/browse/HDFS-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596261#comment-14596261 ] Arpit Agarwal commented on HDFS-8637: - +1 for the patch. Jenkins failures are unrelated to the patch. I will commit it shortly. OzoneHandler : Add Error Table -- Key: HDFS-8637 URL: https://issues.apache.org/jira/browse/HDFS-8637 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anu Engineer Assignee: Anu Engineer Attachments: hdfs-8637-HDFS-7240.001.patch Define all errors coming out of REST protocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
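As a rough idea of what such a REST "error table" can look like, here is a hypothetical Java sketch; the enum name, error names, codes, and messages are all illustrative and not taken from the committed patch.
{code}
// Hypothetical shape of an Ozone REST error table: each entry carries the
// HTTP status code and a stable message the handler can return to clients.
public enum OzoneRestError {
  BAD_AUTHORIZATION(400, "Missing or invalid authorization header"),
  USER_NOT_FOUND(400, "User not found"),
  ACCESS_DENIED(403, "Access denied"),
  VOLUME_NOT_FOUND(404, "Volume does not exist"),
  VOLUME_ALREADY_EXISTS(409, "Volume already exists"),
  SERVER_ERROR(500, "Internal server error");

  private final int httpCode;
  private final String message;

  OzoneRestError(int httpCode, String message) {
    this.httpCode = httpCode;
    this.message = message;
  }

  public int getHttpCode() { return httpCode; }
  public String getMessage() { return message; }
}
{code}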
[jira] [Comment Edited] (HDFS-8625) count with -h option displays namespace quota in human readable format
[ https://issues.apache.org/jira/browse/HDFS-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596074#comment-14596074 ] Allen Wittenauer edited comment on HDFS-8625 at 6/22/15 3:23 PM: - Right, counts should be in billions, byte sizes should be in gigabytes. So that's pretty much the only bug here. But it's: a) relatively minor b) causes a cascade of other, incompatible changes (e.g., need to honor b and not g for setting size-based quotas) was (Author: aw): Right, counts should be in billions, byte sizes should be in gigabytes. So that's pretty much the only bug here. But it's: a) relatively minor b) causes a cascade of other, incompatible changes (e.g., need to honor both b and not g for setting quotas) count with -h option displays namespace quota in human readable format -- Key: HDFS-8625 URL: https://issues.apache.org/jira/browse/HDFS-8625 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Archana T Assignee: Surendra Singh Lilhore Attachments: HDFS-8625.patch When the 'count' command is executed with the '-h' option, the namespace quota is displayed in human-readable format -- Example : hdfs dfsadmin -setQuota {color:red}1048576{color} /test hdfs dfs -count -q -h -v /test {color:red}QUOTA REM_QUOTA{color} SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME {color:red}1 M 1.0 M{color} none inf 1 0 0 /test QUOTA and REM_QUOTA show 1 M (human-readable format) when they should show the count value 1048576 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8515) Abstract a DTP/2 HTTP/2 server
[ https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596293#comment-14596293 ] Haohui Mai commented on HDFS-8515: -- The patch looks good. I'm wondering whether it is possible to inherit the {{AbstractChannel}} for the stream class, which is similar to what the {{ChildChannel}} patch has done in https://github.com/netty/netty/issues/3667. This will make the abstraction closer to the ones that netty provides, simplifying the effort of building the applications at the upper layer. Abstract a DTP/2 HTTP/2 server -- Key: HDFS-8515 URL: https://issues.apache.org/jira/browse/HDFS-8515 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Duo Zhang Assignee: Duo Zhang Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, HDFS-8515-v3.patch, HDFS-8515.patch Discussed in HDFS-8471. https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
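For readers unfamiliar with the suggestion, a heavily simplified skeleton of an HTTP/2 stream modeled as a netty child {{Channel}} might look like the following. This is a sketch against the netty 4 {{AbstractChannel}} API along the lines of netty/netty#3667, not the HDFS-8515 patch; the class name and the frame-forwarding behavior are assumptions.
{code}
import java.net.SocketAddress;
import io.netty.channel.AbstractChannel;
import io.netty.channel.Channel;
import io.netty.channel.ChannelConfig;
import io.netty.channel.ChannelMetadata;
import io.netty.channel.ChannelOutboundBuffer;
import io.netty.channel.ChannelPromise;
import io.netty.channel.DefaultChannelConfig;
import io.netty.channel.EventLoop;
import io.netty.util.ReferenceCountUtil;

// Hypothetical: one HTTP/2 stream as a child Channel of the connection channel.
public class Http2StreamChannel extends AbstractChannel {
  private final ChannelConfig config = new DefaultChannelConfig(this);
  private volatile boolean open = true;

  public Http2StreamChannel(Channel parent) {
    super(parent); // the HTTP/2 connection channel is the parent
  }

  @Override public ChannelConfig config() { return config; }
  @Override public boolean isOpen() { return open; }
  @Override public boolean isActive() { return open; }
  @Override public ChannelMetadata metadata() { return new ChannelMetadata(false); }

  @Override protected AbstractUnsafe newUnsafe() {
    return new AbstractUnsafe() {
      @Override public void connect(SocketAddress remote, SocketAddress local,
          ChannelPromise promise) {
        promise.setFailure(new UnsupportedOperationException("streams don't connect"));
      }
    };
  }

  // A stream lives on its parent connection's event loop.
  @Override protected boolean isCompatible(EventLoop loop) {
    return loop == parent().eventLoop();
  }

  @Override protected SocketAddress localAddress0() { return parent().localAddress(); }
  @Override protected SocketAddress remoteAddress0() { return parent().remoteAddress(); }
  @Override protected void doBind(SocketAddress local) { throw new UnsupportedOperationException(); }
  @Override protected void doDisconnect() throws Exception { doClose(); }
  @Override protected void doClose() { open = false; /* would send RST_STREAM upstream */ }
  @Override protected void doBeginRead() { /* would request frames for this stream id */ }

  // Writes become HTTP/2 frames forwarded through the parent channel.
  @Override protected void doWrite(ChannelOutboundBuffer in) {
    Object msg;
    while ((msg = in.current()) != null) {
      parent().write(ReferenceCountUtil.retain(msg)); // retain: remove() releases
      in.remove();
    }
    parent().flush();
  }
}
{code}
The appeal of this shape is exactly what the comment says: upper layers see an ordinary netty {{Channel}} per stream and can reuse standard handlers and pipelines.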
[jira] [Commented] (HDFS-8462) Implement GETXATTRS and LISTXATTRS operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596246#comment-14596246 ] Hadoop QA commented on HDFS-8462: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 43s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 3m 1s | Site still builds. | | {color:green}+1{color} | checkstyle | 0m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 19s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 18s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 159m 29s | Tests passed in hadoop-hdfs. | | | | 208m 56s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741031/HDFS-8462-04.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / 445b132 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11434/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11434/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11434/console | This message was automatically generated. Implement GETXATTRS and LISTXATTRS operation for WebImageViewer --- Key: HDFS-8462 URL: https://issues.apache.org/jira/browse/HDFS-8462 Project: Hadoop HDFS Issue Type: New Feature Reporter: Akira AJISAKA Assignee: Jagadesh Kiran N Attachments: HDFS-8462-00.patch, HDFS-8462-01.patch, HDFS-8462-02.patch, HDFS-8462-03.patch, HDFS-8462-04.patch In Hadoop 2.7.0, WebImageViewer supports the following operations: * {{GETFILESTATUS}} * {{LISTSTATUS}} * {{GETACLSTATUS}} I'm thinking it would be better for administrators if {{GETXATTRS}} and {{LISTXATTRS}} are supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8637) OzoneHandler : Add Error Table
[ https://issues.apache.org/jira/browse/HDFS-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-8637: Resolution: Fixed Fix Version/s: HDFS-7240 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to the feature branch. Thanks [~anu] for the contribution! OzoneHandler : Add Error Table -- Key: HDFS-8637 URL: https://issues.apache.org/jira/browse/HDFS-8637 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anu Engineer Assignee: Anu Engineer Fix For: HDFS-7240 Attachments: hdfs-8637-HDFS-7240.001.patch Define all errors coming out of REST protocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class
[ https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596397#comment-14596397 ] Rakesh R commented on HDFS-8493: The following functions perform the resolution {{fsd.resolvePath(pc, src, pathComponents);}} while acquiring only the fsn lock and not the fsd lock. Could you please take a look? # FsDirAclOp.java - getAclStatus() - modifyAclEntries() - removeAcl() - removeDefaultAcl() - setAcl() - getAclStatus() # FsDirDeleteOp.java - delete(fsn, src, recursive, logRetryCache) # FsDirRenameOp.java - renameToInt(fsd, srcArg, dstArg, logRetryCache) - renameToInt(fsd, srcArg, dstArg, logRetryCache, options) # FsDirStatAndListingOp.java - getContentSummary(fsd, src) - getFileInfo(fsd, srcArg, resolveLink) - isFileClosed(fsd, src) - getListingInt(fsd, srcArg, startAfter, needLocation) # FsDirWriteFileOp.java - abandonBlock() - completeFile(fsn, pc, srcArg, holder, last, fileId) - getEncryptionKeyInfo(fsn, pc, src, supportedVersions) - startFile() - validateAddBlock() # FsDirXAttrOp.java - getXAttrs(fsd, srcArg, xAttrs) - listXAttrs(fsd, src) - setXAttr(fsd, src, xAttr, flag, logRetryCache) # FSNamesystem.java - createEncryptionZoneInt() - getEZForPath() Consolidate truncate() related implementation in a single class --- Key: HDFS-8493 URL: https://issues.apache.org/jira/browse/HDFS-8493 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Rakesh R Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch This jira proposes to consolidate truncate() related methods into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
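For context, a minimal sketch of the lock nesting being asked about, with plain {{ReentrantReadWriteLock}} stand-ins rather than the real {{FSNamesystem}}/{{FSDirectory}} members; the point is that path resolution against the directory tree is expected to hold the fsd lock inside the fsn lock.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockNestingSketch {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock(true);
  private final ReentrantReadWriteLock fsdLock = new ReentrantReadWriteLock(true);

  String resolvePath(String src) {
    return src; // stand-in for fsd.resolvePath(pc, src, pathComponents)
  }

  String getWithBothLocks(String src) {
    fsnLock.readLock().lock();
    try {
      fsdLock.readLock().lock(); // the step the review says is missing
      try {
        return resolvePath(src);
      } finally {
        fsdLock.readLock().unlock();
      }
    } finally {
      fsnLock.readLock().unlock();
    }
  }
}
{code}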
[jira] [Created] (HDFS-8644) OzoneHandler : Add volume handler
Anu Engineer created HDFS-8644: -- Summary: OzoneHandler : Add volume handler Key: HDFS-8644 URL: https://issues.apache.org/jira/browse/HDFS-8644 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anu Engineer Add volume handler logic that dispatches volume related calls to the right interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-7214: -- Attachment: HDFS-7214.v3.patch Display the time when NN became active on the webUI --- Key: HDFS-7214 URL: https://issues.apache.org/jira/browse/HDFS-7214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, HDFS-7214.v3.patch The NN webUI currently displays the JVM start-up time. It would be useful to show when the NN became active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.
[ https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596712#comment-14596712 ] Haohui Mai commented on HDFS-8617: -- bq. You can read my related SoCC paper here: http://umbrant.com/papers/socc12-cake.pdf . I experimented with ioprio about 3 years ago as part of this work, and didn't get positive results. We needed application-level throttling. As you mentioned in the evaluation, there are adverse effects on throughputs. I agree that application-level throttling can be useful. The proposed solution, however, relies on magic numbers to work. My concern is how to choose the magic numbers. Is it repeatable to achieve good performance? Is it generalizable to other configurations? It looks to me that currently the answers to both questions are no. The proposed solution looks like lowering the utilization of the cluster (at the cost of making {{checkDir()}} really slow) to meet the SLOs. bq. The key issue though, as both Colin and I have mentioned, is that there is queuing both in the OS and on disk. ioprio only affects OS-level queuing, and disk-level queuing can be quite substantial. Not sure how much more needs to be said. Point taken. Unfortunately, without performance benchmarks and numbers the statements are purely speculative. For example, what do you mean by substantial? The size of the NCQ is 32, while the size of the OS-level I/O queue can be hundreds or thousands. I would really appreciate doing some performance benchmarks and sharing the numbers. My concern with the proposal is that the parameter is not automatically tunable w.r.t. cluster configurations and loads. It has to be dynamic. In the longer term it makes a lot of sense to tune these parameters based on the length of the I/O queue, avg. processing time, etc. As a first step I think it can be very helpful to simply correlate these parameters with simple metrics like the number of transceiver threads. Throttle DiskChecker#checkDirs() speed. --- Key: HDFS-8617 URL: https://issues.apache.org/jira/browse/HDFS-8617 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-8617.000.patch As described in HDFS-8564, {{DiskChecker.checkDirs(finalizedDir)}} is causing excessive I/Os because {{finalizedDirs}} might have up to 64K sub-directories (HDFS-6482). This patch proposes to limit the rate of IO operations in {{DiskChecker.checkDirs()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
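For concreteness, one simple way to cap the rate of IO operations during a recursive directory check is a token-style limiter such as Guava's {{RateLimiter}}. The sketch below is illustrative, not the attached patch, and the permits-per-second value is exactly the kind of magic number under debate in this thread.
{code}
import com.google.common.util.concurrent.RateLimiter;
import java.io.File;

class ThrottledDirChecker {
  private final RateLimiter limiter;

  ThrottledDirChecker(double opsPerSecond) {
    this.limiter = RateLimiter.create(opsPerSecond); // the "magic number"
  }

  void checkDirs(File dir) {
    limiter.acquire();                  // block until an IO "permit" is available
    File[] children = dir.listFiles();  // one metadata IO op
    if (children == null) {
      throw new IllegalStateException("Cannot list " + dir);
    }
    for (File child : children) {
      if (child.isDirectory()) {
        checkDirs(child);               // recurse over up to 64K subdirectories
      }
    }
  }
}
{code}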
[jira] [Updated] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8480: --- Resolution: Fixed Fix Version/s: 2.7.1 Status: Resolved (was: Patch Available) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
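The core of the hard-link approach is a single metadata operation per edit log segment instead of a byte-for-byte copy. A minimal {{java.nio.file}} sketch (the paths are illustrative, and hard links require both paths to be on the same filesystem):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EditLogPreserve {
  public static void main(String[] args) throws IOException {
    Path existing = Paths.get("current/edits_0000001-0000100");
    Path link = Paths.get("previous.tmp/edits_0000001-0000100");
    Files.createDirectories(link.getParent());
    // O(1) per segment regardless of its size, vs. O(bytes) for Files.copy
    Files.createLink(link, existing);
  }
}
{code}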
[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596652#comment-14596652 ] Hadoop QA commented on HDFS-7214: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 4s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 10s | The applied patch generated 2 new checkstyle issues (total was 183, now 185). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 18s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 160m 3s | Tests passed in hadoop-hdfs. | | | | 206m 29s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741081/HDFS-7214.v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11436/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11436/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11436/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11436/console | This message was automatically generated. Display the time when NN became active on the webUI --- Key: HDFS-7214 URL: https://issues.apache.org/jira/browse/HDFS-7214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, HDFS-7214.v3.patch, HDFS-7214.v4.patch The NN webUI currently displays the JVM start-up time. It would be useful to show when the NN became active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8608) Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks)
[ https://issues.apache.org/jira/browse/HDFS-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596649#comment-14596649 ] Andrew Wang commented on HDFS-8608: --- +1 LGTM. I'm going to commit this to branch-2. Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks) -- Key: HDFS-8608 URL: https://issues.apache.org/jira/browse/HDFS-8608 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 3.0.0 Attachments: HDFS-4366-branch-2.00.patch, HDFS-4366-branch-2.01.patch, HDFS-8608.00.patch, HDFS-8608.01.patch, HDFS-8608.02.patch This JIRA aims to merge HDFS-7912 into trunk to minimize the final patch when merging the HDFS-7285 (erasure coding) branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4366) Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks
[ https://issues.apache.org/jira/browse/HDFS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-4366: -- Fix Version/s: (was: 3.0.0) 2.8.0 [~zhz] did up a branch-2 patch which I backported, changing the fix version to reflect this. Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks - Key: HDFS-4366 URL: https://issues.apache.org/jira/browse/HDFS-4366 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Derek Dagit Assignee: Derek Dagit Fix For: 2.8.0 Attachments: HDFS-4366-branch-2.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, hdfs-4366-unittest.patch In certain cases, higher-priority under-replicated blocks can be skipped by the replication policy implementation. The current implementation maintains, for each priority level, an index into a list of blocks that are under-replicated. Together, the lists compose a priority queue (see note later about branch-0.23). In some cases when blocks are removed from a list, the caller (BlockManager) properly handles the index into the list from which it removed a block. In some other cases, the index remains stationary while the list changes. Whenever this happens, and the removed block happened to be at or before the index, the implementation will skip over a block when selecting blocks for replication work. In situations when entire racks are decommissioned, leading to many under-replicated blocks, loss of blocks can occur. Background: HDFS-1765 This patch to trunk greatly improved the state of the replication policy implementation. Prior to the patch, the following details were true: * The block priority queue was no such thing: it was really a set of trees that held blocks in natural ordering, that being by the block's ID, which resulted in iterator walks over the blocks in pseudo-random order. * There was only a single index into an iteration over all of the blocks... * ... meaning the implementation was only successful in respecting priority levels on the first pass. Overall, the behavior was a round-robin-type scheduling of blocks. After the patch: * A proper priority queue is implemented, preserving log n operations while iterating over blocks in the order added. * A separate index for each priority level is kept... * ... allowing for processing of the highest priority blocks first regardless of which priority had last been processed. The change was suggested for branch-0.23 as well as trunk, but it does not appear to have been pulled in. The problem: Although the indices are now tracked in a better way, there is a synchronization issue since the indices are managed outside of methods to modify the contents of the queue. Removal of a block from a priority level without adjusting the index can mean that the index then points to the block after the block it originally pointed to. In the next round of scheduling for that priority level, the block originally pointed to by the index is skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
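The stale-index skip described above can be reproduced in a few lines. The following toy example is not HDFS code; it just shows an external index into a list going stale after a removal earlier in the list.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleIndexDemo {
  public static void main(String[] args) {
    List<String> queue = new ArrayList<>(Arrays.asList("b1", "b2", "b3", "b4"));
    int replIndex = 2;      // next round should start at "b3"

    queue.remove("b1");     // removal before the index; index not adjusted
    // queue is now [b2, b3, b4]; replIndex is still 2, so the next block
    // scheduled is "b4" -- "b3" was silently skipped.
    System.out.println("next scheduled: " + queue.get(replIndex));
  }
}
{code}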
[jira] [Commented] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification
[ https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596698#comment-14596698 ] Hadoop QA commented on HDFS-8542: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741136/HDFS-8542-branch-2.7.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7b424f9 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11439/console | This message was automatically generated. WebHDFS getHomeDirectory behavior does not match specification -- Key: HDFS-8542 URL: https://issues.apache.org/jira/browse/HDFS-8542 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jakob Homan Assignee: kanaka kumar avvaru Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch Per the [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory], WebHDFS provides a REST endpoint for getting the user's home directory: {noformat}Submit a HTTP GET request. curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat} However, WebHDFSFileSystem.java does not use this, instead building the home [directory locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]: {code} /** @return the home directory. */ public static String getHomeDirectoryString(final UserGroupInformation ugi) { return "/user/" + ugi.getShortUserName(); } @Override public Path getHomeDirectory() { return makeQualified(new Path(getHomeDirectoryString(ugi))); }{code} The WebHDFSFileSystem client should call the REST service to determine the home directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
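Following the quoted spec, asking the server instead of computing the path locally is a single GET plus a small JSON parse. A bare-bones, self-contained illustration (the host and port are placeholders, and the regex-based JSON handling is a simplification of what a real client would do):
{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class GetHomeDirectory {
  public static void main(String[] args) throws IOException {
    URL url = new URL("http://localhost:50070/webhdfs/v1/?op=GETHOMEDIRECTORY");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String body = in.readLine(); // e.g. {"Path":"/user/alice"}
      String path = body.replaceAll(".*\"Path\"\\s*:\\s*\"([^\"]+)\".*", "$1");
      System.out.println("home directory: " + path);
    } finally {
      conn.disconnect();
    }
  }
}
{code}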
[jira] [Updated] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification
[ https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8542: -- Status: Patch Available (was: Open) WebHDFS getHomeDirectory behavior does not match specification -- Key: HDFS-8542 URL: https://issues.apache.org/jira/browse/HDFS-8542 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jakob Homan Assignee: kanaka kumar avvaru Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch Per the [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory], WebHDFS provides a REST endpoint for getting the user's home directory: {noformat}Submit a HTTP GET request. curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat} However, WebHDFSFileSystem.java does not use this, instead building the home [directory locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]: {code} /** @return the home directory. */ public static String getHomeDirectoryString(final UserGroupInformation ugi) { return "/user/" + ugi.getShortUserName(); } @Override public Path getHomeDirectory() { return makeQualified(new Path(getHomeDirectoryString(ugi))); }{code} The WebHDFSFileSystem client should call the REST service to determine the home directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596608#comment-14596608 ] Siqi Li commented on HDFS-7214: --- It looks like HDFS-7257 has already checked in a feature similar to this jira. I feel like we could resolve this one by marking it as a duplicate. Display the time when NN became active on the webUI --- Key: HDFS-7214 URL: https://issues.apache.org/jira/browse/HDFS-7214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, HDFS-7214.v3.patch, HDFS-7214.v4.patch The NN webUI currently displays the JVM start-up time. It would be useful to show when the NN became active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8480: --- Summary: Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them (was: Fix performance and timeout issues in HDFS-7929: use hard-links instead of copying edit logs) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7390) Provide JMX metrics per storage type
[ https://issues.apache.org/jira/browse/HDFS-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-7390: --- Attachment: HDFS-7390-005.patch Provide JMX metrics per storage type Key: HDFS-7390 URL: https://issues.apache.org/jira/browse/HDFS-7390 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.5.2 Reporter: Benoy Antony Assignee: Benoy Antony Labels: BB2015-05-TBR Attachments: HDFS-7390-003.patch, HDFS-7390-004.patch, HDFS-7390-005.patch, HDFS-7390-005.patch, HDFS-7390.patch, HDFS-7390.patch HDFS-2832 added heterogeneous support. In a cluster with different storage types, it is useful to have metrics per storage type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification
[ https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8542: -- Status: Open (was: Patch Available) WebHDFS getHomeDirectory behavior does not match specification -- Key: HDFS-8542 URL: https://issues.apache.org/jira/browse/HDFS-8542 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jakob Homan Assignee: kanaka kumar avvaru Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch Per the [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory], WebHDFS provides a REST endpoint for getting the user's home directory: {noformat}Submit a HTTP GET request. curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat} However, WebHDFSFileSystem.java does not use this, instead building the home [directory locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]: {code} /** @return the home directory. */ public static String getHomeDirectoryString(final UserGroupInformation ugi) { return "/user/" + ugi.getShortUserName(); } @Override public Path getHomeDirectory() { return makeQualified(new Path(getHomeDirectoryString(ugi))); }{code} The WebHDFSFileSystem client should call the REST service to determine the home directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.
[ https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596627#comment-14596627 ] Andrew Wang commented on HDFS-8617: --- You can read my related SoCC paper here: http://umbrant.com/papers/socc12-cake.pdf . I experimented with ioprio about 3 years ago as part of this work, and didn't get positive results. We needed application-level throttling. The key issue though, as both Colin and I have mentioned, is that there is queuing both in the OS and on disk. ioprio only affects OS-level queuing, and disk-level queuing can be quite substantial. Not sure how much more needs to be said. Also as Colin (and I) mentioned, deadline and noop IO schedulers are often used for latency sensitive workloads like HBase, and ioprio only works with CFQ. Thus ioprio is not going to work in this situation. Throttle DiskChecker#checkDirs() speed. --- Key: HDFS-8617 URL: https://issues.apache.org/jira/browse/HDFS-8617 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-8617.000.patch As described in HDFS-8564, {{DiskChecker.checkDirs(finalizedDir)}} is causing excessive I/Os because {{finalizedDirs}} might have up to 64K sub-directories (HDFS-6482). This patch proposes to limit the rate of IO operations in {{DiskChecker.checkDirs()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8608) Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks)
[ https://issues.apache.org/jira/browse/HDFS-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8608: -- Resolution: Fixed Fix Version/s: (was: 3.0.0) 2.8.0 Status: Resolved (was: Patch Available) Thanks again Zhe, I committed both HDFS-4366 and HDFS-8608 to branch-2. The HDFS-8608 backport was a little unclean though, and I noticed we still have some changes between branch-2 and trunk in TestReplicationPolicy we should probably resolve. Mind taking this on too as a follow-on? Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks) -- Key: HDFS-8608 URL: https://issues.apache.org/jira/browse/HDFS-8608 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.8.0 Attachments: HDFS-4366-branch-2.00.patch, HDFS-4366-branch-2.01.patch, HDFS-8608.00.patch, HDFS-8608.01.patch, HDFS-8608.02.patch This JIRA aims to merge HDFS-7912 into trunk to minimize the final patch when merging the HDFS-7285 (erasure coding) branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929: use hard-links instead of copying edit logs
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596671#comment-14596671 ] Colin Patrick McCabe commented on HDFS-8480: Thanks, [~zhz]. +1. Fix performance and timeout issues in HDFS-7929: use hard-links instead of copying edit logs Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification
[ https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8542: -- Attachment: HDFS-8542-branch-2.7.002.patch I'm still not wild about caching the result since again, (a) the value is never discarded, so it's not a cache and (b) backing systems could choose to change this value on a subsequent call. However, both FileSystem and DistributedFileSystem are doing some questionable things with this API, so I'll worry about those issues later, if we run into them. +1 on current patch. Failed tests are spurious. Attaching a version for 2.7 (same except location of JsonUtils). Will commit both after Jenkins has a pass over the backport. WebHDFS getHomeDirectory behavior does not match specification -- Key: HDFS-8542 URL: https://issues.apache.org/jira/browse/HDFS-8542 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jakob Homan Assignee: kanaka kumar avvaru Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch Per the [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory], WebHDFS provides a REST endpoint for getting the user's home directory: {noformat}Submit a HTTP GET request. curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat} However, WebHDFSFileSystem.java does not use this, instead building the home [directory locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]: {code} /** @return the home directory. */ public static String getHomeDirectoryString(final UserGroupInformation ugi) { return "/user/" + ugi.getShortUserName(); } @Override public Path getHomeDirectory() { return makeQualified(new Path(getHomeDirectoryString(ugi))); }{code} The WebHDFSFileSystem client should call the REST service to determine the home directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7390) Provide JMX metrics per storage type
[ https://issues.apache.org/jira/browse/HDFS-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-7390: --- Attachment: (was: HDFS-7390-005.patch) Provide JMX metrics per storage type Key: HDFS-7390 URL: https://issues.apache.org/jira/browse/HDFS-7390 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.5.2 Reporter: Benoy Antony Assignee: Benoy Antony Labels: BB2015-05-TBR Attachments: HDFS-7390-003.patch, HDFS-7390-004.patch, HDFS-7390-005.patch, HDFS-7390.patch, HDFS-7390.patch HDFS-2832 added heterogeneous support. In a cluster with different storage types, it is useful to have metrics per storage type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-3620) WebHdfsFileSystem getHomeDirectory() should not resolve locally
[ https://issues.apache.org/jira/browse/HDFS-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan resolved HDFS-3620. --- Resolution: Duplicate This issue was duplicated and dealt with in HDFS-8542. WebHdfsFileSystem getHomeDirectory() should not resolve locally --- Key: HDFS-3620 URL: https://issues.apache.org/jira/browse/HDFS-3620 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 1.0.3, 2.0.0-alpha Reporter: Alejandro Abdelnur Priority: Critical The WebHdfsFileSystem getHomeDirectory() method is hardcoded to return '/user/' + UGI#shortname. Instead, it should make an HTTP REST call with op=GETHOMEDIRECTORY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7929) inotify unable fetch pre-upgrade edit log segments once upgrade starts
[ https://issues.apache.org/jira/browse/HDFS-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596702#comment-14596702 ] Hudson commented on HDFS-7929: -- FAILURE: Integrated in Hadoop-trunk-Commit #8046 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8046/]) HDFS-8480. Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs, instead of copying them. (Zhe Zhang via Colin P. McCabe) (cmccabe: rev 7b424f938c3c306795d574792b086d84e4f06425) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNUpgradeUtil.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java inotify unable fetch pre-upgrade edit log segments once upgrade starts -- Key: HDFS-7929 URL: https://issues.apache.org/jira/browse/HDFS-7929 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0 Attachments: HDFS-7929-000.patch, HDFS-7929-001.patch, HDFS-7929-002.patch, HDFS-7929-003.patch inotify is often used to periodically poll HDFS events. However, once an HDFS upgrade has started, edit logs are moved to /previous on the NN, which is not accessible. Moreover, once the upgrade is finalized /previous is currently lost forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596703#comment-14596703 ] Hudson commented on HDFS-8480: -- FAILURE: Integrated in Hadoop-trunk-Commit #8046 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8046/]) HDFS-8480. Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs, instead of copying them. (Zhe Zhang via Colin P. McCabe) (cmccabe: rev 7b424f938c3c306795d574792b086d84e4f06425) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNUpgradeUtil.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.
[ https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596748#comment-14596748 ] Andrew Wang commented on HDFS-8617: --- bq. As you mentioned in the evaluation, there are adverse effects on throughputs...The proposed solution looks like lowering the utilization of the cluster (at the cost of making checkDir() really slow) to meet the SLOs. I'd like to turn this question around and ask: is there a downside to throttling checkDisk throughput? We might end up taking longer to detect a bad disk, but this is not a performance-critical workload. Here's another idea for a throttle: spend at most x% of the time doing checkDisk work. Maybe we say it can only run for 250ms of every 1000ms interval. Timeslicing like this automatically tunes for faster vs. slower IO rates. Throttle DiskChecker#checkDirs() speed. --- Key: HDFS-8617 URL: https://issues.apache.org/jira/browse/HDFS-8617 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-8617.000.patch As described in HDFS-8564, {{DiskChecker.checkDirs(finalizedDir)}} is causing excessive I/Os because {{finalizedDirs}} might have up to 64K sub-directories (HDFS-6482). This patch proposes to limit the rate of IO operations in {{DiskChecker.checkDirs()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
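A sketch of that time-slicing idea: allow at most sliceMs of work in every periodMs window, sleeping off the remainder. This is illustrative only; the 250/1000 split is the example from the comment, not a tuned value, and the class is not from any attached patch.
{code}
class TimeSlicedThrottle {
  private final long periodMs;
  private final long sliceMs;
  private long windowStart = System.currentTimeMillis();
  private long usedMs = 0;

  TimeSlicedThrottle(long periodMs, long sliceMs) {
    this.periodMs = periodMs; // e.g. 1000
    this.sliceMs = sliceMs;   // e.g. 250
  }

  // Call around each unit of IO work; elapsedMs is how long that unit took.
  synchronized void charge(long elapsedMs) throws InterruptedException {
    usedMs += elapsedMs;
    long now = System.currentTimeMillis();
    if (now - windowStart >= periodMs) {   // a new window has started
      windowStart = now;
      usedMs = 0;
    } else if (usedMs >= sliceMs) {        // budget exhausted: sleep out the window
      Thread.sleep(periodMs - (now - windowStart));
      windowStart = System.currentTimeMillis();
      usedMs = 0;
    }
  }
}
{code}
As the comment notes, this shape adapts to faster or slower disks automatically, since the budget is expressed in wall-clock time rather than operations per second.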
[jira] [Updated] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times
[ https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7645: -- Hadoop Flags: Incompatible change, Reviewed (was: Reviewed) This change is incompatible since we expose RollingUpgradeInfo in the NN's JMX (a public API). As discussed above, rather than being null on finalization, it now sets the finalization time. Have we thought about other ways of solving this issue? Else we can change the JMX method to still return null on finalization. Rolling upgrade is restoring blocks from trash multiple times - Key: HDFS-7645 URL: https://issues.apache.org/jira/browse/HDFS-7645 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Keisuke Ogiwara Fix For: 2.8.0 Attachments: HDFS-7645.01.patch, HDFS-7645.02.patch, HDFS-7645.03.patch, HDFS-7645.04.patch, HDFS-7645.05.patch, HDFS-7645.06.patch, HDFS-7645.07.patch When performing an HDFS rolling upgrade, the trash directory is getting restored twice when under normal circumstances it shouldn't need to be restored at all. iiuc, the only time these blocks should be restored is if we need to roll back a rolling upgrade. On a busy cluster, this can cause significant and unnecessary block churn both on the datanodes, and more importantly in the namenode. The two times this happens are: 1) restart of DN onto new software {code} private void doTransition(DataNode datanode, StorageDirectory sd, NamespaceInfo nsInfo, StartupOption startOpt) throws IOException { if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) { Preconditions.checkState(!getTrashRootDir(sd).exists(), sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not " + "both be present."); doRollback(sd, nsInfo); // rollback if applicable } else { // Restore all the files in the trash. The restored files are retained // during rolling upgrade rollback. They are deleted during rolling // upgrade downgrade. int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd)); LOG.info("Restored " + restored + " block files from trash."); } {code} 2) When heartbeat response no longer indicates a rollingupgrade is in progress {code} /** * Signal the current rolling upgrade status as indicated by the NN. * @param inProgress true if a rolling upgrade is in progress */ void signalRollingUpgrade(boolean inProgress) throws IOException { String bpid = getBlockPoolId(); if (inProgress) { dn.getFSDataset().enableTrash(bpid); dn.getFSDataset().setRollingUpgradeMarker(bpid); } else { dn.getFSDataset().restoreTrash(bpid); dn.getFSDataset().clearRollingUpgradeMarker(bpid); } } {code} HDFS-6800 and HDFS-6981 were modifying this behavior, making it not completely clear whether this is somehow intentional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification
[ https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8542: -- Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Jenkins isn't running against minor versions. I've committed this to trunk and branch-2. Thanks, Kanaka. Resolving. WebHDFS getHomeDirectory behavior does not match specification -- Key: HDFS-8542 URL: https://issues.apache.org/jira/browse/HDFS-8542 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jakob Homan Assignee: kanaka kumar avvaru Fix For: 2.8.0 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch Per the [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory], WebHDFS provides a REST endpoint for getting the user's home directory: {noformat}Submit a HTTP GET request. curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat} However, WebHDFSFileSystem.java does not use this, instead building the home [directory locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]: {code} /** @return the home directory. */ public static String getHomeDirectoryString(final UserGroupInformation ugi) { return "/user/" + ugi.getShortUserName(); } @Override public Path getHomeDirectory() { return makeQualified(new Path(getHomeDirectoryString(ugi))); }{code} The WebHDFSFileSystem client should call the REST service to determine the home directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification
[ https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596869#comment-14596869 ] Hudson commented on HDFS-8542: -- FAILURE: Integrated in Hadoop-trunk-Commit #8049 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8049/]) HDFS-8542. WebHDFS getHomeDirectory behavior does not match specification. Contributed by Kanaka Kumar Avvaru. (jghoman: rev fac4e04dd359a7ff31f286d664fb06f019ec0b58) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/JsonUtilClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHDFS.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt WebHDFS getHomeDirectory behavior does not match specification -- Key: HDFS-8542 URL: https://issues.apache.org/jira/browse/HDFS-8542 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jakob Homan Assignee: kanaka kumar avvaru Fix For: 2.8.0 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch Per the [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory], WebHDFS provides a REST endpoint for getting the user's home directory: {noformat}Submit a HTTP GET request. curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat} However, WebHDFSFileSystem.java does not use this, instead building the home [directory locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]: {code} /** @return the home directory. */ public static String getHomeDirectoryString(final UserGroupInformation ugi) { return "/user/" + ugi.getShortUserName(); } @Override public Path getHomeDirectory() { return makeQualified(new Path(getHomeDirectoryString(ugi))); }{code} The WebHDFSFileSystem client should call the REST service to determine the home directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raju Bairishetti updated HDFS-8578: --- Fix Version/s: 2.7.1 On upgrade, Datanode should process all storage/data dirs in parallel - Key: HDFS-8578 URL: https://issues.apache.org/jira/browse/HDFS-8578 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Raju Bairishetti Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch Right now, during upgrades the datanode processes all the storage dirs sequentially. Assume it takes ~20 mins to process a single storage dir; then a datanode which has ~10 disks will take around 3 hours to come up. *BlockPoolSliceStorage.java* {code} for (int idx = 0; idx < getNumStorageDirs(); idx++) { doTransition(datanode, getStorageDir(idx), nsInfo, startOpt); assert getCTime() == nsInfo.getCTime() : "Data-node and name-node CTimes must be the same."; } {code} It would save lots of time during major upgrades if the datanode processed all storage dirs/disks in parallel. Can we make the datanode process all storage dirs in parallel? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
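A sketch of the suggestion using a thread pool; doTransition and the storage-dir type here are stand-ins for the real BlockPoolSliceStorage members, not the actual patch.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelUpgradeSketch {
  static void doTransition(String storageDir) { /* per-dir upgrade work */ }

  static void transitionAll(List<String> storageDirs) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(storageDirs.size());
    try {
      List<Future<?>> results = new ArrayList<>();
      for (String dir : storageDirs) {
        results.add(pool.submit(() -> doTransition(dir)));
      }
      for (Future<?> f : results) {
        f.get(); // surface any per-dir failure, as the serial loop would
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}
Since each storage dir typically maps to its own disk, the per-dir work is largely independent IO and should overlap well; the wall-clock time then approaches the slowest single dir rather than the sum over all dirs.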
[jira] [Created] (HDFS-8645) Resolve inconsistent code in TestReplicationPolicy between trunk and branch-2
Zhe Zhang created HDFS-8645: --- Summary: Resolve inconsistent code in TestReplicationPolicy between trunk and branch-2 Key: HDFS-8645 URL: https://issues.apache.org/jira/browse/HDFS-8645 Project: Hadoop HDFS Issue Type: Test Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Per [discussion | https://issues.apache.org/jira/browse/HDFS-8608?focusedCommentId=14596665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14596665] under HDFS-8608. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596846#comment-14596846 ] Zhe Zhang commented on HDFS-8619: - A quick comment: maybe we should consider targeting this for trunk? I haven't finished reviewing the entire patch, and I see the following changes besides the main change mentioned above: # A new {{hasNoDataNodes}} logic. # A {{Block->BlockInfo}} refactor for {{postponedMisreplicatedBlocks}}. # Refactor of {{invalidateBlock}} to take counted nodes as input instead of counting again. # General code cleanups All changes LGTM overall, and all look applicable against trunk (except for the tests). Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treats each internal block as a replica. However, for a striped block, we may have more complicated scenarios, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} method can lead to wrong decisions in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
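To see why plain replica counting misleads for striped blocks, consider counting distinct internal-block indices instead of raw replicas. The demo below is illustrative only (the indices and the RS-6-3 width are example values, not the patch):
{code}
import java.util.BitSet;

public class StripedCountDemo {
  public static void main(String[] args) {
    int totalInternalBlocks = 9; // e.g. RS-6-3: 6 data + 3 parity
    // reported live replicas, by internal-block index: three copies of block 0
    int[] reportedIndices = {0, 0, 0, 1, 2, 3, 4};

    BitSet live = new BitSet(totalInternalBlocks);
    for (int idx : reportedIndices) {
      live.set(idx);
    }
    System.out.println("raw replica count:      " + reportedIndices.length); // 7
    System.out.println("distinct internal blks: " + live.cardinality());     // 5
    // With RS-6-3, at least 6 distinct internal blocks are needed to read the
    // group, so 5 means data loss, even though a naive count of 7 looks healthy.
  }
}
{code}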
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596905#comment-14596905 ] Jing Zhao commented on HDFS-8619: - Thanks for the review, Zhe! Sure, I will separate some refactoring out for trunk. Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treats each internal block as a replica. However, for a striped block, we may have more complicated scenarios, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} method can lead to wrong decisions in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596967#comment-14596967 ] Jing Zhao commented on HDFS-8619: - Besides, we have now merged quite a few changes to trunk; any plan for merging trunk changes to the HDFS-7285 feature branch? Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treats each internal block as a replica. However, for a striped block, we may have more complicated scenarios, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} method can lead to wrong decisions in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596932#comment-14596932 ] Zhe Zhang commented on HDFS-8619: - Thanks Jing! I meant to say all changes, including the main {{CorruptReplicasMap}} change, LGTM overall and look applicable to trunk. Should we just retarget the JIRA to trunk? Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treats each internal block as a replica. However, for a striped block, we may have more complicated scenarios, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} method can lead to wrong decisions in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8186) Erasure coding: Make block placement policy for EC file configurable
[ https://issues.apache.org/jira/browse/HDFS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596973#comment-14596973 ] Walter Su commented on HDFS-8186: - comparison of HDFS-7068 and HDFS-8186 is [here|https://issues.apache.org/jira/browse/HDFS-7068?focusedCommentId=14596964page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14596964] Erasure coding: Make block placement policy for EC file configurable Key: HDFS-8186 URL: https://issues.apache.org/jira/browse/HDFS-8186 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Fix For: HDFS-7285 Attachments: HDFS-8186-HDFS-7285.002.txt, HDFS-8186-HDFS-7285.003.patch, HDFS-8186.001.txt This includes: 1. The user can configure the block placement policy for an EC file in the xml configuration file. 2. The EC policy works for EC files, the replication policy works for non-EC files. They are coexistent. Does not include: 1. Details of the block placement policy for EC. Discussion and implementation go to HDFS-7613. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596822#comment-14596822 ] Zhe Zhang commented on HDFS-7068: - [~walter.k.su] Since we are preparing to merge the HDFS-7285 branch to trunk, we should probably revisit this JIRA. I suggest we split the HDFS-8186 patch and separate the multi-policy part out for this JIRA ({{BlockPlacementPolicies}} etc.). That part needs to be reviewed against trunk anyway as part of the merge, and logically it is orthogonal to the EC logic. Separating it out will reduce the consolidated EC patch and make merge-review easier. Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories. So the files under these directories need to use a new placement policy. But at the same time, other files still use the default placement policy. Here we need to support multiple placement policies for HDFS. One plain thought is that the default placement policy stays configured as the default. On the other hand, HDFS can let the user specify a customized placement policy through extended attributes (xattr). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
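As a rough illustration of the xattr-based dispatch sketched in the description, here is a hypothetical snippet; the xattr name, the resolver class, and the {{BlockPlacementPolicy}} stand-in are all invented, not HDFS's real API:
{code}
// Hypothetical sketch of the xattr-based dispatch described above.
// The xattr name, class, and policy stand-in are invented.
import java.util.Map;

interface BlockPlacementPolicySketch {}   // stand-in for the NN policy type

class PlacementPolicyResolverSketch {
  static final String POLICY_XATTR = "user.block.placement.policy"; // assumed

  private final BlockPlacementPolicySketch defaultPolicy;
  private final Map<String, BlockPlacementPolicySketch> customPolicies;

  PlacementPolicyResolverSketch(BlockPlacementPolicySketch defaultPolicy,
      Map<String, BlockPlacementPolicySketch> customPolicies) {
    this.defaultPolicy = defaultPolicy;
    this.customPolicies = customPolicies;
  }

  /** Policy named by the file's xattr if present, else the default. */
  BlockPlacementPolicySketch resolve(String policyNameFromXAttr) {
    if (policyNameFromXAttr == null) {
      return defaultPolicy;     // the fallback path from the description
    }
    return customPolicies.getOrDefault(policyNameFromXAttr, defaultPolicy);
  }
}
{code}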
[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596853#comment-14596853 ] Arpit Agarwal commented on HDFS-8277: - Hi [~surendrasingh], while putting the safe mode status in the NN persistent state is the right solution I agree with [~vinayrpet] that it would be an incompatible change for 2.x. If we cannot make the change for 2.x I prefer not changing the current behavior of failing 'safemode enter' when SBN is down. Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: Surendra Singh Lilhore Priority: Minor Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
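The behavior the reporter expects can be sketched as follows. This is a hypothetical illustration, not DFSAdmin's actual code; {{NameNodeProxy}} and its methods are invented stand-ins:
{code}
// Hypothetical sketch: apply the safemode command to every configured
// NameNode and tolerate the ones that are down, instead of aborting on
// the first ConnectException as the reported behavior does.
import java.io.IOException;
import java.net.ConnectException;
import java.util.List;

interface NameNodeProxy {
  void enterSafeMode() throws IOException;
  String getAddress();
}

class SafeModeEnterSketch {
  static void enterOnAll(List<NameNodeProxy> namenodes) throws IOException {
    int reached = 0;
    for (NameNodeProxy nn : namenodes) {
      try {
        nn.enterSafeMode();
        reached++;
      } catch (ConnectException e) {
        // This NN (e.g. the standby) is down: warn and keep going.
        System.err.println("Skipping unreachable NameNode " + nn.getAddress());
      }
    }
    if (reached == 0) {
      throw new IOException("No NameNode could be reached");
    }
  }
}
{code}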
[jira] [Commented] (HDFS-8634) OzoneHandler: Add userAuth Interface and Simple userAuth handler
[ https://issues.apache.org/jira/browse/HDFS-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596856#comment-14596856 ] Hadoop QA commented on HDFS-8634: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 15s | Pre-patch HDFS-7240 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 20s | The applied patch generated 2 new checkstyle issues (total was 1, now 3). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 19s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 161m 42s | Tests failed in hadoop-hdfs. | | | | 209m 9s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.hdfs.server.namenode.TestNameEditsConfigs | | | hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741113/hdfs-8634-HDFS-7240.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7240 / 1e75142 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11438/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11438/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11438/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11438/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11438/console | This message was automatically generated. OzoneHandler: Add userAuth Interface and Simple userAuth handler Key: HDFS-8634 URL: https://issues.apache.org/jira/browse/HDFS-8634 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anu Engineer Assignee: Anu Engineer Attachments: hdfs-8634-HDFS-7240.001.patch Add user authentication interface and also the first concrete implementation for that interface called simple. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596853#comment-14596853 ] Arpit Agarwal edited comment on HDFS-8277 at 6/22/15 11:39 PM: --- Hi [~surendrasingh], while putting the safe mode status in the NN persistent state is the right solution I agree with [~vinayrpet] that it would be an incompatible change for 2.x. If we cannot make the change for 2.x I prefer not changing the current behavior of failing 'safemode enter' when SBN is down. [~vinayrpet] - what do you think? was (Author: arpitagarwal): Hi [~surendrasingh], while putting the safe mode status in the NN persistent state is the right solution I agree with [~vinayrpet] that it would be an incompatible change for 2.x. If we cannot make the change for 2.x I prefer not changing the current behavior of failing 'safemode enter' when SBN is down. Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: Surendra Singh Lilhore Priority: Minor Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8578: --- Fix Version/s: (was: 2.7.1) On upgrade, Datanode should process all storage/data dirs in parallel - Key: HDFS-8578 URL: https://issues.apache.org/jira/browse/HDFS-8578 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Raju Bairishetti Priority: Critical Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch Right now, during upgrades the datanode processes all the storage dirs sequentially. Assume it takes ~20 mins to process a single storage dir; then a datanode which has ~10 disks will take around 3 hours to come up. *BlockPoolSliceStorage.java*
{code}
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
  assert getCTime() == nsInfo.getCTime()
      : "Data-node and name-node CTimes must be the same.";
}
{code}
It would save lots of time during major upgrades if the datanode processed all storage dirs/disks in parallel. Can we make the datanode process all storage dirs in parallel? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
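A minimal sketch of the proposed parallelization, reusing the {{doTransition}} call from the quoted snippet; the thread-pool scaffolding is illustrative, not the eventual patch:
{code}
// Hypothetical sketch: run the doTransition() call from the quoted loop
// on a thread pool, one task per storage directory, and wait for all.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelUpgradeSketch {
  void doTransitionsInParallel(int numStorageDirs) throws Exception {
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, numStorageDirs));
    List<Future<?>> futures = new ArrayList<>();
    for (int idx = 0; idx < numStorageDirs; idx++) {
      final int dir = idx;
      futures.add(pool.submit(() -> doTransition(dir))); // one task per disk
    }
    try {
      for (Future<?> f : futures) {
        f.get();   // propagates the first failure, waits for the rest
      }
    } finally {
      pool.shutdown();
    }
  }

  private void doTransition(int storageDirIdx) {
    // stand-in for BlockPoolSliceStorage#doTransition on one directory
  }
}
{code}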
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596964#comment-14596964 ] Walter Su commented on HDFS-7068: - comparison of HDFS-7068 and HDFS-8186 ({{BlockPlacementPolicies}}):
*strategy* HDFS-7068: policy given by the user. HDFS-8186: policy determined from context (file status).
*extensibility* HDFS-7068: better. HDFS-8186: BlockPlacementPolicies accepts a {{boolean}} argument and returns an ec/non-ec policy. In the future, we can extend the argument list.
*code complexity* HDFS-7068: complicated. HDFS-8186: simple.
*memory usage* HDFS-7068: xattr or inode header. HDFS-8186: none.
bq. I'm wondering if we could do it in a lighter way. In my understanding, if the file is in replication mode as by default, then we'll go to the current block placement policy as it goes currently in trunk; otherwise, if striping and/or ec is involved, then we have a new single customized placement policy to cover all the related cases.
Hi, [~drankye]! Thanks for your advice. HDFS-8186 did that.
bq. I'm also +1 for #1.
Hi, [~jingzhao]! I think we can revisit the HDFS-7068 #3 design? HDFS-8186 works for the EC branch. I'm not sure it's acceptable for trunk. I can ask for everybody's opinion on the mailing list. Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories. So the files under these directories need to use a new placement policy. But at the same time, other files still use the default placement policy. Here we need to support multiple placement policies for HDFS. One plain thought is that the default placement policy stays configured as the default. On the other hand, HDFS can let the user specify a customized placement policy through extended attributes (xattr). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
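A hypothetical sketch of the HDFS-8186 shape Walter describes: a small dispatcher that owns exactly two policies and selects one from a boolean derived from the file's status (all names invented):
{code}
// Hypothetical sketch: two fixed policies and a boolean selector derived
// from file status; no xattr or inode-header state is needed.
interface PolicySketch {}   // stand-in for BlockPlacementPolicy

class BlockPlacementPoliciesSketch {
  private final PolicySketch replicationPolicy;
  private final PolicySketch ecPolicy;

  BlockPlacementPoliciesSketch(PolicySketch replicationPolicy,
      PolicySketch ecPolicy) {
    this.replicationPolicy = replicationPolicy;
    this.ecPolicy = ecPolicy;
  }

  /** The caller passes the context, so no per-file state is stored. */
  PolicySketch getPolicy(boolean isStriped) {
    return isStriped ? ecPolicy : replicationPolicy;
  }
}
{code}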
[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596786#comment-14596786 ] Zhe Zhang commented on HDFS-8480: - Thanks Colin for the review! And helpful comments from Vinod, Andrew, and Arpit. Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8619: Attachment: HDFS-8619.000.patch Initial patch to fix the above issue. Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treat each internal block as a replica. However, for a striped block, we may have more complicated scenario, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} methods can lead to wrong decision in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7390) Provide JMX metrics per storage type
[ https://issues.apache.org/jira/browse/HDFS-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596961#comment-14596961 ] Hadoop QA commented on HDFS-7390: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 45s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 26s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 3 new checkstyle issues (total was 228, now 230). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 160m 1s | Tests passed in hadoop-hdfs. | | | | 206m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741135/HDFS-7390-005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 11ac848 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11440/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11440/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11440/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11440/console | This message was automatically generated. Provide JMX metrics per storage type Key: HDFS-7390 URL: https://issues.apache.org/jira/browse/HDFS-7390 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.5.2 Reporter: Benoy Antony Assignee: Benoy Antony Labels: BB2015-05-TBR Attachments: HDFS-7390-003.patch, HDFS-7390-004.patch, HDFS-7390-005.patch, HDFS-7390.patch, HDFS-7390.patch HDFS-2832 added heterogeneous support. In a cluster with different storage types, it is useful to have metrics per storage type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596797#comment-14596797 ] Hadoop QA commented on HDFS-7214: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 15s | The applied patch generated 2 new checkstyle issues (total was 183, now 185). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 18s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 147m 27s | Tests failed in hadoop-hdfs. | | | | 193m 43s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM | | | hadoop.hdfs.TestDFSInotifyEventInputStream | | | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics | | | hadoop.hdfs.TestDFSClientFailover | | | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits | | | hadoop.hdfs.server.namenode.ha.TestFailureOfSharedDir | | | hadoop.hdfs.server.namenode.ha.TestQuotasWithHA | | | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby | | | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA | | | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | | | hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled | | | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication | | | hadoop.hdfs.server.namenode.ha.TestDNFencing | | | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.server.namenode.ha.TestHAMetrics | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.server.namenode.ha.TestEditLogsDuringFailover | | | hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA | | | hadoop.tools.TestJMXGet | | | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.TestDecommission | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints | | | hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement | | | hadoop.hdfs.server.datanode.TestBlockReplacement | | | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | | | hadoop.hdfs.server.namenode.ha.TestHAFsck | | | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits | | | hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages | | | 
hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler | | | hadoop.hdfs.tools.TestDFSHAAdminMiniCluster | | | hadoop.hdfs.server.namenode.ha.TestHAAppend | | | hadoop.hdfs.server.namenode.TestEditLogAutoroll | | | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA | | | hadoop.hdfs.server.namenode.ha.TestHASafeMode | | | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | | | hadoop.hdfs.TestRollingUpgradeRollback | | | hadoop.hdfs.server.namenode.ha.TestHAStateTransitions | | | hadoop.hdfs.web.TestWebHDFSForHA | | | hadoop.hdfs.TestEncryptionZonesWithHA | | | hadoop.hdfs.server.namenode.ha.TestHarFileSystemWithHA | | | hadoop.hdfs.TestRollingUpgradeDowngrade | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741103/HDFS-7214.v4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11437/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11437/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11437/testReport/ | | Java | 1.7.0_55 | | uname | Linux
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596799#comment-14596799 ] Zhe Zhang commented on HDFS-8619: - Thanks Jing for the work! I think the analysis makes sense. I will review the patch later today. Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treat each internal block as a replica. However, for a striped block, we may have more complicated scenario, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} methods can lead to wrong decision in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8608) Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks)
[ https://issues.apache.org/jira/browse/HDFS-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596810#comment-14596810 ] Zhe Zhang commented on HDFS-8608: - Thanks Jing and Andrew for reviewing the patch! I filed HDFS-8645 to address the {{TestReplicationPolicy}} issue. Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks) -- Key: HDFS-8608 URL: https://issues.apache.org/jira/browse/HDFS-8608 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.8.0 Attachments: HDFS-4366-branch-2.00.patch, HDFS-4366-branch-2.01.patch, HDFS-8608.00.patch, HDFS-8608.01.patch, HDFS-8608.02.patch This JIRA aims to merge HDFS-7912 into trunk to minimize the final patch when merging the HDFS-7285 (erasure coding) branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596862#comment-14596862 ] Raju Bairishetti commented on HDFS-8578: [~vinayrpet] Have you done any performance benchmarking with this approach? If yes, could you please post the results here? On upgrade, Datanode should process all storage/data dirs in parallel - Key: HDFS-8578 URL: https://issues.apache.org/jira/browse/HDFS-8578 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Raju Bairishetti Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch Right now, during upgrades the datanode processes all the storage dirs sequentially. Assume it takes ~20 mins to process a single storage dir; then a datanode which has ~10 disks will take around 3 hours to come up. *BlockPoolSliceStorage.java*
{code}
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
  assert getCTime() == nsInfo.getCTime()
      : "Data-node and name-node CTimes must be the same.";
}
{code}
It would save lots of time during major upgrades if the datanode processed all storage dirs/disks in parallel. Can we make the datanode process all storage dirs in parallel? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596959#comment-14596959 ] Jing Zhao commented on HDFS-8619: - No. I guess we still need this jira for adding the striped block logic and the tests. Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treat each internal block as a replica. However, for a striped block, we may have more complicated scenario, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} methods can lead to wrong decision in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots
Rakesh R created HDFS-8642: -- Summary: Improve TestFileTruncate#setup by deleting the snapshots Key: HDFS-8642 URL: https://issues.apache.org/jira/browse/HDFS-8642 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Priority: Minor I've observed {{TestFileTruncate#setup()}} function has to be improved by making it more independent. Presently if any of the snapshots related test failures will affect all the subsequent unit test cases. One such error has been observed in the [Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart] {code} https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/ org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted since /test is snapshottable and already has snapshots at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166) at org.apache.hadoop.ipc.Client.call(Client.java:1440) at org.apache.hadoop.ipc.Client.call(Client.java:1371) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy22.delete(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540) at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy23.delete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1711) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:718) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at 
org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714) at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setup(TestFileTruncate.java:119) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
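A minimal sketch of the {{setup()}} hardening this JIRA proposes: delete any snapshots a previously failed test left behind before deleting the test root, so {{delete()}} cannot fail with "is snapshottable and already has snapshots". {{deleteSnapshot}} and {{delete}} exist on {{DistributedFileSystem}}; the helper itself and the idea of passing the snapshot names in are invented for illustration:
{code}
// Hypothetical sketch of a more independent test setup.
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class TruncateSetupSketch {
  static void cleanRoot(DistributedFileSystem fs, Path parent,
      String... snapshotNames) throws IOException {
    for (String name : snapshotNames) {
      try {
        fs.deleteSnapshot(parent, name);  // leftover from an earlier failure
      } catch (IOException ignored) {
        // snapshot was not present; nothing to clean up
      }
    }
    fs.delete(parent, true);              // now safe to clear the test root
  }
}
{code}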
[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class
[ https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595735#comment-14595735 ] Vinayakumar B commented on HDFS-8493: - bq. The resolution should be in the lock of FSDirectory. IMO, I think this is okay, especially for write ops, provided the fsn writelock is held. And I can see many places where this resolution is done with the fsn lock held, but not the fsd lock. This triggered a thought: why two separate locks, the fsdir lock and the fsnamesystem lock? Almost all ops go through fsn with the lock (read/write) held, and then go on to take the fsdir lock. Any thoughts? Consolidate truncate() related implementation in a single class --- Key: HDFS-8493 URL: https://issues.apache.org/jira/browse/HDFS-8493 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Rakesh R Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch This jira proposes to consolidate truncate() related methods into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots
[ https://issues.apache.org/jira/browse/HDFS-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595795#comment-14595795 ] Hadoop QA commented on HDFS-8642: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 8m 12s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 19s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 13s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 19s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 159m 29s | Tests passed in hadoop-hdfs. | | | | 184m 22s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740975/HDFS-8642-00.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11432/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11432/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11432/console | This message was automatically generated. Improve TestFileTruncate#setup by deleting the snapshots Key: HDFS-8642 URL: https://issues.apache.org/jira/browse/HDFS-8642 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Priority: Minor Attachments: HDFS-8642-00.patch I've observed {{TestFileTruncate#setup()}} function has to be improved by making it more independent. Presently if any of the snapshots related test failures will affect all the subsequent unit test cases. 
One such error has been observed in the [Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart] {code} https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/ org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted since /test is snapshottable and already has snapshots at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166) at
[jira] [Updated] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots
[ https://issues.apache.org/jira/browse/HDFS-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8642: --- Attachment: HDFS-8642-00.patch Improve TestFileTruncate#setup by deleting the snapshots Key: HDFS-8642 URL: https://issues.apache.org/jira/browse/HDFS-8642 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Priority: Minor Attachments: HDFS-8642-00.patch I've observed {{TestFileTruncate#setup()}} function has to be improved by making it more independent. Presently if any of the snapshots related test failures will affect all the subsequent unit test cases. One such error has been observed in the [Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart] {code} https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/ org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted since /test is snapshottable and already has snapshots at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166) at org.apache.hadoop.ipc.Client.call(Client.java:1440) at org.apache.hadoop.ipc.Client.call(Client.java:1371) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy22.delete(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540) at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy23.delete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1711) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:718) at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714) at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setup(TestFileTruncate.java:119) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8643) Add snapshot names list to SnapshottableDirectoryStatus
Rakesh R created HDFS-8643: -- Summary: Add snapshot names list to SnapshottableDirectoryStatus Key: HDFS-8643 URL: https://issues.apache.org/jira/browse/HDFS-8643 Project: Hadoop HDFS Issue Type: Improvement Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to enhance {{SnapshottableDirectoryStatus}} by adding a {{snapshotNames}} attribute into it; presently it has the {{snapshotNumber}}. IMHO this would help the users to get the list of snapshot names created. Also, the snapshot names can be used while renaming or deleting the snapshots.
{code}
org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus.java

/**
 * @return Snapshot names for the directory.
 */
public List<String> getSnapshotNames() {
  return snapshotNames;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
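A hypothetical usage sketch of the proposed attribute; note that {{getSnapshotNames()}} does not exist yet, it is exactly the accessor this JIRA proposes to add:
{code}
// Hypothetical consumer of the proposed attribute: with snapshot names on
// the status object, a client can delete every snapshot of a directory
// without knowing the names out of band.
import java.io.IOException;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus;

class SnapshotCleanupSketch {
  static void deleteAllSnapshots(DistributedFileSystem fs) throws IOException {
    SnapshottableDirectoryStatus[] dirs = fs.getSnapshottableDirListing();
    if (dirs == null) {
      return;  // no snapshottable directories for this user
    }
    for (SnapshottableDirectoryStatus dir : dirs) {
      for (String name : dir.getSnapshotNames()) {  // proposed accessor
        fs.deleteSnapshot(dir.getFullPath(), name);
      }
    }
  }
}
{code}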
[jira] [Updated] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots
[ https://issues.apache.org/jira/browse/HDFS-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8642: --- Target Version/s: 2.8.0 Status: Patch Available (was: Open) Improve TestFileTruncate#setup by deleting the snapshots Key: HDFS-8642 URL: https://issues.apache.org/jira/browse/HDFS-8642 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Priority: Minor Attachments: HDFS-8642-00.patch I've observed {{TestFileTruncate#setup()}} function has to be improved by making it more independent. Presently if any of the snapshots related test failures will affect all the subsequent unit test cases. One such error has been observed in the [Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart] {code} https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/ org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted since /test is snapshottable and already has snapshots at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166) at org.apache.hadoop.ipc.Client.call(Client.java:1440) at org.apache.hadoop.ipc.Client.call(Client.java:1371) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy22.delete(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540) at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy23.delete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1711) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:718) at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714) at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setup(TestFileTruncate.java:119) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8462) Implement GETXATTRS and LISTXATTRS operation for WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595516#comment-14595516 ] Akira AJISAKA commented on HDFS-8462: - Thanks [~jagadesh.kiran] for updating the patch. Minor comment:
{code:title=TestOfflineImageViewerForXAttr.java}
WebImageViewer viewer = new WebImageViewer(
    NetUtils.createSocketAddr("localhost:0"));
try {
  viewer.initServer(originalFsimage.getAbsolutePath());
  ...
} finally {
  // shutdown the viewer
  viewer.close();
}
{code}
Would you use try-with-resources instead of try-finally? I'm +1 if that is addressed. Implement GETXATTRS and LISTXATTRS operation for WebImageViewer --- Key: HDFS-8462 URL: https://issues.apache.org/jira/browse/HDFS-8462 Project: Hadoop HDFS Issue Type: New Feature Reporter: Akira AJISAKA Assignee: Jagadesh Kiran N Attachments: HDFS-8462-00.patch, HDFS-8462-01.patch, HDFS-8462-02.patch, HDFS-8462-03.patch In Hadoop 2.7.0, WebImageViewer supports the following operations: * {{GETFILESTATUS}} * {{LISTSTATUS}} * {{GETACLSTATUS}} I'm thinking it would be better for administrators if {{GETXATTRS}} and {{LISTXATTRS}} are supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
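The suggested change would look roughly like the following, assuming {{WebImageViewer}} implements {{java.io.Closeable}} (it already exposes a {{close()}} method):
{code}
// Hypothetical rewrite of the snippet above using try-with-resources.
try (WebImageViewer viewer =
    new WebImageViewer(NetUtils.createSocketAddr("localhost:0"))) {
  viewer.initServer(originalFsimage.getAbsolutePath());
  // ... assertions against the viewer's HTTP endpoint ...
}
// viewer.close() now runs automatically, even if an assertion throws.
{code}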
[jira] [Resolved] (HDFS-8609) Dead Code in DFS Util for DFSUtil#substituteForWildcardAddress
[ https://issues.apache.org/jira/browse/HDFS-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt resolved HDFS-8609. Resolution: Invalid Sorry, my mistake. Closing the issue as invalid. Dead Code in DFS Util for DFSUtil#substituteForWildcardAddress -- Key: HDFS-8609 URL: https://issues.apache.org/jira/browse/HDFS-8609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Bibin A Chundatt Assignee: Surendra Singh Lilhore Priority: Minor Dead code after JDK 1.4:
{code}
otherHttpAddr = DFSUtil.getInfoServerWithDefaultHost(
    otherIpcAddr.getHostName(), otherNode, scheme).toURL();
{code}
In {{DFSUtil#substituteForWildcardAddress}}:
{code}
if (addr != null && addr.isAnyLocalAddress()) {
  ...
}
{code}
addr.isAnyLocalAddress() will always return false. The url will always be formed with the address which is configured in hdfs-site.xml. The same will affect bootstrap from the NN and the ssl certificate check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class
[ https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596306#comment-14596306 ] Haohui Mai commented on HDFS-8493: -- bq. IMO, I think this is okay, especially for write ops, provided the fsn writelock is held. And I can see many places where this resolution is done with the fsn lock held, but not the fsd lock. Can you list these places and file jiras? They are critical bugs and should be fixed. bq. This triggered a thought: why two separate locks, the fsdir lock and the fsnamesystem lock? Almost all ops go through fsn with the lock (read/write) held, and then go on to take the fsdir lock. Most of the time the fsd lock is acquired within the fsn lock, but BlockManager and LeaseManager only require the fsn lock and not the fsd lock. We're in the process of cleaning up both the fsn and fsd locks. At the end of the day the NN should be able to process block reports without blocking requests to the namespace. Consolidate truncate() related implementation in a single class --- Key: HDFS-8493 URL: https://issues.apache.org/jira/browse/HDFS-8493 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Rakesh R Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch This jira proposes to consolidate truncate() related methods into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
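A schematic illustration of the lock ordering described above (invented class; the real FSNamesystem/FSDirectory locking is more involved):
{code}
// Schematic only: the fsd lock is taken inside the fsn lock for
// namespace ops, while block-report processing takes only the fsn lock.
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderingSketch {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock fsdLock = new ReentrantReadWriteLock();

  void namespaceWriteOp(Runnable op) {
    fsnLock.writeLock().lock();       // outer: FSNamesystem lock
    try {
      fsdLock.writeLock().lock();     // inner: FSDirectory lock
      try {
        op.run();                     // resolve paths, mutate the tree
      } finally {
        fsdLock.writeLock().unlock();
      }
    } finally {
      fsnLock.writeLock().unlock();
    }
  }

  void processBlockReport(Runnable op) {
    fsnLock.writeLock().lock();       // BlockManager needs only fsn
    try {
      op.run();
    } finally {
      fsnLock.writeLock().unlock();
    }
  }
}
{code}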
[jira] [Commented] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification
[ https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596331#comment-14596331 ] Hadoop QA commented on HDFS-8542: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 22s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 7s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 8s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 21s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 159m 54s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 16s | Tests passed in hadoop-hdfs-client. | | | | 205m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.hdfs.TestEncryptionZonesWithKMS | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741037/HDFS-8542-02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11435/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11435/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11435/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11435/console | This message was automatically generated. WebHDFS getHomeDirectory behavior does not match specification -- Key: HDFS-8542 URL: https://issues.apache.org/jira/browse/HDFS-8542 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jakob Homan Assignee: kanaka kumar avvaru Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, HDFS-8542-02.patch Per the [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory], WebHDFS provides a REST endpoint for getting the user's home directory: {noformat}Submit a HTTP GET request. 
curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat} However, WebHDFSFileSystem.java does not use this, instead building the home [directory locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]:
{code}
/** @return the home directory. */
public static String getHomeDirectoryString(final UserGroupInformation ugi) {
  return "/user/" + ugi.getShortUserName();
}

@Override
public Path getHomeDirectory() {
  return makeQualified(new Path(getHomeDirectoryString(ugi)));
}
{code}
The WebHDFSFileSystem client should call the REST service to determine the home directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
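A hypothetical sketch of the spec-compliant client behavior: issue the documented {{GETHOMEDIRECTORY}} request and read the {{Path}} field from the JSON response. Plain {{HttpURLConnection}} and naive string handling keep the sketch self-contained; real code should use a JSON parser:
{code}
// Hypothetical sketch: fetch the home directory over WebHDFS instead of
// building "/user/<name>" locally.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HomeDirectorySketch {
  static String getHomeDirectory(String host, int port) throws IOException {
    URL url = new URL("http://" + host + ":" + port
        + "/webhdfs/v1/?op=GETHOMEDIRECTORY");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      // Response looks like {"Path":"/user/alice"}; naive extraction below.
      String body = in.readLine();
      int start = body.indexOf(':') + 1;
      return body.substring(start).replaceAll("[\"}\\s]", "");
    } finally {
      conn.disconnect();
    }
  }
}
{code}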
[jira] [Commented] (HDFS-8643) Add snapshot names list to SnapshottableDirectoryStatus
[ https://issues.apache.org/jira/browse/HDFS-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596432#comment-14596432 ] Rakesh R commented on HDFS-8643: The following warnings are not related to this patch:
- Whitespace: the reported problem is not in the scope of the patch; line 58 is the line immediately after the proposed changes in the patch
{code}
./hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshottableDirectoryStatus.java:58:
{code}
- checkstyle: the reported problem already exists in the present code.
{code}
./hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshottableDirectoryStatus.java:62:10: More than 7 parameters (found 12).
{code}
- Test case failure: It does not look related to the patch.
{code}
https://builds.apache.org/job/PreCommit-HDFS-Build/11433/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlocksWithNotEnoughRacks/testSufficientlyReplBlocksUsesNewRack/
java.lang.RuntimeException: java.util.zip.ZipException: invalid code lengths set
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2546)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2534)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2605)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2558)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2469)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1205)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1177)
    at org.apache.hadoop.conf.Configuration.setLong(Configuration.java:1422)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks.getConf(TestBlocksWithNotEnoughRacks.java:63)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks.testSufficientlyReplBlocksUsesNewRack(TestBlocksWithNotEnoughRacks.java:88)
{code}
Add snapshot names list to SnapshottableDirectoryStatus --- Key: HDFS-8643 URL: https://issues.apache.org/jira/browse/HDFS-8643 Project: Hadoop HDFS Issue Type: Improvement Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8643-00.patch The idea of this jira is to enhance {{SnapshottableDirectoryStatus}} by adding a {{snapshotNames}} attribute; presently it has only {{snapshotNumber}}. IMHO this would help users get the list of snapshot names created. Also, the snapshot names can be used while renaming or deleting the snapshots.
{code}
org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus.java

/**
 * @return Snapshot names for the directory.
 */
public List<String> getSnapshotNames() {
  return snapshotNames;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
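A hypothetical usage sketch of the proposed attribute; {{getSnapshottableDirListing()}} and {{getFullPath()}} exist on DistributedFileSystem today, while {{getSnapshotNames()}} is the accessor proposed in this jira:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus;

public class ListSnapshotNames {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    SnapshottableDirectoryStatus[] dirs = dfs.getSnapshottableDirListing();
    if (dirs != null) {
      for (SnapshottableDirectoryStatus s : dirs) {
        System.out.println(s.getFullPath());
        for (String name : s.getSnapshotNames()) {  // proposed accessor
          System.out.println("  snapshot: " + name);
        }
      }
    }
  }
}
{code}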
[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.
[ https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596430#comment-14596430 ] Haohui Mai commented on HDFS-8617: -- bq. Andrew and I actually benchmarked setting ioprio in order to implement quality of service on the DataNode. It didn't have very much effect. Can you please share your code and results? Thanks Throttle DiskChecker#checkDirs() speed. --- Key: HDFS-8617 URL: https://issues.apache.org/jira/browse/HDFS-8617 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-8617.000.patch As described in HDFS-8564, {{DiskChecker.checkDirs(finalizedDir)}} is causing excessive I/Os because {{finalizedDirs}} might have up to 64K sub-directories (HDFS-6482). This patch proposes to limit the rate of IO operations in {{DiskChecker.checkDirs()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-7214: -- Attachment: HDFS-7214.v4.patch Display the time when NN became active on the webUI --- Key: HDFS-7214 URL: https://issues.apache.org/jira/browse/HDFS-7214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, HDFS-7214.v3.patch, HDFS-7214.v4.patch Currently the NN webUI displays the JVM start-up time. It would be useful to show when the NN became active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4366) Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks
[ https://issues.apache.org/jira/browse/HDFS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596659#comment-14596659 ] Hudson commented on HDFS-4366: -- FAILURE: Integrated in Hadoop-trunk-Commit #8045 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8045/]) Move HDFS-4366 to 2.8.0 in CHANGES.txt (wang: rev 5590e914f5889413da9eda047f64842c4b67fe85) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks - Key: HDFS-4366 URL: https://issues.apache.org/jira/browse/HDFS-4366 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Derek Dagit Assignee: Derek Dagit Fix For: 3.0.0 Attachments: HDFS-4366-branch-2.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, hdfs-4366-unittest.patch In certain cases, higher-priority under-replicated blocks can be skipped by the replication policy implementation. The current implementation maintains, for each priority level, an index into a list of blocks that are under-replicated. Together, the lists compose a priority queue (see note later about branch-0.23). In some cases when blocks are removed from a list, the caller (BlockManager) properly handles the index into the list from which it removed a block. In some other cases, the index remains stationary while the list changes. Whenever this happens, and the removed block happened to be at or before the index, the implementation will skip over a block when selecting blocks for replication work. In situations when entire racks are decommissioned, leading to many under-replicated blocks, loss of blocks can occur. Background: HDFS-1765 greatly improved the state of the replication policy implementation in trunk. Prior to the patch, the following details were true:
* The block priority queue was no such thing: it was really a set of trees that held blocks in natural ordering, that being by the block's ID, which resulted in iterator walks over the blocks in pseudo-random order.
* There was only a single index into an iteration over all of the blocks...
* ... meaning the implementation was only successful in respecting priority levels on the first pass. Overall, the behavior was a round-robin-type scheduling of blocks.
After the patch:
* A proper priority queue is implemented, preserving O(log n) operations while iterating over blocks in the order added.
* A separate index for each priority level is kept...
* ... allowing for processing of the highest-priority blocks first, regardless of which priority had last been processed.
The change was suggested for branch-0.23 as well as trunk, but it does not appear to have been pulled in. The problem: although the indices are now tracked in a better way, there is a synchronization issue since the indices are managed outside of the methods that modify the contents of the queue. Removal of a block from a priority level without adjusting the index can mean that the index then points to the block after the block it originally pointed to. In the next round of scheduling for that priority level, the block originally pointed to by the index is skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
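A toy illustration (not HDFS source) of the index-skip described in the problem statement: a cursor into a list is not adjusted when an earlier element is removed, so the element right after the removed one is skipped on the next scheduling pass.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexSkipDemo {
  public static void main(String[] args) {
    List<String> queue = new ArrayList<>(Arrays.asList("b1", "b2", "b3"));
    int index = 1;                  // next block to schedule: "b2"
    queue.remove(0);                // "b1" removed elsewhere; index untouched
    System.out.println(queue.get(index)); // prints "b3": "b2" was skipped
  }
}
{code}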
[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.
[ https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596369#comment-14596369 ] Colin Patrick McCabe commented on HDFS-8617: Andrew and I actually benchmarked setting {{ioprio}} in order to implement quality of service on the DataNode. It didn't have very much effect.
In general, more and more I/O scheduling is moving out of the operating system and into the storage device. Back in the old days, operating systems would feed requests to disks one at a time. Disks took a long time to process requests in those days, so it was easy for the CPU to stay well ahead of the disk and basically lead it around by the nose. Nowadays, hard disks have huge on-disk write buffers (several megabytes in size) and internal software that handles draining them. The hard drive doesn't necessarily process requests in the order it gets them. The situation with SSDs is even worse... SSDs have a huge internal layer of firmware that handles servicing any request. In general with SSDs the role of the OS is just to forward requests as quickly as possible to try to keep up with the very fast speed of the SSD. This is why Linux tuning guides tell you to set your I/O scheduler to either {{noop}} or {{deadline}} for best performance on SSDs.
Of course, when disks fail, they usually don't fail all at once. Instead, more and more operations start to time out and produce I/O errors. This is problematic for systems like HBase which strive for low latency. That's why we developed workarounds like hedged reads.
However, HDFS's checkDirs behavior here is making the situation much worse. For a disk that returns I/O errors every so often, each error may trigger a new full scan of every block file on the datanode. While it's true that these scans just look at the metadata, not the data, they still can put a heavy load on the system. It's pointless to keep rescanning the filesystem continuously when a disk starts returning errors. At the very most, we should rescan only the drive that's failing. And we should not do it continuously, but maybe once every hour or half hour. An HBase sysadmin asked me how to configure this behavior and I had to tell him that we have absolutely no way to do it.
bq. I'm unsure whether \[andrew's IOPs calculation\] is the right math. I just checked the code. It looks like checkDir() mostly performs read-only operations on the metadata of the underlying filesystem. The metadata can be fully cached thus the parameter can be way off (and for SSD the parameter needs to be recalculated). That comes back to the point that it is difficult to determine the right parameter for various configuration. The difficulties of finding the parameter leads me to believe that using throttling here is flawed.
When your application is latency-sensitive (such as HBase), it makes sense to do a worst-case calculation of how many IOPS the workload may generate. While it's true that sometimes this may be overly pessimistic if things are cached in memory, it is the right math to do when latency is critical. Throttle DiskChecker#checkDirs() speed.
--- Key: HDFS-8617 URL: https://issues.apache.org/jira/browse/HDFS-8617 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-8617.000.patch As described in HDFS-8564, {{DiskChecker.checkDirs(finalizedDir)}} is causing excessive I/Os because {{finalizedDirs}} might have up to 64K sub-directories (HDFS-6482). This patch proposes to limit the rate of IO operations in {{DiskChecker.checkDirs()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
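The description above proposes rate-limiting the I/O in {{DiskChecker.checkDirs()}}. A minimal sketch of one way to do that, assuming Guava's {{RateLimiter}}; the class name, permit rate, and the basic health checks are illustrative, not the attached HDFS-8617.000.patch:
{code}
import com.google.common.util.concurrent.RateLimiter;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ThrottledDirChecker {
  private final RateLimiter limiter;

  /** checksPerSecond caps how many directory checks run per second. */
  public ThrottledDirChecker(double checksPerSecond) {
    this.limiter = RateLimiter.create(checksPerSecond);
  }

  /** Returns the subset of dirs that fail a basic health check. */
  public List<File> checkDirs(List<File> dirs) {
    List<File> failed = new ArrayList<>();
    for (File dir : dirs) {
      limiter.acquire(); // blocks until a permit is available
      if (!(dir.isDirectory() && dir.canRead() && dir.canWrite())) {
        failed.add(dir);
      }
    }
    return failed;
  }
}
{code}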
[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596419#comment-14596419 ] Haohui Mai commented on HDFS-7214: -- bq. 1. for suggestion of changing long to AtomicLong, I am not quite sure what the improvement would be. http://psy-lob-saw.blogspot.com/2012/12/atomiclazyset-is-performance-win-for.html bq. 2. I am not sure returning the timestamp is a good idea. Since getNNStarted method returns a Date object and the UI just displays it. I am just simply following that pattern. We cannot change {{getNNStarted}} due to backward compatibility issues. New code should return the timestamp. Display the time when NN became active on the webUI --- Key: HDFS-7214 URL: https://issues.apache.org/jira/browse/HDFS-7214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, HDFS-7214.v3.patch Currently the NN webUI displays the JVM start-up time. It would be useful to show when the NN became active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596326#comment-14596326 ] Haohui Mai commented on HDFS-7214: -- Thanks for working on this.
{code}
-
+  /** the time when the namenode became active */
+  private volatile long activeStateStartTime;
{code}
This is a read-dominant workload. It makes more sense to use {{AtomicLong}} here. You can call {{lazySet()}} to update the timestamp.
{code}
+  @Override // NameNodeStatusMXBean
+  public String getNNTransitToActiveTime() {
+    if (activeStateStartTime == 0) {
+      return "N/A";
+    }
+    return new Date(activeStateStartTime).toString();
+  }
+
{code}
It might be better to return the timestamp and let the UI format the date, considering locale and timezone issues. Display the time when NN became active on the webUI --- Key: HDFS-7214 URL: https://issues.apache.org/jira/browse/HDFS-7214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, HDFS-7214.v3.patch Currently the NN webUI displays the JVM start-up time. It would be useful to show when the NN became active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
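A small sketch of the {{AtomicLong}} + {{lazySet()}} pattern suggested above (class and method names are illustrative): {{lazySet()}} performs a cheaper store than {{set()}} by relaxing the ordering guarantee, which is acceptable for a monitoring timestamp written once per transition and read by JMX.
{code}
import java.util.concurrent.atomic.AtomicLong;

class ActiveTimeTracker {
  private final AtomicLong activeStateStartTime = new AtomicLong(0);

  void markActive() {
    // Relaxed store: cheaper than set(); the value still becomes visible
    // to readers eventually, which is fine for a monitoring timestamp.
    activeStateStartTime.lazySet(System.currentTimeMillis());
  }

  long getActiveStateStartTime() {
    return activeStateStartTime.get();
  }
}
{code}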
[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596372#comment-14596372 ] Siqi Li commented on HDFS-7214: --- [~wheat9], Thanks for your feedback. 1. For the suggestion of changing long to AtomicLong, I am not quite sure what the improvement would be. 2. I am not sure returning the timestamp is a good idea. Since the getNNStarted method returns a Date object and the UI just displays it, I am simply following that pattern. Display the time when NN became active on the webUI --- Key: HDFS-7214 URL: https://issues.apache.org/jira/browse/HDFS-7214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, HDFS-7214.v3.patch Currently the NN webUI displays the JVM start-up time. It would be useful to show when the NN became active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8644) OzoneHandler : Add volume handler
[ https://issues.apache.org/jira/browse/HDFS-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer reassigned HDFS-8644: -- Assignee: Anu Engineer OzoneHandler : Add volume handler - Key: HDFS-8644 URL: https://issues.apache.org/jira/browse/HDFS-8644 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anu Engineer Assignee: Anu Engineer Add volume handler logic that dispatches volume related calls to the right interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8634) OzoneHandler: Add userAuth Interface and Simple userAuth handler
[ https://issues.apache.org/jira/browse/HDFS-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8634: --- Attachment: hdfs-8634-HDFS-7240.001.patch UserAuth Interface and simple Auth handler OzoneHandler: Add userAuth Interface and Simple userAuth handler Key: HDFS-8634 URL: https://issues.apache.org/jira/browse/HDFS-8634 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anu Engineer Assignee: Anu Engineer Attachments: hdfs-8634-HDFS-7240.001.patch Add user authentication interface and also the first concrete implementation for that interface called simple. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8634) OzoneHandler: Add userAuth Interface and Simple userAuth handler
[ https://issues.apache.org/jira/browse/HDFS-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8634: --- Status: Patch Available (was: Open) OzoneHandler: Add userAuth Interface and Simple userAuth handler Key: HDFS-8634 URL: https://issues.apache.org/jira/browse/HDFS-8634 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anu Engineer Assignee: Anu Engineer Attachments: hdfs-8634-HDFS-7240.001.patch Add user authentication interface and also the first concrete implementation for that interface called simple. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597131#comment-14597131 ] Tsz Wo Nicholas Sze commented on HDFS-8619: --- Patch looks good in general. I agree that we should do most of the changes in trunk.
- Just a question: why remove the if-condition below? Is the condition always true?
{code}
// BlockManager.invalidateBlock(..)
-    } else if (nr.liveReplicas() >= 1) {
+    } else {
{code}
- Let's move numCorruptReplicas from BlockManager to BlockManagerTestUtil.
- See also if we could move getCorruptReplicaBlockIds from CorruptReplicasMap to BlockManagerTestUtil or some other class in test.
Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treats each internal block as a replica. However, for a striped block we may have more complicated scenarios, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} method can lead to wrong decisions in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-1431) Balancer should work with the logic of BlockPlacementPolicy
[ https://issues.apache.org/jira/browse/HDFS-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597133#comment-14597133 ] Ming Ma commented on HDFS-1431: --- When I met with [~andrew.wang], [~ctrezzo], [~atm], [~cmccabe] the other day, we had a brief discussion about the balancer. To make the balancer use BlockPlacementPolicy, we could alternatively run the balancer inside the namenode. The namenode already has the necessary information. It needs to provide balancer throttling with some refactoring, but overall it seems it shouldn't create much overhead on the namenode. It would be great to hear from others about potential issues with this approach, such as scale and performance. Balancer should work with the logic of BlockPlacementPolicy --- Key: HDFS-1431 URL: https://issues.apache.org/jira/browse/HDFS-1431 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Attachments: HDFS-1431.txt Currently the Balancer does not obtain information from BlockPlacementPolicy, so it can transfer blocks without checking with BlockPlacementPolicy. This causes the policy to break after balancing the cluster. There are some new policies proposed in HDFS-1094 and MAPREDUCE-1831 in which the block placement follows some pattern. The pattern can be broken by the Balancer. I propose that we add the following method in BlockPlacementPolicy:
{code}
abstract public boolean canBeMoved(String fileName, Block block,
    DatanodeInfo source, DatanodeInfo destination);
{code}
And make the Balancer use it in
{code}
private boolean isGoodBlockCandidate(Source source,
    BalancerDatanode target, BalancerBlock block)
{code}
What do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597132#comment-14597132 ] Vinayakumar B commented on HDFS-8578: - bq. Vinayakumar B Have you done any performance benchmarking with this approach? If yes, Could you please post the results here? Nope. I don't have any results. On upgrade, Datanode should process all storage/data dirs in parallel - Key: HDFS-8578 URL: https://issues.apache.org/jira/browse/HDFS-8578 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Raju Bairishetti Priority: Critical Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch Right now, during upgrades the datanode processes all the storage dirs sequentially. Assume it takes ~20 minutes to process a single storage dir; then a datanode with ~10 disks will take around 3 hours to come up. *BlockPoolSliceStorage.java*
{code}
    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
      doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
      assert getCTime() == nsInfo.getCTime()
          : "Data-node and name-node CTimes must be the same.";
    }
{code}
It would save lots of time during major upgrades if the datanode processed all storage dirs/disks in parallel. Can we make the datanode process all storage dirs in parallel? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
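A minimal sketch of the parallelization, reusing the surrounding BlockPoolSliceStorage names ({{doTransition}}, {{getStorageDir}}, {{nsInfo}}, {{startOpt}}); the pool sizing and error handling are simplified assumptions, not the attached patch:
{code}
// Fragment; assumes the enclosing method declares throws IOException and
// that java.util.* and java.util.concurrent.* are imported.
ExecutorService pool = Executors.newFixedThreadPool(
    Math.min(getNumStorageDirs(), Runtime.getRuntime().availableProcessors()));
List<Future<Void>> futures = new ArrayList<>();
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  final StorageDirectory sd = getStorageDir(idx);
  futures.add(pool.submit(() -> {
    doTransition(datanode, sd, nsInfo, startOpt);
    assert getCTime() == nsInfo.getCTime()
        : "Data-node and name-node CTimes must be the same.";
    return null;
  }));
}
try {
  for (Future<Void> f : futures) {
    f.get(); // rethrows the first failure as ExecutionException
  }
} catch (InterruptedException | ExecutionException e) {
  throw new IOException("Storage dir transition failed", e);
} finally {
  pool.shutdown();
}
{code}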
[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI
[ https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596981#comment-14596981 ] Ming Ma commented on HDFS-7214: --- [~l201514] you are right that HDFS-7257 has added the metrics for it. Having this information on the webUI is useful. Maybe we can still update the webUI based on JMX? Display the time when NN became active on the webUI --- Key: HDFS-7214 URL: https://issues.apache.org/jira/browse/HDFS-7214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, HDFS-7214.v3.patch, HDFS-7214.v4.patch Currently the NN webUI displays the JVM start-up time. It would be useful to show when the NN became active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8646) Prune cached replicas from DatanodeDescriptor state on replica invalidation
[ https://issues.apache.org/jira/browse/HDFS-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8646: -- Attachment: hdfs-8646.001.patch Patch attached. We now prune in BlockManager#removeStoredBlock, which should be pretty failsafe. A new test exercises this logic, and I also added a failsafe prune in CacheManager in case we missed some other similar case. Prune cached replicas from DatanodeDescriptor state on replica invalidation --- Key: HDFS-8646 URL: https://issues.apache.org/jira/browse/HDFS-8646 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-8646.001.patch Currently we remove blocks from the DD's CachedBlockLists on node failure and on cache report, but not on replica invalidation. This can lead to an invalid situation where we return a LocatedBlock with cached locations that are not backed by an on-disk replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
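An illustrative toy (not the attached hdfs-8646.001.patch) of the invariant being enforced: whenever a node's on-disk replica is invalidated, the node is also pruned from that block's cached-location set, so a LocatedBlock can never report a cached location without a backing replica.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CachedLocationIndex {
  private final Map<Long, Set<String>> cachedLocations = new HashMap<>();

  void onReplicaInvalidated(long blockId, String nodeId) {
    Set<String> nodes = cachedLocations.get(blockId);
    if (nodes != null) {
      nodes.remove(nodeId);            // prune the stale cached location
      if (nodes.isEmpty()) {
        cachedLocations.remove(blockId);
      }
    }
  }
}
{code}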
[jira] [Updated] (HDFS-6564) Use slf4j instead of common-logging in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-6564: --- Release Note: Users may need to pay special attention to this change while upgrading to this version. Previously the hdfs client used commons-logging as the logging framework. With this change it will use the slf4j framework. For more details about slf4j, please see: http://www.slf4j.org/manual.html. Also, the org.apache.hadoop.hdfs.protocol.CachePoolInfo#LOG public static member variable has been removed, as it was not used anywhere. Users who have a reference to this variable need to correct their code. One can retrieve the named logger via the logging framework of their choice directly, e.g. org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(org.apache.hadoop.hdfs.protocol.CachePoolInfo.class); Use slf4j instead of common-logging in hdfs-client -- Key: HDFS-6564 URL: https://issues.apache.org/jira/browse/HDFS-6564 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Rakesh R Attachments: HDFS-6564-01.patch, HDFS-6564-02.patch, HDFS-6564-03.patch hdfs-client should depend on slf4j instead of common-logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
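As the release note above describes, code that referenced the removed {{CachePoolInfo#LOG}} member can obtain an equivalent logger directly from slf4j:
{code}
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class MigrationExample {
  // Equivalent named logger, retrieved via slf4j instead of the removed
  // CachePoolInfo.LOG public member.
  private static final Logger LOG =
      LoggerFactory.getLogger(CachePoolInfo.class);
}
{code}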
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597016#comment-14597016 ] Jesse Yates commented on HDFS-6440: --- Rebased on trunk, tests pass locally for me. Support more than 2 NameNodes - Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Affects Versions: 2.4.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 3.0.0 Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597130#comment-14597130 ] Vinayakumar B commented on HDFS-8277: - bq. If we cannot make the change for 2.x I prefer not changing the current behavior of failing 'safemode enter' when SBN is down. In the case where the SBN is down, maybe for maintenance, but still present in the configuration, going ahead to the next namenode on ConnectException seems reasonable. To avoid unexpected behavior, maybe we can add an active/standby check for the next namenode before changing the safemode status, and change it only if the next namenode is active? Though this is a kind of workaround instead of breaking compatibility, IMO the proposal as in the v1 patch seems reasonable. Any thoughts? Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: Surendra Singh Lilhore Priority: Minor Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to AMBARI-10536).
{code}
hdfs dfsadmin -safemode enter
safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
{code}
This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1, which is down. I verified that normal hadoop fs writes and reads via the CLI did work at this time, using nn2. I happened to run this command as the hdfs user on nn2, which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it, the command worked as expected again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times
[ https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597138#comment-14597138 ] Vinayakumar B commented on HDFS-7645: - bq. This change is incompatible since we expose RollingUpgradeInfo in the NN's JMX (a public API). As discussed above, rather than being null on finalization, it now sets the finalization time. Oh! Thanks [~andrew.wang] for pointing that out. That was a miss. bq. Have we thought about other ways of solving this issue? Else we can change the JMX method to still return null on finalization. Since the DN side needs to differentiate between the FINALIZED rolling-upgrade status and the rolled-back status, the finalize time is set on finalization. bq. Else we can change the JMX method to still return null on finalization. We can do this if this fix is backported to stable branches. Currently it's only available in branch-2. If it's not so critical to change it back, then we can add a release note indicating the change. Note that {{ClientProtocol#rollingUpgrade(..)}} was also changed to return a non-null finalized status. Rolling upgrade is restoring blocks from trash multiple times - Key: HDFS-7645 URL: https://issues.apache.org/jira/browse/HDFS-7645 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Keisuke Ogiwara Fix For: 2.8.0 Attachments: HDFS-7645.01.patch, HDFS-7645.02.patch, HDFS-7645.03.patch, HDFS-7645.04.patch, HDFS-7645.05.patch, HDFS-7645.06.patch, HDFS-7645.07.patch When performing an HDFS rolling upgrade, the trash directory is getting restored twice when under normal circumstances it shouldn't need to be restored at all. iiuc, the only time these blocks should be restored is if we need to roll back a rolling upgrade. On a busy cluster, this can cause significant and unnecessary block churn both on the datanodes, and more importantly in the namenode. The two times this happens are: 1) restart of DN onto new software
{code}
  private void doTransition(DataNode datanode, StorageDirectory sd,
      NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
    if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
      Preconditions.checkState(!getTrashRootDir(sd).exists(),
          sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not " +
          "both be present.");
      doRollback(sd, nsInfo); // rollback if applicable
    } else {
      // Restore all the files in the trash. The restored files are retained
      // during rolling upgrade rollback. They are deleted during rolling
      // upgrade downgrade.
      int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
      LOG.info("Restored " + restored + " block files from trash.");
    }
{code}
2) When the heartbeat response no longer indicates a rolling upgrade is in progress
{code}
  /**
   * Signal the current rolling upgrade status as indicated by the NN.
   * @param inProgress true if a rolling upgrade is in progress
   */
  void signalRollingUpgrade(boolean inProgress) throws IOException {
    String bpid = getBlockPoolId();
    if (inProgress) {
      dn.getFSDataset().enableTrash(bpid);
      dn.getFSDataset().setRollingUpgradeMarker(bpid);
    } else {
      dn.getFSDataset().restoreTrash(bpid);
      dn.getFSDataset().clearRollingUpgradeMarker(bpid);
    }
  }
{code}
HDFS-6800 and HDFS-6981 modified this behavior, making it not completely clear whether this is somehow intentional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597142#comment-14597142 ] Zhe Zhang commented on HDFS-8619: - bq. I guess we still need this jira for adding the striped block logic and the tests. Sure. I see at least the tests are specific to striped blocks. bq. Besides, we now have merged quite a few changes to trunk, any plan for merging trunk changes to the HDFS-7285 feature branch? Thanks for bringing up the question Jing. These days I'm mostly focused on this front. Since the merged changes are quite big, I'm rebasing the entire consolidated HDFS-7285 patch instead of individual patches. Late last week I finished a first round of rebasing, which is quite rough. I posted a [comment|https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14593827&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14593827] but I guess it is hidden in between Hudson messages (could you help delete them?). I'm working on a second round of rebasing to include all changes, and hopefully also to split it into functional pieces like "Support EC zones", "Allocate and persist striped blocks in NameNode", and "Add striped block support in INodeFile". I'll post a new rebased patch soon. Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treats each internal block as a replica. However, for a striped block we may have more complicated scenarios, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} method can lead to wrong decisions in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597156#comment-14597156 ] Vinayakumar B commented on HDFS-8586: - Thanks [~brahmareddy] for reporting this. This happens when the NameNode has the node in its dead-node list and the block allocation request comes from the same machine as the dead node; the dead node is then chosen as the local node irrespective of whether it is part of the cluster or not. Adding a check in {{BlockPlacementPolicyDefault#chooseLocalStorage(..)}} will be the fix for this (see the sketch after this message). Regarding the test proposed above, it will not always fail, since it is a MiniDFSCluster test and all datanodes will be on the same machine, and the probability of the dead node being chosen as local storage is not guaranteed. Dead Datanode is allocated for write when client is from deadnode -- Key: HDFS-8586 URL: https://issues.apache.org/jira/browse/HDFS-8586 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Critical *{color:blue}DataNode marked as Dead{color}* 2015-06-11 19:39:00,862 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | BLOCK* *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009* | org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584) 2015-06-11 19:39:00,863 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | Removing a node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488) *{color:blue}Deadnode got Allocated{color}* 2015-06-11 19:39:45,148 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | INFO | IPC Server handler 26 on 25000 | BLOCK* *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW], ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW], ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL: *XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657) 2015-06-11 19:39:45,191 | INFO | IPC Server handler 35 on 25000 | BLOCK* allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW], ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW], ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85: {{NORMAL:XX.XX.37.33:25009}} |RBW]]}
for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
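Returning to the fix suggested in the comment above: a minimal sketch (assumed shape, not an actual patch) of guarding the writer-local choice on cluster membership. {{NetworkTopology#contains(Node)}} is the existing Hadoop API; the wrapper class is illustrative.
{code}
import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.net.Node;

class LocalNodeGuard {
  private final NetworkTopology clusterMap;

  LocalNodeGuard(NetworkTopology clusterMap) {
    this.clusterMap = clusterMap;
  }

  /**
   * Returns the writer node only if it is still part of the cluster;
   * otherwise null, so the caller falls back to choosing another node
   * instead of a dead local node.
   */
  Node localCandidate(Node writer) {
    return (writer != null && clusterMap.contains(writer)) ? writer : null;
  }
}
{code}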
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596984#comment-14596984 ] Hadoop QA commented on HDFS-6440: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 24 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 3s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 4m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 0s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 22m 32s | Tests passed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 142m 30s | Tests failed in hadoop-hdfs. | | {color:red}-1{color} | hdfs tests | 0m 16s | Tests failed in bkjournal. | | | | 219m 10s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.server.namenode.TestCheckpoint | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestNameNodeAcl | | Failed build | bkjournal | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740539/hdfs-6440-trunk-v8.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 077250d | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11441/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11441/artifact/patchprocess/testrun_hadoop-hdfs.txt | | bkjournal test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11441/artifact/patchprocess/testrun_bkjournal.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11441/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11441/console | This message was automatically generated. 
Support more than 2 NameNodes - Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Affects Versions: 2.4.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 3.0.0 Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8646) Prune cached replicas from DatanodeDescriptor state on replica invalidation
Andrew Wang created HDFS-8646: - Summary: Prune cached replicas from DatanodeDescriptor state on replica invalidation Key: HDFS-8646 URL: https://issues.apache.org/jira/browse/HDFS-8646 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Currently we remove blocks from the DD's CachedBlockLists on node failure and on cache report, but not on replica invalidation. This can lead to an invalid situation where we return a LocatedBlock with cached locations that are not backed by an on-disk replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6564) Use slf4j instead of common-logging in hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597023#comment-14597023 ] Rakesh R commented on HDFS-6564: Thanks for the reviews. I've updated the {{Release note}} section in the jira. Is anything else required for this change? Use slf4j instead of common-logging in hdfs-client -- Key: HDFS-6564 URL: https://issues.apache.org/jira/browse/HDFS-6564 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Rakesh R Attachments: HDFS-6564-01.patch, HDFS-6564-02.patch, HDFS-6564-03.patch hdfs-client should depend on slf4j instead of common-logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8447) Decouple information of files in GetLocatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8447: - Attachment: HDFS-8447.004.patch Decouple information of files in GetLocatedBlocks - Key: HDFS-8447 URL: https://issues.apache.org/jira/browse/HDFS-8447 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8447.000.patch, HDFS-8447.001.patch, HDFS-8447.002.patch, HDFS-8447.003.patch, HDFS-8447.004.patch The current implementation of {{BlockManager.getLocatedBlocks()}} requires file information to be passed as parameters. This information does not affect the results of getting the physical locations of blocks. This jira proposes to refactor the call so that {{BlockManager.getLocatedBlocks()}} depends only on the block information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8515) Abstract a DTP/2 HTTP/2 server
[ https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597011#comment-14597011 ] Duo Zhang commented on HDFS-8515: - I've introduced an Http2StreamChannel on the POC branch. https://github.com/Apache9/hadoop/tree/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/http2 Let me extract a patch for it, thanks. Abstract a DTP/2 HTTP/2 server -- Key: HDFS-8515 URL: https://issues.apache.org/jira/browse/HDFS-8515 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Duo Zhang Assignee: Duo Zhang Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, HDFS-8515-v3.patch, HDFS-8515.patch Discussed in HDFS-8471. https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class
[ https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597140#comment-14597140 ] Vinayakumar B commented on HDFS-8493: - bq. Though most of the time the fsd lock is acquired within the fsn lock. BlockManager and LeaseManager only requires the fsn lock but not the fsd lock. We're in the process of cleaning up the locks of both fsn and fsd locks. At the end of the day the NN should be able to process block reports w/o blocking requests to the namespace. Okay. bq. Following are the functions where it has done the resolution fsd.resolvePath(pc, src, pathComponents); by acquiring only fsn lock and not fsd lock. Could you please take a look at it. Thanks [~rakeshr] for listing out those methods. Can you file a follow-up jira to handle those? Consolidate truncate() related implementation in a single class --- Key: HDFS-8493 URL: https://issues.apache.org/jira/browse/HDFS-8493 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Rakesh R Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch This jira proposes to consolidate truncate() related methods into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597150#comment-14597150 ] Vinayakumar B commented on HDFS-8578: - [~raju.bairishetti], would you mind testing this patch with your loads? On upgrade, Datanode should process all storage/data dirs in parallel - Key: HDFS-8578 URL: https://issues.apache.org/jira/browse/HDFS-8578 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Raju Bairishetti Priority: Critical Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch Right now, during upgrades the datanode processes all the storage dirs sequentially. Assume it takes ~20 minutes to process a single storage dir; then a datanode with ~10 disks will take around 3 hours to come up. *BlockPoolSliceStorage.java*
{code}
    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
      doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
      assert getCTime() == nsInfo.getCTime()
          : "Data-node and name-node CTimes must be the same.";
    }
{code}
It would save lots of time during major upgrades if the datanode processed all storage dirs/disks in parallel. Can we make the datanode process all storage dirs in parallel? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server
[ https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HDFS-8515: Attachment: HDFS-8515-v4.patch A solution based on AbstractChannel. Abstract a DTP/2 HTTP/2 server -- Key: HDFS-8515 URL: https://issues.apache.org/jira/browse/HDFS-8515 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Duo Zhang Assignee: Duo Zhang Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, HDFS-8515-v3.patch, HDFS-8515-v4.patch, HDFS-8515.patch Discussed in HDFS-8471. https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8586: --- Status: Patch Available (was: Open) Dead Datanode is allocated for write when client is from deadnode -- Key: HDFS-8586 URL: https://issues.apache.org/jira/browse/HDFS-8586 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-8586.patch *{color:blue}DataNode marked as Dead{color}* 2015-06-11 19:39:00,862 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | BLOCK* *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009* | org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584) 2015-06-11 19:39:00,863 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | Removing a node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488) *{color:blue}Deadnode got Allocated{color}* 2015-06-11 19:39:45,148 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | INFO | IPC Server handler 26 on 25000 | BLOCK* *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW], ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW], ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL: *XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657) 2015-06-11 19:39:45,191 | INFO | IPC Server handler 35 on 25000 | BLOCK* allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW], ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW], ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85: {{NORMAL:XX.XX.37.33:25009}} |RBW]]} for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8586: --- Attachment: HDFS-8586.patch Dead Datanode is allocated for write when client is from deadnode -- Key: HDFS-8586 URL: https://issues.apache.org/jira/browse/HDFS-8586 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-8586.patch *{color:blue}DataNode marked as Dead{color}* 2015-06-11 19:39:00,862 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | BLOCK* *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009* | org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584) 2015-06-11 19:39:00,863 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | Removing a node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488) *{color:blue}Deadnode got Allocated{color}* 2015-06-11 19:39:45,148 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616) 2015-06-11 19:39:45,149 | INFO | IPC Server handler 26 on 25000 | BLOCK* *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW], ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW], ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL: *XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657) 2015-06-11 19:39:45,191 | INFO | IPC Server handler 35 on 25000 | BLOCK* allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW], ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW], ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85: {{NORMAL:XX.XX.37.33:25009}} |RBW]]} for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy
Ming Ma created HDFS-8647:
---
Summary: Abstract BlockManager's rack policy into BlockPlacementPolicy
Key: HDFS-8647
URL: https://issues.apache.org/jira/browse/HDFS-8647
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Ming Ma

Sometimes we want the namenode to use an alternative block placement policy, such as the upgrade domains of HDFS-7541. BlockManager has built-in assumptions about rack policy in functions such as useDelHint and blockHasEnoughRacks. That means that whenever we add a new block placement policy, we need to modify BlockManager to account for it. Ideally, BlockManager should ask the BlockPlacementPolicy object instead; that would allow us to provide a new BlockPlacementPolicy without changing BlockManager.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
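One possible shape of that abstraction, sketched with simplified stand-in types (this is illustrative only, not the eventual HDFS-8647 patch; the real useDelHint and blockHasEnoughRacks operate on DatanodeDescriptor and BlockInfo rather than rack strings):

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// BlockManager would stop hard-coding rack rules and ask the policy instead.
abstract class BlockPlacementPolicySketch {
  /** Does the current replica set satisfy this policy's fault domains? */
  abstract boolean isPlacementSatisfied(List<String> replicaRacks, int replication);

  /** May the client-supplied delete hint be honored without breaking placement? */
  abstract boolean useDelHint(String hintRack, List<String> replicaRacks);
}

class RackAwarePolicySketch extends BlockPlacementPolicySketch {
  @Override
  boolean isPlacementSatisfied(List<String> replicaRacks, int replication) {
    // The default rack rule: once replication > 1, replicas must span > 1 rack.
    return replication <= 1 || new HashSet<>(replicaRacks).size() > 1;
  }

  @Override
  boolean useDelHint(String hintRack, List<String> replicaRacks) {
    // Deleting the hinted replica is acceptable only if the remaining
    // replicas still satisfy this policy's placement rule.
    List<String> remaining = new ArrayList<>(replicaRacks);
    remaining.remove(hintRack);
    return isPlacementSatisfied(remaining, remaining.size());
  }
}
{code}

An upgrade-domain policy in the spirit of HDFS-7541 could then override both hooks without any change to BlockManager, which is the point of the proposal.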
[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class
[ https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597178#comment-14597178 ] Rakesh R commented on HDFS-8493:
---
Yeah, I have raised the HDFS-8648 sub-task to revisit these cases and make the proper corrections.

Consolidate truncate() related implementation in a single class
---
Key: HDFS-8493
URL: https://issues.apache.org/jira/browse/HDFS-8493
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Rakesh R
Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch

This jira proposes to consolidate the truncate() related methods into a single class.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
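For readers unfamiliar with the namenode's op-class pattern, here is a minimal sketch of what "consolidate into a single class" means in practice. The class name, the stand-in Namesystem type, and the method body are hypothetical stand-ins; the actual consolidated class produced by this patch is far more involved:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// One final class of static helpers per operation; FSNamesystem delegates.
final class TruncateOpSketch {
  private TruncateOpSketch() {} // no instances; static helpers only

  static boolean truncate(Namesystem fsn, String src, long newLength) {
    fsn.lock.writeLock().lock(); // truncate mutates namespace state
    try {
      long old = fsn.fileLengths.getOrDefault(src, -1L);
      if (old < 0 || newLength > old) {
        return false; // missing file, or truncate cannot extend a file
      }
      fsn.fileLengths.put(src, newLength);
      return true;
    } finally {
      fsn.lock.writeLock().unlock();
    }
  }
}

// Toy stand-in for FSNamesystem: just enough state for the sketch.
class Namesystem {
  final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  final Map<String, Long> fileLengths = new HashMap<>();
}
{code}

The design benefit of the pattern is that all locking, validation, and edit-logging for one operation sit in one place instead of being spread across FSNamesystem.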
[jira] [Created] (HDFS-8648) Revisit FsDirectory#resolvePath() function usage to check the call is made under proper lock
Rakesh R created HDFS-8648:
---
Summary: Revisit FsDirectory#resolvePath() function usage to check the call is made under proper lock
Key: HDFS-8648
URL: https://issues.apache.org/jira/browse/HDFS-8648
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

As per the [discussion|https://issues.apache.org/jira/browse/HDFS-8493?focusedCommentId=14595735&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14595735] in HDFS-8493, the usage of the {{FsDirectory#resolvePath}} function needs to be reviewed. In many places the resolution {{fsd.resolvePath(pc, src, pathComponents);}} is done while holding only the fsn lock and not the fsd lock. Per the initial analysis, the following are such cases; they need to be filtered and any wrong usage fixed (a sketch of the expected locking pattern follows this entry).
# FsDirAclOp.java
  - getAclStatus()
  - modifyAclEntries()
  - removeAcl()
  - removeDefaultAcl()
  - setAcl()
  - getAclStatus()
# FsDirDeleteOp.java
  - delete(fsn, src, recursive, logRetryCache)
# FsDirRenameOp.java
  - renameToInt(fsd, srcArg, dstArg, logRetryCache)
  - renameToInt(fsd, srcArg, dstArg, logRetryCache, options)
# FsDirStatAndListingOp.java
  - getContentSummary(fsd, src)
  - getFileInfo(fsd, srcArg, resolveLink)
  - isFileClosed(fsd, src)
  - getListingInt(fsd, srcArg, startAfter, needLocation)
# FsDirWriteFileOp.java
  - abandonBlock()
  - completeFile(fsn, pc, srcArg, holder, last, fileId)
  - getEncryptionKeyInfo(fsn, pc, src, supportedVersions)
  - startFile()
  - validateAddBlock()
# FsDirXAttrOp.java
  - getXAttrs(fsd, srcArg, xAttrs)
  - listXAttrs(fsd, src)
  - setXAttr(fsd, src, xAttr, flag, logRetryCache)
# FSNamesystem.java
  - createEncryptionZoneInt()
  - getEZForPath()

Thanks [~wheat9], [~vinayrpet] for the advice.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
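A minimal sketch of the locking concern, using toy locks rather than Hadoop's classes (all names below are invented). The namesystem (fsn) and the directory tree (fsd) each maintain their own read/write lock; resolving a path reads fsd state, so holding only the fsn lock leaves the resolution racing with concurrent tree edits:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockDisciplineSketch {
  static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  static final ReentrantReadWriteLock fsdLock = new ReentrantReadWriteLock();

  // The pattern flagged by this JIRA: fsd state read under the fsn lock only.
  static String resolveUnderFsnOnly(String src) {
    fsnLock.readLock().lock();
    try {
      return resolvePath(src); // traverses the tree with no fsd lock held
    } finally {
      fsnLock.readLock().unlock();
    }
  }

  // The expected pattern: also take the fsd lock around the tree traversal.
  static String resolveProperly(String src) {
    fsnLock.readLock().lock();
    try {
      fsdLock.readLock().lock();
      try {
        return resolvePath(src);
      } finally {
        fsdLock.readLock().unlock();
      }
    } finally {
      fsnLock.readLock().unlock();
    }
  }

  static String resolvePath(String src) {
    return src; // placeholder for inode-tree resolution
  }
}
{code}

The review task, then, is deciding for each listed call site whether the fsn lock alone is actually sufficient (some read paths may be) or whether the second pattern is required.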
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597196#comment-14597196 ] Brahma Reddy Battula commented on HDFS-8586:
---
[~vinayrpet] thanks a lot for taking a look at this issue. Added the check in {{BlockPlacementPolicyDefault.java#chooseLocalStorage(..)}} and corrected the test case. Kindly review.

Dead Datanode is allocated for write when client is from deadnode
--
Key: HDFS-8586
URL: https://issues.apache.org/jira/browse/HDFS-8586
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
Attachments: HDFS-8586.patch
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
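A hedged sketch of the kind of guard that comment describes; this illustrates the idea, not the attached HDFS-8586.patch, and all names here are simplified stand-ins for the real BlockPlacementPolicyDefault types:

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChooseLocalStorageSketch {
  // Stand-in for the topology's membership set: live, registered nodes only.
  static Set<String> clusterMap = new HashSet<>();

  static String chooseLocalStorage(String writerNode, List<String> liveCandidates) {
    // The added check: a node whose heartbeat was lost has already been
    // removed from clusterMap, so it must not be returned as replica 0.
    if (writerNode != null && clusterMap.contains(writerNode)) {
      return writerNode;
    }
    // Fall back to any live node instead of trusting the stale reference.
    return liveCandidates.isEmpty() ? null : liveCandidates.get(0);
  }

  public static void main(String[] args) {
    clusterMap.add("XX.XX.38.32:25009");
    clusterMap.add("XX.XX.38.33:25009");
    // The dead client-local node is skipped; a live node is chosen instead.
    System.out.println(chooseLocalStorage("XX.XX.39.33:25009",
        List.of("XX.XX.38.32:25009")));
  }
}
{code}

With such a guard in place, the toy scenario sketched earlier in this digest would no longer place the dead client-local node at replica 0, matching the behavior the patch aims for.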