[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements
[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518976#comment-14518976 ] Xinwei Qin commented on HDFS-7836: --- Hi [~cmccabe], [~clamb], This is a very meaningful improvement. Is there any update or a plan for next steps on this JIRA? Could you share a summary of the meeting held on March 11th? BlockManager Scalability Improvements - Key: HDFS-7836 URL: https://issues.apache.org/jira/browse/HDFS-7836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Charles Lamb Assignee: Charles Lamb Attachments: BlockManagerScalabilityImprovementsDesign.pdf Improvements to BlockManager scalability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518968#comment-14518968 ] surendra singh lilhore commented on HDFS-8277: -- In the {{DFSAdmin.java}} class, the {{setSafeMode()}} API iterates over the NN proxy list and puts each NN into safe mode one by one. If the connection to the first namenode fails, it breaks out of the loop. {code} for (ProxyAndInfo<ClientProtocol> proxy : proxies) { ClientProtocol haNn = proxy.getProxy(); boolean inSafeMode = haNn.setSafeMode(action, false); if (waitExitSafe) { inSafeMode = waitExitSafeMode(haNn, inSafeMode); } System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF") + " in " + proxy.getAddress()); } {code} Here we should catch the connection exception and continue to the next NN. Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Improvement Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: surendra singh lilhore Priority: Minor Attachments: HDFS-8277.patch HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
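A minimal sketch of the handling suggested in the HDFS-8277 comment above, assuming the {{DFSAdmin#setSafeMode}} loop quoted there (the exception handling and messages are illustrative, not the attached patch):
{code}
for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
  ClientProtocol haNn = proxy.getProxy();
  try {
    boolean inSafeMode = haNn.setSafeMode(action, false);
    if (waitExitSafe) {
      inSafeMode = waitExitSafeMode(haNn, inSafeMode);
    }
    System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF")
        + " in " + proxy.getAddress());
  } catch (java.net.ConnectException ce) {
    // One NN (e.g. a downed Standby) being unreachable should not abort the
    // loop; report it and continue so the surviving NameNode still receives
    // the safemode command.
    System.err.println("safemode: failed to reach " + proxy.getAddress()
        + ": " + ce.getMessage());
  }
}
{code}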
[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint
[ https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518786#comment-14518786 ] Hadoop QA commented on HDFS-8214: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 26s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 5s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 164m 49s | Tests failed in hadoop-hdfs. | | | | 212m 47s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728499/HDFS-8214.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 439614b | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10446/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10446/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10446/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10446/console | This message was automatically generated. Secondary NN Web UI shows wrong date for Last Checkpoint Key: HDFS-8214 URL: https://issues.apache.org/jira/browse/HDFS-8214 Project: Hadoop HDFS Issue Type: Bug Components: HDFS, namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, HDFS-8214.003.patch SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in the web UI. This causes weird times, generally, just after the epoch, to be displayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
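As a side note on the root cause discussed in HDFS-8214, a small hedged demo (not the patch itself) of why {{Time.monotonicNow()}} produces "weird times, generally, just after the epoch" when rendered as a date:
{code}
import java.util.Date;
import org.apache.hadoop.util.Time;

public class CheckpointTimeDemo {
  public static void main(String[] args) {
    // Time.monotonicNow() is derived from System.nanoTime(), whose origin is
    // arbitrary (roughly JVM start), so as a date it looks like early 1970.
    System.out.println("monotonic:  " + new Date(Time.monotonicNow()));
    // Time.now() wraps System.currentTimeMillis(); a wall-clock value like
    // this is what a "Last Checkpoint" display should be built from.
    System.out.println("wall clock: " + new Date(Time.now()));
  }
}
{code}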
[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] surendra singh lilhore updated HDFS-8277: - Attachment: HDFS-8277_1.patch Attached patch, please review. Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Improvement Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: surendra singh lilhore Priority: Minor Attachments: HDFS-8277.patch, HDFS-8277_1.patch HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] surendra singh lilhore updated HDFS-8277: - Attachment: HDFS-8277.patch Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Improvement Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: surendra singh lilhore Priority: Minor Attachments: HDFS-8277.patch HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8259) Erasure Coding: Test of reading EC file
[ https://issues.apache.org/jira/browse/HDFS-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin reassigned HDFS-8259: - Assignee: Xinwei Qin Erasure Coding: Test of reading EC file --- Key: HDFS-8259 URL: https://issues.apache.org/jira/browse/HDFS-8259 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin 1. Normally reading an EC file (reading without datanode failure and no need of recovery) 2. Reading an EC file with datanode failure. 3. Reading an EC file with data block recovery by decoding from parity blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8262) Erasure Coding: Test of decommissioning datanodes on which EC blocks are stored
[ https://issues.apache.org/jira/browse/HDFS-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin reassigned HDFS-8262: - Assignee: Xinwei Qin Erasure Coding: Test of decommissioning datanodes on which EC blocks are stored -- Key: HDFS-8262 URL: https://issues.apache.org/jira/browse/HDFS-8262 Project: Hadoop HDFS Issue Type: Test Reporter: GAO Rui Assignee: Xinwei Qin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8260) Erasure Coding: test of writing EC file
[ https://issues.apache.org/jira/browse/HDFS-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin reassigned HDFS-8260: - Assignee: Xinwei Qin Erasure Coding: test of writing EC file Key: HDFS-8260 URL: https://issues.apache.org/jira/browse/HDFS-8260 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin 1. Normally writing an EC file (writing without datanode failure) 2. Writing an EC file with a tolerable number of datanodes failing. 3. Writing an EC file with an intolerable number of datanodes failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8265) Erasure Coding: Test of Quota calculation for EC files
[ https://issues.apache.org/jira/browse/HDFS-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R reassigned HDFS-8265: -- Assignee: Rakesh R Erasure Coding: Test of Quota calculation for EC files -- Key: HDFS-8265 URL: https://issues.apache.org/jira/browse/HDFS-8265 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Rakesh R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518982#comment-14518982 ] Yi Liu edited comment on HDFS-8272 at 4/29/15 10:15 AM: Thanks Jing for the work and Zhe for the review! {code} private int fetchEncryptionKeyTimes = 2; private int fetchTokenTimes = 2; {code} Should they be {{1}}? was (Author: hitliuyi): Thanks Jing for the work and Zhe for the review! {quote} - if (pos > blockEnd || currentNodes == null) { - currentNodes = blockSeekTo(pos); - } + if (pos > blockEnd) { + blockSeekTo(pos); + } {quote} We should keep {{currentNodes == null}}? Otherwise {{blockReaders}} is not initialized? {code} private int fetchEncryptionKeyTimes = 2; private int fetchTokenTimes = 2; {code} Should they be {{1}}? Erasure Coding: simplify the retry logic in DFSStripedInputStream - Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch Currently in DFSStripedInputStream the retry logic is still the same as in DFSInputStream. More specifically, every failed read will try to search for another source node, and an exception is thrown when no new source node can be identified. This logic is not appropriate for the EC input stream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
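For readers following HDFS-8272, a rough sketch of the retry-counter pattern Yi Liu is asking about (field and method names here are hypothetical simplifications, not the actual DFSStripedInputStream code): a budget of {{2}} allows two refetches, whereas {{1}} allows a single refetch after the first failure.
{code}
// Hypothetical illustration of the counter semantics under discussion.
private int fetchEncryptionKeyTimes = 1;  // the question: should this be 1?

private void handleEncryptionKeyFailure(InvalidEncryptionKeyException e)
    throws InvalidEncryptionKeyException {
  if (fetchEncryptionKeyTimes > 0) {
    fetchEncryptionKeyTimes--;
    dfsClient.clearDataEncryptionKey();  // force a fresh key on the next try
  } else {
    throw e;  // refetch budget exhausted
  }
}
{code}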
[jira] [Updated] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-8129: -- Summary: Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding (was: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCode) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This JIRA is to correct the places where we named things differently so the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle block locations which don't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519127#comment-14519127 ] Rakesh R commented on HDFS-8220: Thanks a lot [~walter.k.su] for the details. IMHO we could do a validation at the StripedDataStreamer for now. Once the basic PlacementPolicyEC is implemented, we will revisit this case separately in the next phase. [~libo-intel], does this sound good to you? I'm happy to volunteer for the follow-up task - supporting the return of multiple identical storages. Erasure Coding: StripedDataStreamer fails to handle block locations which don't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
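As a rough illustration of the validation Rakesh proposes above (the names and the RS(6,3) layout are assumptions, not the attached patch), the streamer could fail fast when the NameNode returns fewer locations than a full block group needs:
{code}
// Hedged sketch: check the located block against BlockGroupSize before
// handing blocks to the per-streamer queues, turning the later
// NullPointerException into a clear error.
LocatedBlock lb = locateFollowingBlock(excludedNodes);
int required = NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS;  // 6 + 3 for RS(6,3)
if (lb.getLocations().length < required) {
  throw new IOException("Cannot allocate a full block group: expected "
      + required + " datanodes but the namenode returned "
      + lb.getLocations().length);
}
{code}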
[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519125#comment-14519125 ] Uma Maheswara Rao G commented on HDFS-8129: --- I checked the class names. Right now I see the below names need to be modified: {noformat} org.apache.hadoop.hdfs.protocol.ECInfo org.apache.hadoop.hdfs.protocol.ECZoneInfo org.apache.hadoop.hdfs.server.namenode.ECSchemaManager {noformat} Please let me know if I missed some other references to change. Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This JIRA is to correct the places where we named things differently so the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519149#comment-14519149 ] Rakesh R commented on HDFS-8129: I'm adding a few more classes to the discussion, please see: {code} org.apache.hadoop.io.erasurecode.ECChunk org.apache.hadoop.io.erasurecode.ECBlockGroup org.apache.hadoop.io.erasurecode.ECBlock org.apache.hadoop.io.erasurecode.ECSchema {code} Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This JIRA is to correct the places where we named things differently so the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519225#comment-14519225 ] Hudson commented on HDFS-8273: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #169 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/169/]) HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the lock. Contributed by Haohui Mai. (wheat9: rev c79e7f7d997596e0c38ae4cddff2bd0910581c16) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem#Delete() should not call logSync() when holding the lock - Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
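A minimal sketch of the locking pattern the HDFS-8273 fix restores (simplified from {{FSNamesystem#delete}}; not the exact committed diff): {{logSync()}} can block on edit-log I/O, so it must run after the write lock is released.
{code}
BlocksMapUpdateInfo toRemovedBlocks;
writeLock();
try {
  // Mutate the namespace and append (but do not flush) the edit-log record.
  toRemovedBlocks = FSDirDeleteOp.delete(this, src, recursive, logRetryCache);
} finally {
  writeUnlock();
}
// Flush edits outside the lock so other namespace operations are not stalled
// behind edit-log I/O.
getEditLog().logSync();
{code}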
[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519223#comment-14519223 ] Hudson commented on HDFS-8280: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #169 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/169/]) HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: rev 439614b0c8a3df3d8b7967451c5331a0e034e13a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Code Cleanup in DFSInputStream -- Key: HDFS-8280 URL: https://issues.apache.org/jira/browse/HDFS-8280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8280.000.patch This is some code cleanup separate from HDFS-8272: # Avoid duplicated block reader creation code # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null instead of throwing Exception. Whether to throw Exception or not should be determined by {{getBestNodeDNAddrPair}}'s caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
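A hedged sketch of the second HDFS-8280 cleanup point (simplified; not the committed diff): {{getBestNodeDNAddrPair}} signals "no candidate" with {{null}}, and each caller decides whether that is fatal.
{code}
private DNAddrPair getBestNodeDNAddrPair(LocatedBlock block,
    Collection<DatanodeInfo> ignoredNodes) {
  for (DatanodeInfo dn : block.getLocations()) {
    if (!deadNodes.containsKey(dn)
        && (ignoredNodes == null || !ignoredNodes.contains(dn))) {
      return new DNAddrPair(dn, NetUtils.createSocketAddr(dn.getXferAddr()));
    }
  }
  return null;  // no usable source datanode; the caller decides what to do
}

// A caller for which "no node" is fatal:
DNAddrPair chosen = getBestNodeDNAddrPair(block, ignoredNodes);
if (chosen == null) {
  throw new IOException("No live nodes contain block " + block.getBlock());
}
{code}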
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519227#comment-14519227 ] Hudson commented on HDFS-8204: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #169 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/169/]) HDFS-8204. Mover/Balancer should not schedule two replicas to the same datanode. Contributed by Walter Su (szetszwo: rev 5639bf02da716b3ecda785979b3d08cdca15972d) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch Balancer moves blocks between Datanodes (Ver. < 2.6). Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} -is flawed, and may cause 2 replicas to end up in the same node after running balance.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer. Otherwise DN1 has 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up in the same node after running balance, thanks to the Datanode rejecting it. {color} We see a lot of ERRORs when running the test. {code} 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:722) {code} The Balancer runs 5~20 iterations in the test before it exits. It's inefficient. Balancer should not *schedule* such a move in the first place, even though it'll fail anyway. In the test, it should exit after 5 iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
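A rough sketch of the HDFS-8204 fix direction (a hypothetical helper, simplified from {{Dispatcher.java}}; not the exact patch): compare candidate targets at the datanode level rather than the storage-group level, so a block that already has a replica anywhere on DN1 is never scheduled to another storage on DN1.
{code}
// Hedged illustration only.
boolean hasReplicaOnDatanode(DBlock block, StorageGroup target) {
  for (StorageGroup loc : block.getLocations()) {
    if (loc.getDatanodeInfo().equals(target.getDatanodeInfo())) {
      return true;  // same datanode, even if a different storage type
    }
  }
  return false;  // safe to schedule the move to this target
}
{code}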
[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519230#comment-14519230 ] Uma Maheswara Rao G commented on HDFS-8129: --- Actually Nicholas noted above that, for the classes already inside the erasurecode package, it may be good enough to keep them as is (ECXXX). We can discuss further if others do not agree on it. That was the reason I did not pick up those classes into my list before. Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This JIRA is to correct the places where we named things differently so the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519251#comment-14519251 ] Hudson commented on HDFS-8273: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/178/]) HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the lock. Contributed by Haohui Mai. (wheat9: rev c79e7f7d997596e0c38ae4cddff2bd0910581c16) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java FSNamesystem#Delete() should not call logSync() when holding the lock - Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519253#comment-14519253 ] Hudson commented on HDFS-8204: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/178/]) HDFS-8204. Mover/Balancer should not schedule two replicas to the same datanode. Contributed by Walter Su (szetszwo: rev 5639bf02da716b3ecda785979b3d08cdca15972d) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch Balancer moves blocks between Datanodes (Ver. < 2.6). Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} -is flawed, and may cause 2 replicas to end up in the same node after running balance.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer. Otherwise DN1 has 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up in the same node after running balance, thanks to the Datanode rejecting it. {color} We see a lot of ERRORs when running the test. {code} 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:722) {code} The Balancer runs 5~20 iterations in the test before it exits. It's inefficient. Balancer should not *schedule* such a move in the first place, even though it'll fail anyway. In the test, it should exit after 5 iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519256#comment-14519256 ] Rakesh R commented on HDFS-8129: oops, thanks Uma for pointing this out. Probably it can be used to reach a conclusion: (EcXxx) or (ECXxx). Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This JIRA is to correct the places where we named things differently so the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7847: --- Status: Patch Available (was: Reopened) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in-process. A follow-on Jira will modify it some more to allow quantifying native and Java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519199#comment-14519199 ] Hudson commented on HDFS-8273: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2110 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2110/]) HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the lock. Contributed by Haohui Mai. (wheat9: rev c79e7f7d997596e0c38ae4cddff2bd0910581c16) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java FSNamesystem#Delete() should not call logSync() when holding the lock - Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519197#comment-14519197 ] Hudson commented on HDFS-8280: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2110 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2110/]) HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: rev 439614b0c8a3df3d8b7967451c5331a0e034e13a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Code Cleanup in DFSInputStream -- Key: HDFS-8280 URL: https://issues.apache.org/jira/browse/HDFS-8280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8280.000.patch This is some code cleanup separate from HDFS-8272: # Avoid duplicated block reader creation code # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null instead of throwing Exception. Whether to throw Exception or not should be determined by {{getBestNodeDNAddrPair}}'s caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519201#comment-14519201 ] Hudson commented on HDFS-8204: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2110 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2110/]) HDFS-8204. Mover/Balancer should not schedule two replicas to the same datanode. Contributed by Walter Su (szetszwo: rev 5639bf02da716b3ecda785979b3d08cdca15972d) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch Balancer moves blocks between Datanodes (Ver. < 2.6). Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} -is flawed, and may cause 2 replicas to end up in the same node after running balance.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer. Otherwise DN1 has 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up in the same node after running balance, thanks to the Datanode rejecting it. {color} We see a lot of ERRORs when running the test. {code} 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:722) {code} The Balancer runs 5~20 iterations in the test before it exits. It's inefficient. Balancer should not *schedule* such a move in the first place, even though it'll fail anyway. In the test, it should exit after 5 iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb reopened HDFS-7847: Porting to trunk. .004 submitted. Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in-process. A follow-on Jira will modify it some more to allow quantifying native and Java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519248#comment-14519248 ] Hudson commented on HDFS-8280: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/178/]) HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: rev 439614b0c8a3df3d8b7967451c5331a0e034e13a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Code Cleanup in DFSInputStream -- Key: HDFS-8280 URL: https://issues.apache.org/jira/browse/HDFS-8280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8280.000.patch This is some code cleanup separate from HDFS-8272: # Avoid duplicated block reader creation code # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null instead of throwing Exception. Whether to throw Exception or not should be determined by {{getBestNodeDNAddrPair}}'s caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519263#comment-14519263 ] Kai Zheng commented on HDFS-8129: - Thanks for the discussion. It would be great if we could keep those names in the codec & coder framework. Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This JIRA is to correct the places where we named things differently so the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519283#comment-14519283 ] Hudson commented on HDFS-8204: -- FAILURE: Integrated in Hadoop-Yarn-trunk #912 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/912/]) HDFS-8204. Mover/Balancer should not schedule two replicas to the same datanode. Contributed by Walter Su (szetszwo: rev 5639bf02da716b3ecda785979b3d08cdca15972d) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch Balancer moves blocks between Datanodes (Ver. < 2.6). Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} -is flawed, and may cause 2 replicas to end up in the same node after running balance.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer. Otherwise DN1 has 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up in the same node after running balance, thanks to the Datanode rejecting it. {color} We see a lot of ERRORs when running the test. {code} 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:722) {code} The Balancer runs 5~20 iterations in the test before it exits. It's inefficient. Balancer should not *schedule* such a move in the first place, even though it'll fail anyway. In the test, it should exit after 5 iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519281#comment-14519281 ] Hudson commented on HDFS-8273: -- FAILURE: Integrated in Hadoop-Yarn-trunk #912 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/912/]) HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the lock. Contributed by Haohui Mai. (wheat9: rev c79e7f7d997596e0c38ae4cddff2bd0910581c16) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt FSNamesystem#Delete() should not call logSync() when holding the lock - Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519279#comment-14519279 ] Hudson commented on HDFS-8280: -- FAILURE: Integrated in Hadoop-Yarn-trunk #912 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/912/]) HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: rev 439614b0c8a3df3d8b7967451c5331a0e034e13a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Code Cleanup in DFSInputStream -- Key: HDFS-8280 URL: https://issues.apache.org/jira/browse/HDFS-8280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8280.000.patch This is some code cleanup separate from HDFS-8272: # Avoid duplicated block reader creation code # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null instead of throwing Exception. Whether to throw Exception or not should be determined by {{getBestNodeDNAddrPair}}'s caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint
[ https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519304#comment-14519304 ] Charles Lamb commented on HDFS-8214: The test failure is unrelated. The checkstyle issue has already been discussed above. Secondary NN Web UI shows wrong date for Last Checkpoint Key: HDFS-8214 URL: https://issues.apache.org/jira/browse/HDFS-8214 Project: Hadoop HDFS Issue Type: Bug Components: HDFS, namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, HDFS-8214.003.patch SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in the web UI. This causes weird times, generally, just after the epoch, to be displayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7847: --- Target Version/s: 2.8.0 (was: HDFS-7836) Status: In Progress (was: Patch Available) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in-process. A follow-on Jira will modify it some more to allow quantifying native and Java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8015) Erasure Coding: local and remote block writer for coding work in DataNode
[ https://issues.apache.org/jira/browse/HDFS-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8015: Attachment: HDFS-8015-001.patch Erasure Coding: local and remote block writer for coding work in DataNode - Key: HDFS-8015 URL: https://issues.apache.org/jira/browse/HDFS-8015 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Li Bo Attachments: HDFS-8015-000.patch, HDFS-8015-001.patch As a task of HDFS-7344 ECWorker, in either striped or non-striped erasure coding, to perform encoding or decoding we need to be able to write data blocks locally or remotely. This is to come up with a block writer facility on the DataNode side. It's better to think about the similar work done on the client side, so that in the future it's possible to unify both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518754#comment-14518754 ] Zhe Zhang commented on HDFS-7859: - [~aw] Thanks again for bringing in the feature-branch pre-commit Jenkins functionality! It's really helpful. We just saw another successful run under HDFS-7678. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas in NameNode centrally and reliably, so that EC zones can reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8276) LazyPersistFileScrubber should be disabled if scrubber interval configured zero
[ https://issues.apache.org/jira/browse/HDFS-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518975#comment-14518975 ] surendra singh lilhore commented on HDFS-8276: -- Attached patch, please review. LazyPersistFileScrubber should be disabled if scrubber interval configured zero --- Key: HDFS-8276 URL: https://issues.apache.org/jira/browse/HDFS-8276 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Attachments: HDFS-8276.patch bq. but I think it is simple enough to change the meaning of the value so that zero means 'never scrub'. Let me post an updated patch. As discussed in [HDFS-6929|https://issues.apache.org/jira/browse/HDFS-6929], the scrubber should be disabled if *dfs.namenode.lazypersist.file.scrub.interval.sec* is zero. Currently namenode startup fails if the interval is configured as zero: {code} 2015-04-27 23:47:31,744 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.lang.IllegalArgumentException: dfs.namenode.lazypersist.file.scrub.interval.sec must be non-zero. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:828) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
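A hedged sketch of the proposed HDFS-8276 startup behavior (the configuration key comes from the log above; the surrounding code is an assumption, not the attached patch): treat zero as "scrubber disabled" instead of rejecting it.
{code}
int scrubIntervalSec = conf.getInt(
    DFS_NAMENODE_LAZY_PERSIST_FILE_SCRUB_INTERVAL_SEC,
    DFS_NAMENODE_LAZY_PERSIST_FILE_SCRUB_INTERVAL_SEC_DEFAULT);
if (scrubIntervalSec < 0) {
  throw new IllegalArgumentException(
      DFS_NAMENODE_LAZY_PERSIST_FILE_SCRUB_INTERVAL_SEC
      + " must not be negative.");
} else if (scrubIntervalSec == 0) {
  // Zero now means "never scrub" rather than an initialization failure.
  LOG.warn("Lazy persist file scrubber is disabled; lost RAM_DISK replicas"
      + " will not be cleaned up.");
} else {
  lazyPersistFileScrubber =
      new Daemon(new LazyPersistFileScrubber(scrubIntervalSec));
  lazyPersistFileScrubber.start();
}
{code}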
[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518771#comment-14518771 ] Yongjun Zhang commented on HDFS-7281: - Hi [~mingma], Thanks for the new rev and the release notes! One minor thing: I found three lines touched by this patch exceed 80 chars: * Line 852 of BlockManager * Line 575 and 703 of NamenodeFsck Really sorry I did not catch them in earlier rounds. +1 after that and jenkins. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Labels: supportability Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, HDFS-7281-5.patch, HDFS-7281.patch In the situation where the block has lost all its replicas, fsck shows the block as missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is numCorruptNodes == numNodes == 0 in the following code. {noformat} BlockManager final boolean isCorrupt = numCorruptNodes == numNodes; {noformat} Would like to clarify whether it is the intent to mark a missing block as corrupted, or it is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
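One way to read the suggestion in the HDFS-7281 description, as a hedged one-line sketch (not necessarily the committed change): require at least one known-corrupt replica before flagging the block corrupt, so a block that is merely missing (both counts zero) is reported only as missing.
{code}
// numCorruptNodes == numNodes is trivially true when both are 0 (all
// replicas lost); the extra term separates "missing" from "corrupt".
final boolean isCorrupt = numCorruptNodes != 0 && numCorruptNodes == numNodes;
{code}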
[jira] [Commented] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518982#comment-14518982 ] Yi Liu commented on HDFS-8272: -- Thanks Jing for the work and Zhe for the review! {quote} - if (pos > blockEnd || currentNodes == null) { - currentNodes = blockSeekTo(pos); - } + if (pos > blockEnd) { + blockSeekTo(pos); + } {quote} We should keep {{currentNodes == null}}? Otherwise {{blockReaders}} is not initialized? {code} private int fetchEncryptionKeyTimes = 2; private int fetchTokenTimes = 2; {code} Should they be {{1}}? Erasure Coding: simplify the retry logic in DFSStripedInputStream - Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch Currently in DFSStripedInputStream the retry logic is still the same as in DFSInputStream. More specifically, every failed read will try to search for another source node, and an exception is thrown when no new source node can be identified. This logic is not appropriate for the EC input stream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518992#comment-14518992 ] Brahma Reddy Battula commented on HDFS-8277: [~surendrasingh] Thanks for working on this.. Patch LGTM,+1 ( non binding)... Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Improvement Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: surendra singh lilhore Priority: Minor Attachments: HDFS-8277.patch, HDFS-8277_1.patch HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518931#comment-14518931 ] Walter Su commented on HDFS-8220: - ...In your unit test, the cluster has 3 nodes and only 3 locations are returned by NN for RS(6,3). Currently, PlacementPolicy doesn't support return two identical storages. It even doesn't support two identical DNs. Does BlockInfo.addStorage()/removeStorage() function well when two identical storages exists? Currently, normal block doesn't support this, how about EC BlockGroup doesn't support this temperately? We can discuss it in the future work when HDFS-7285 is done. HDFS-7613 handles the situation when short of racks, it doesn't handle when short of nodes. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8150) Make getFileChecksum fail for blocks under construction
[ https://issues.apache.org/jira/browse/HDFS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-8150: - Attachment: HDFS-8150.2.patch Attaching the patch after fixing the testcase failure. Please review. Make getFileChecksum fail for blocks under construction --- Key: HDFS-8150 URL: https://issues.apache.org/jira/browse/HDFS-8150 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: J.Andreina Priority: Critical Attachments: HDFS-8150.1.patch, HDFS-8150.2.patch We have seen cases where a data copy was validated using its checksum and the content of the target then changed. It turns out the target wasn't closed successfully, so it was still under construction. One hour later, a lease recovery kicked in and truncated the block. Although this can be prevented in many ways, if there is no valid use case for getting the file checksum from under-construction blocks, can it be disabled? E.g. the Datanode can throw an exception if the replica is not in the finalized state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
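A rough sketch of the kind of DataNode-side guard the description suggests; the variable names are illustrative assumptions and the real patch may place the check elsewhere:
{code}
// Illustration only: reject checksum requests for replicas that are not finalized.
// Assumption: "replica" is the replica looked up for the requested block on this DataNode.
if (replica.getState() != ReplicaState.FINALIZED) {
  throw new IOException("Cannot compute checksum for " + block
      + ": replica is not finalized (state = " + replica.getState() + ")");
}
{code}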
[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518935#comment-14518935 ] Li Bo commented on HDFS-7348: - We can give the recovery work two modes: Fast or Slow. In slow mode, we read blocks (or cells) in sequence for the decode calculation and write the result to disk; after finishing, we send the blocks one by one. In fast mode, we read blocks in parallel and send them directly to the destinations without storing them on local disk. The mode is selected based on factors such as the network status and the load on the datanode that takes the recovery task. Erasure Coding: striped block recovery -- Key: HDFS-7348 URL: https://issues.apache.org/jira/browse/HDFS-7348 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Kai Zheng Assignee: Yi Liu Attachments: ECWorker.java, HDFS-7348.001.patch This JIRA is to recover one or more missing striped blocks in the striped block group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
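A tiny sketch of how such a mode switch might be expressed; the enum and the selection heuristic are purely hypothetical and only illustrate the trade-off being discussed:
{code}
// Hypothetical illustration of the two proposed recovery modes.
enum RecoveryMode {
  FAST, // read sources in parallel, stream decoded data straight to the targets
  SLOW  // read sources sequentially, spill decoded data to local disk first
}

// Hypothetical heuristic: fall back to SLOW when the network or the recovering
// DataNode is already under pressure.
static RecoveryMode chooseMode(boolean networkBusy, boolean datanodeBusy) {
  return (networkBusy || datanodeBusy) ? RecoveryMode.SLOW : RecoveryMode.FAST;
}
{code}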
[jira] [Updated] (HDFS-8276) LazyPersistFileScrubber should be disabled if scrubber interval configured zero
[ https://issues.apache.org/jira/browse/HDFS-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] surendra singh lilhore updated HDFS-8276: - Attachment: HDFS-8276.patch LazyPersistFileScrubber should be disabled if scrubber interval configured zero --- Key: HDFS-8276 URL: https://issues.apache.org/jira/browse/HDFS-8276 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Attachments: HDFS-8276.patch bq. but I think it is simple enough to change the meaning of the value so that zero means 'never scrub'. Let me post an updated patch. As discussed in [HDFS-6929|https://issues.apache.org/jira/browse/HDFS-6929], the scrubber should be disabled if *dfs.namenode.lazypersist.file.scrub.interval.sec* is zero. Currently, NameNode startup fails if the interval is configured as zero: {code} 2015-04-27 23:47:31,744 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.lang.IllegalArgumentException: dfs.namenode.lazypersist.file.scrub.interval.sec must be non-zero. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:828) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2484) checkLease should throw FileNotFoundException when file does not exist
[ https://issues.apache.org/jira/browse/HDFS-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519015#comment-14519015 ] Rakesh R commented on HDFS-2484: Thanks [~shv] for reporting this. I can see {{FSNamesystem#checkLease}} has the following check in branch-2 and trunk; does this validation satisfy your case? [FSNamesystem#checkLease logic|https://github.com/apache/hadoop/blob/branch-2.7/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3417] {code} if (inode == null) { Lease lease = leaseManager.getLease(holder); throw new LeaseExpiredException("No lease on " + ident + ": File does not exist. " + (lease != null ? lease.toString() : "Holder " + holder + " does not have any open files.")); } {code} checkLease should throw FileNotFoundException when file does not exist -- Key: HDFS-2484 URL: https://issues.apache.org/jira/browse/HDFS-2484 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.22.0, 2.0.0-alpha Reporter: Konstantin Shvachko When a file is deleted during its creation, {{FSNamesystem.checkLease(String src, String holder)}} throws {{LeaseExpiredException}}. It would be more informative if it threw {{FileNotFoundException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
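A minimal sketch of the change the issue asks for, throwing {{FileNotFoundException}} in the missing-inode case; the exact message text is an assumption for illustration:
{code}
// Sketch: report a deleted/non-existent file as FileNotFoundException instead of
// an expired lease, which is what the caller actually needs to know.
if (inode == null) {
  throw new FileNotFoundException("File does not exist: " + src
      + " (it may have been deleted while being created)");
}
{code}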
[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8277: --- Issue Type: Bug (was: Improvement) Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: surendra singh lilhore Priority: Minor Attachments: HDFS-8277.patch, HDFS-8277_1.patch HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7613) Block placement policy for erasure coding groups
[ https://issues.apache.org/jira/browse/HDFS-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518818#comment-14518818 ] Li Bo commented on HDFS-7613: - Hi Walter, HDFS-8220 depends upon this sub-task but the current patch can't be applied. Could you update your patch against the current code so that HDFS-8220 can proceed based on your new patch? Thanks. Block placement policy for erasure coding groups Key: HDFS-7613 URL: https://issues.apache.org/jira/browse/HDFS-7613 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Walter Su Attachments: HDFS-7613.001.patch Blocks in an erasure coding group should be placed in different failure domains -- different DataNodes at the minimum, and different racks ideally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518817#comment-14518817 ] Hadoop QA commented on HDFS-8269: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 26s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 29s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 16s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 165m 26s | Tests failed in hadoop-hdfs. | | | | 213m 25s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729005/HDFS-8269.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 439614b | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10447/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10447/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10447/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10447/console | This message was automatically generated. getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime - Key: HDFS-8269 URL: https://issues.apache.org/jira/browse/HDFS-8269 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, HDFS-8269.002.patch, HDFS-8269.003.patch When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit logs entries: {noformat} RECORD OPCODEOP_TIMES/OPCODE DATA TXID5085/TXID LENGTH0/LENGTH PATH/.reserved/.inodes/18230/PATH MTIME-1/MTIME ATIME1429908236392/ATIME /DATA /RECORD {noformat} Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, therefore it eventually leads to a NPE when loading the edit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
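A hedged sketch of the idea behind the fix: resolve a {{/.reserved/.inodes/...}} path back to the inode's canonical path before the atime update is written to the edit log. The helper below is an assumption for illustration; the committed patch may wire the resolution differently.
{code}
// Illustration only: make sure the edit log records a path it can later replay.
private static String pathForEditLog(FSDirectory dir, String clientPath)
    throws IOException {
  if (FSDirectory.isReservedName(clientPath)) {
    // Hypothetical resolution step: map "/.reserved/.inodes/<id>" to the real path.
    INodesInPath iip = dir.getINodesInPath(clientPath, true);
    return iip.getLastINode().getFullPathName();
  }
  return clientPath;
}
{code}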
[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518918#comment-14518918 ] Li Bo commented on HDFS-7348: - Thanks Yi for the great work! I think decreasing network I/O is very important for a cluster using EC. For the read part, we may read blocks in sequence; for the write part, we may first write decoded blocks to local disk and then send them to remote datanodes. This may slow the recovery work, but we reduce the impact on network I/O, especially when the cluster is busy. As for the reader and writer, I think we may separate them out as independent classes so that other tasks can also use them. Erasure Coding: striped block recovery -- Key: HDFS-7348 URL: https://issues.apache.org/jira/browse/HDFS-7348 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Kai Zheng Assignee: Yi Liu Attachments: ECWorker.java, HDFS-7348.001.patch This JIRA is to recover one or more missing striped blocks in the striped block group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] surendra singh lilhore updated HDFS-8277: - Status: Patch Available (was: Open) Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Improvement Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: surendra singh lilhore Priority: Minor Attachments: HDFS-8277.patch, HDFS-8277_1.patch HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8265) Erasure Coding: Test of Quota calculation for EC files
[ https://issues.apache.org/jira/browse/HDFS-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519108#comment-14519108 ] Rakesh R commented on HDFS-8265: Thanks [~demongaorui] for reporting this task. I'm happy to volunteer and make an attempt. Please feel free to reassign if you have started with this. Erasure Coding: Test of Quota calculation for EC files -- Key: HDFS-8265 URL: https://issues.apache.org/jira/browse/HDFS-8265 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Rakesh R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519571#comment-14519571 ] Hudson commented on HDFS-8273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2128 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2128/]) HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the lock. Contributed by Haohui Mai. (wheat9: rev c79e7f7d997596e0c38ae4cddff2bd0910581c16) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt FSNamesystem#Delete() should not call logSync() when holding the lock - Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519573#comment-14519573 ] Hudson commented on HDFS-8204: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2128 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2128/]) HDFS-8204. Mover/Balancer should not schedule two replicas to the same datanode. Contributed by Walter Su (szetszwo: rev 5639bf02da716b3ecda785979b3d08cdca15972d) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch Balancer moves blocks between Datanode(Ver. 2.6 ). Balancer moves blocks between StorageGroups ( introduced by HDFS-6584) , in the new version(Ver. =2.6) . function {code} class DBlock extends LocationsStorageGroup DBlock.isLocatedOn(StorageGroup loc) {code} -is flawed, may causes 2 replicas ends in same node after running balance.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer. Otherwise DN1 has 2 replicas. -- UPDATE(Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* causes 2 replicas end in same node after running balance, thanks to Datanode rejecting it. {color} We see a lot of ERROR when running test. {code} 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:186) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:722) {code} The Balancer runs 5~20 times iterations in the test, before it exits. It's ineffecient. Balancer should not *schedule* it in the first place, even though it'll failed anyway. In the test, it should exit after 5 times iteration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
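A hedged sketch of the scheduling check the fix is about: when the Balancer/Mover decides whether a block already has a replica on a candidate target, the comparison should happen at the DataNode level, not only at the StorageGroup level. The accessor names below loosely follow the Dispatcher code and are partly assumptions:
{code}
// Sketch: a move target is unsuitable if any existing replica already lives on
// the same DataNode, regardless of which storage on that node holds it.
boolean hasReplicaOnSameDatanode(DBlock block, StorageGroup target) {
  for (StorageGroup loc : block.getLocations()) {                 // assumed accessor
    if (loc.getDatanodeInfo().equals(target.getDatanodeInfo())) { // assumed accessor
      return true; // scheduling this move would place two replicas on one DN
    }
  }
  return false;
}
{code}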
[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519569#comment-14519569 ] Hudson commented on HDFS-8280: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2128 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2128/]) HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: rev 439614b0c8a3df3d8b7967451c5331a0e034e13a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Code Cleanup in DFSInputStream -- Key: HDFS-8280 URL: https://issues.apache.org/jira/browse/HDFS-8280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8280.000.patch This is some code cleanup separate from HDFS-8272: # Avoid duplicated block reader creation code # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null instead of throwing Exception. Whether to throw Exception or not should be determined by {{getBestNodeDNAddrPair}}'s caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read)
[ https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8272: Summary: Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read) (was: Erasure Coding: simplify the retry logic in DFSStripedInputStream) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read) - Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch Currently in DFSStripedInputStream the retry logic is still the same with DFSInputStream. More specifically, every failed read will try to search for another source node. And an exception is thrown when no new source node can be identified. This logic is not appropriate for EC inputstream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
[ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin reassigned HDFS-1950: Assignee: ramtin (was: Uma Maheswara Rao G) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly. Key: HDFS-1950 URL: https://issues.apache.org/jira/browse/HDFS-1950 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 0.20.205.0 Reporter: ramkrishna.s.vasudevan Assignee: ramtin Priority: Blocker Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, hdfs-1950-trunk-test.txt Before going to the root cause lets see the read behavior for a file having more than 10 blocks in append case.. Logic: There is prefetch size dfs.read.prefetch.size for the DFSInputStream which has default value of 10 This prefetch size is the number of blocks that the client will fetch from the namenode for reading a file.. For example lets assume that a file X having 22 blocks is residing in HDFS The reader first fetches first 10 blocks from the namenode and start reading After the above step , the reader fetches the next 10 blocks from NN and continue reading Then the reader fetches the remaining 2 blocks from NN and complete the write Cause: === Lets see the cause for this issue now... Scenario that will fail is Writer wrote 10+ blocks and a partial block and called sync. Reader trying to read the file will not get the last partial block . Client first gets the 10 block locations from the NN. Now it checks whether the file is under construction and if so it gets the size of the last partial block from datanode and reads the full file However when the number of blocks is more than 10, the last block will not be in the first fetch. It will be in the second or other blocks(last block will be in (num of blocks / 10)th fetch) The problem now is, in DFSClient there is no logic to get the size of the last partial block(as in case of point 1), for the rest of the fetches other than first fetch, the reader will not be able to read the complete data synced...!! also the InputStream.available api uses the first fetched block size to iterate. Ideally this size has to be increased -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7281: -- Attachment: HDFS-7281-6.patch Thanks [~yzhangal]. Here is the updated patch. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Labels: supportability Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, HDFS-7281-5.patch, HDFS-7281-6.patch, HDFS-7281.patch In the situation where the block lost all its replicas, fsck shows the block is missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is numCorruptNodes == numNodes == 0 in the following code. {noformat} BlockManager final boolean isCorrupt = numCorruptNodes == numNodes; {noformat} Would like to clarify if it is the intent to mark missing block as corrupted or it is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read)
[ https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8272: Attachment: HDFS-8272.002.patch Thanks for the review, Zhe and Yi! I just merged HDFS-8280 into HDFS-7285 branch. Upload the patch excluding the changes in DFSInputStream. Also fix the bug pointed out by Yi. About the {{seekToBlockSource}}, I think it may be better to remove it by now: # With decoding functionality we do not need to spend more time trying our luck on the same DN. # Currently calling {{seekToBlockSource}} will cause all the current block readers to be closed (since {{blockSeekTo}} will call {{closeCurrentBlockReaders}}). To fix this we need to add extra complexity so as to make sure only one block reader is retried. Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read) - Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8272.002.patch, h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch Currently in DFSStripedInputStream the retry logic is still the same with DFSInputStream. More specifically, every failed read will try to search for another source node. And an exception is thrown when no new source node can be identified. This logic is not appropriate for EC inputstream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8283) DataStreamer cleanup and some minor improvement
[ https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519802#comment-14519802 ] Jing Zhao commented on HDFS-8283: - Thanks for working on this, Nicholas! The patch looks pretty good to me. The test failure should be unrelated and it passed in my local run. +1 I will commit the patch shortly. DataStreamer cleanup and some minor improvement --- Key: HDFS-8283 URL: https://issues.apache.org/jira/browse/HDFS-8283 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8283_20150428.patch - When throwing an exception -* always set lastException -* always creating a new exception so that it has the new stack trace - Add LOG. - Add final to isAppend and favoredNodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8283) DataStreamer cleanup and some minor improvement
[ https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8283: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. DataStreamer cleanup and some minor improvement --- Key: HDFS-8283 URL: https://issues.apache.org/jira/browse/HDFS-8283 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.8.0 Attachments: h8283_20150428.patch - When throwing an exception -* always set lastException -* always creating a new exception so that it has the new stack trace - Add LOG. - Add final to isAppend and favoredNodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8288) Refactor DFSStripedOutputStream and StripedDataStreamer
Tsz Wo Nicholas Sze created HDFS-8288: - Summary: Refactor DFSStripedOutputStream and StripedDataStreamer Key: HDFS-8288 URL: https://issues.apache.org/jira/browse/HDFS-8288 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze - DFSStripedOutputStream has a list of StripedDataStreamer(s). The streamers share a data structure {{List<BlockingQueue<LocatedBlock>> stripeBlocks}} to communicate located block and end block information. For example, {code} //StripedDataStreamer.endBlock() // before retrieving a new block, transfer the finished block to // leading streamer LocatedBlock finishedBlock = new LocatedBlock( new ExtendedBlock(block.getBlockPoolId(), block.getBlockId(), block.getNumBytes(), block.getGenerationStamp()), null); try { boolean offSuccess = stripedBlocks.get(0).offer(finishedBlock, 30, TimeUnit.SECONDS); {code} It is unnecessary to create a LocatedBlock object for an end block since the locations passed are null. Also, the return value is ignored (i.e. offSuccess is not used). - DFSStripedOutputStream has another data structure cellBuffers for computing parity. It should be refactored into a class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
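A small sketch of the simplification being suggested: hand the finished block around as an {{ExtendedBlock}} and actually check the queue's return value. The queue name {{endBlocks}} is a hypothetical stand-in for whatever structure the refactoring introduces:
{code}
// Illustration only: no LocatedBlock wrapper is needed when there are no locations,
// and the offer() result should not be silently dropped.
ExtendedBlock finishedBlock = new ExtendedBlock(block.getBlockPoolId(),
    block.getBlockId(), block.getNumBytes(), block.getGenerationStamp());
boolean offered = endBlocks.offer(finishedBlock, 30, TimeUnit.SECONDS); // hypothetical queue
if (!offered) {
  throw new IOException("Timed out passing the finished block " + finishedBlock
      + " to the leading streamer");
}
{code}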
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520002#comment-14520002 ] Colin Patrick McCabe commented on HDFS-8113: +1 for HDFS-8113.02.patch. I think it's a good robustness improvement to the code. It would be nice to continue the investigation about why you hit this issue in another jira, as [~chengbing.liu] suggested. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null. {code} protected BlockInfoContiguous(BlockInfoContiguous from) { this(from, from.bc.getBlockReplication()); this.bc = from.bc; } {code} We have observed that some DataNodes keeps failing doing block reports with NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists. {quote} 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.(BlockInfo.java:80) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.(BlockManager.java:1696) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
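A minimal sketch of a null-tolerant variant of that copy constructor, along the lines of the robustness improvement being discussed. The fallback replication of 0 for a detached block is an illustrative assumption, not necessarily what the committed patch does:
{code}
// Sketch: look up the BlockCollection once and avoid the NPE when it is null.
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.getBlockCollection() == null
      ? (short) 0
      : from.getBlockCollection().getBlockReplication());
  this.bc = from.getBlockCollection();
}
{code}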
[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint
[ https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520056#comment-14520056 ] Haohui Mai commented on HDFS-8214: -- {code} + if (v < 0) { + return "unknown"; + } {code} It might make more sense to move it to the template (i.e., {{status.html}}), as the function might later be superseded by moment.js. Secondary NN Web UI shows wrong date for Last Checkpoint Key: HDFS-8214 URL: https://issues.apache.org/jira/browse/HDFS-8214 Project: Hadoop HDFS Issue Type: Bug Components: HDFS, namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, HDFS-8214.003.patch SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in the web UI. This causes weird times, generally, just after the epoch, to be displayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8037) WebHDFS: CheckAccess silently accepts certain malformed FsActions
[ https://issues.apache.org/jira/browse/HDFS-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520064#comment-14520064 ] Haohui Mai commented on HDFS-8037: -- Good catch. Can you please add a unit test? WebHDFS: CheckAccess silently accepts certain malformed FsActions - Key: HDFS-8037 URL: https://issues.apache.org/jira/browse/HDFS-8037 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jake Low Assignee: Walter Su Priority: Minor Labels: easyfix, newbie Attachments: HDFS-8037.001.patch, HDFS-8037.002.patch WebHDFS's {{CHECKACCESS}} operation accepts a parameter called {{fsaction}}, which represents the type(s) of access to check for. According to the documentation, and also the source code, the domain of {{fsaction}} is the set of strings matched by the regex {{\[rwx-\]{3\}}}. This domain is wider than the set of valid {{FsAction}} objects, because it doesn't guarantee sensible ordering of access types. For example, the strings {{rxw}} and {{--r}} are valid {{fsaction}} parameter values, but don't correspond to valid {{FsAction}} instances. The result is that WebHDFS silently accepts {{fsaction}} parameter values which don't match any valid {{FsAction}} instance, but doesn't actually perform any permissions checking in this case. For example, here's a {{CHECKACCESS}} call where we request {{rw-}} access on a file which we only have permission to read and execute. It raises an exception, as it should. {code:none} curl -i -X GET http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESSuser.name=nobodyfsaction=r-x; HTTP/1.1 403 Forbidden Content-Type: application/json { RemoteException: { exception: AccessControlException, javaClassName: org.apache.hadoop.security.AccessControlException, message: Permission denied: user=nobody, access=READ_WRITE, inode=\\/myfile\:root:supergroup:drwxr-xr-x } } {code} But if we instead request {{r-w}} access, the call appears to succeed: {code:none} curl -i -X GET http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESSuser.name=nobodyfsaction=r-w; HTTP/1.1 200 OK Content-Length: 0 {code} As I see it, the fix would be to change the regex pattern in {{FsActionParam}} to something like {{\[r-\]\[w-\]\[x-\]}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
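A quick illustration of why the proposed pattern is tighter: the current regex accepts any permutation of the three characters, while the per-position pattern only accepts well-formed {{FsAction}} symbols. This is a standalone demo, not the actual {{FsActionParam}} code:
{code}
import java.util.regex.Pattern;

public class FsActionParamRegexDemo {
  public static void main(String[] args) {
    Pattern current  = Pattern.compile("[rwx-]{3}");    // pattern used today
    Pattern proposed = Pattern.compile("[r-][w-][x-]"); // suggested replacement

    String[] samples = {"rw-", "r-x", "r-w", "rxw", "--r"};
    for (String s : samples) {
      System.out.printf("%s  current=%b  proposed=%b%n",
          s, current.matcher(s).matches(), proposed.matcher(s).matches());
    }
    // "r-w", "rxw" and "--r" match the current pattern but are rejected by the
    // proposed one, which is exactly the silent-acceptance bug described above.
  }
}
{code}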
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520083#comment-14520083 ] Lei (Eddy) Xu commented on HDFS-7758: - Seems that the checkstyle error is caused by HADOOP-11889. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read)
[ https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520095#comment-14520095 ] Jing Zhao edited comment on HDFS-8272 at 4/29/15 8:03 PM: -- Thanks for the review, Zhe and Yi! I just merged HDFS-8280 into HDFS-7285 branch. Upload the patch excluding the changes in DFSInputStream. Also fix the bug pointed out by Yi. About the {{seekToBlockSource}}, I think it may be better to remove it: # With decoding functionality we do not need to spend more time trying our luck on the same DN. # Currently calling {{seekToBlockSource}} will cause all the current block readers to be closed (since {{blockSeekTo}} will call {{closeCurrentBlockReaders}}). To fix this we need to add extra complexity so as to make sure only one block reader is retried. How about removing it by now and we can add it back if necessary in the future? was (Author: jingzhao): Thanks for the review, Zhe and Yi! I just merged HDFS-8280 into HDFS-7285 branch. Upload the patch excluding the changes in DFSInputStream. Also fix the bug pointed out by Yi. About the {{seekToBlockSource}}, I think it may be better to remove it by now: # With decoding functionality we do not need to spend more time trying our luck on the same DN. # Currently calling {{seekToBlockSource}} will cause all the current block readers to be closed (since {{blockSeekTo}} will call {{closeCurrentBlockReaders}}). To fix this we need to add extra complexity so as to make sure only one block reader is retried. Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read) - Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8272.002.patch, h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch Currently in DFSStripedInputStream the retry logic is still the same with DFSInputStream. More specifically, every failed read will try to search for another source node. And an exception is thrown when no new source node can be identified. This logic is not appropriate for EC inputstream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8286) Scaling out the namespace using KV store
Haohui Mai created HDFS-8286: Summary: Scaling out the namespace using KV store Key: HDFS-8286 URL: https://issues.apache.org/jira/browse/HDFS-8286 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Currently the NN keeps the namespace in memory. To improve the scalability of the namespace, users can scale up by using more RAM or scale out using Federation (i.e., statically partitioning the namespace). We would like to remove the limitation of scaling the global namespace. Our vision is that HDFS should adopt a scalable underlying architecture that allows the global namespace to scale linearly. We propose to implement the HDFS namespace on top of a key-value (KV) store. Adopting the KV store interfaces allows HDFS to leverage the capabilities of modern KV stores and to become much easier to scale. Going forward, the architecture allows distributing the namespace across multiple machines, or storing only the working set in memory (HDFS-5389), both of which allow HDFS to manage billions of files using the commodity hardware available today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
[ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated HDFS-1950: - Assignee: (was: ramtin) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly. Key: HDFS-1950 URL: https://issues.apache.org/jira/browse/HDFS-1950 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 0.20.205.0 Reporter: ramkrishna.s.vasudevan Priority: Blocker Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, hdfs-1950-trunk-test.txt Before going to the root cause lets see the read behavior for a file having more than 10 blocks in append case.. Logic: There is prefetch size dfs.read.prefetch.size for the DFSInputStream which has default value of 10 This prefetch size is the number of blocks that the client will fetch from the namenode for reading a file.. For example lets assume that a file X having 22 blocks is residing in HDFS The reader first fetches first 10 blocks from the namenode and start reading After the above step , the reader fetches the next 10 blocks from NN and continue reading Then the reader fetches the remaining 2 blocks from NN and complete the write Cause: === Lets see the cause for this issue now... Scenario that will fail is Writer wrote 10+ blocks and a partial block and called sync. Reader trying to read the file will not get the last partial block . Client first gets the 10 block locations from the NN. Now it checks whether the file is under construction and if so it gets the size of the last partial block from datanode and reads the full file However when the number of blocks is more than 10, the last block will not be in the first fetch. It will be in the second or other blocks(last block will be in (num of blocks / 10)th fetch) The problem now is, in DFSClient there is no logic to get the size of the last partial block(as in case of point 1), for the rest of the fetches other than first fetch, the reader will not be able to read the complete data synced...!! also the InputStream.available api uses the first fetched block size to iterate. Ideally this size has to be increased -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8249) Separate HdfsConstants into the client and the server side class
[ https://issues.apache.org/jira/browse/HDFS-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8249: - Attachment: HDFS-8249.001.patch Separate HdfsConstants into the client and the server side class Key: HDFS-8249 URL: https://issues.apache.org/jira/browse/HDFS-8249 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8249.000.patch, HDFS-8249.001.patch The constants in {{HdfsConstants}} are used by both the client side and the server side. There are three types of constants in the class: 1. Constants that are used internally by the servers or are not part of the APIs. These constants are free to evolve without breaking compatibility. For example, {{MAX_PATH_LENGTH}} is used by the NN to enforce that a path does not grow too long. Developers are free to change the names of the constants and to move them around if necessary. 1. Constants that are used by the clients, but are not part of the APIs. For example, {{QUOTA_DONT_SET}} represents an unlimited quota. The value is part of the wire protocol but the name is not. Developers are free to rename the constants but are not allowed to change their values. 1. Constants that are part of the APIs. For example, {{SafeModeAction}} is used in {{DistributedFileSystem}}. Changing the name / value of the constant will break binary compatibility, but not source code compatibility. This jira proposes to separate the above three types of constants into different classes: * Creating a new class {{HdfsConstantsServer}} to hold the first type of constants. * Move {{HdfsConstants}} into the {{hdfs-client}} package. The work of separating the second and the third types of constants will be postponed to a separate jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
[ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated HDFS-1950: - Assignee: Uma Maheswara Rao G Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly. Key: HDFS-1950 URL: https://issues.apache.org/jira/browse/HDFS-1950 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 0.20.205.0 Reporter: ramkrishna.s.vasudevan Assignee: Uma Maheswara Rao G Priority: Blocker Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, hdfs-1950-trunk-test.txt Before going to the root cause lets see the read behavior for a file having more than 10 blocks in append case.. Logic: There is prefetch size dfs.read.prefetch.size for the DFSInputStream which has default value of 10 This prefetch size is the number of blocks that the client will fetch from the namenode for reading a file.. For example lets assume that a file X having 22 blocks is residing in HDFS The reader first fetches first 10 blocks from the namenode and start reading After the above step , the reader fetches the next 10 blocks from NN and continue reading Then the reader fetches the remaining 2 blocks from NN and complete the write Cause: === Lets see the cause for this issue now... Scenario that will fail is Writer wrote 10+ blocks and a partial block and called sync. Reader trying to read the file will not get the last partial block . Client first gets the 10 block locations from the NN. Now it checks whether the file is under construction and if so it gets the size of the last partial block from datanode and reads the full file However when the number of blocks is more than 10, the last block will not be in the first fetch. It will be in the second or other blocks(last block will be in (num of blocks / 10)th fetch) The problem now is, in DFSClient there is no logic to get the size of the last partial block(as in case of point 1), for the rest of the fetches other than first fetch, the reader will not be able to read the complete data synced...!! also the InputStream.available api uses the first fetched block size to iterate. Ideally this size has to be increased -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists
[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519769#comment-14519769 ] Jing Zhao commented on HDFS-8270: - I think we can use HDFS-6697 to make the lease soft and hard limits configurable, and make retry times configurable as well. create() always retried with hardcoded timeout when file already exists --- Key: HDFS-8270 URL: https://issues.apache.org/jira/browse/HDFS-8270 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Andrey Stepachev Assignee: J.Andreina In Hbase we stumbled on unexpected behaviour, which could break things. HDFS-6478 fixed wrong exception translation, but that apparently led to unexpected behaviour: clients trying to create a file without override=true will be forced to retry for a hardcoded amount of time (60 seconds). That could break or slow down systems that use the filesystem for locks (like hbase fsck did, and we got it broken in HBASE-13574). We should make this behaviour configurable: does the client really need to wait for the lease timeout to be sure that the file doesn't exist, or should it be enough to fail fast? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519763#comment-14519763 ] Lei (Eddy) Xu commented on HDFS-7758: - working on fixing checkstyle reports. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519834#comment-14519834 ] Jing Zhao commented on HDFS-8269: - The 003 patch looks pretty good to me. +1. The failed test TestPipelinesFailover should be unrelated. getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime - Key: HDFS-8269 URL: https://issues.apache.org/jira/browse/HDFS-8269 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, HDFS-8269.002.patch, HDFS-8269.003.patch When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit log entries: {noformat} <RECORD> <OPCODE>OP_TIMES</OPCODE> <DATA> <TXID>5085</TXID> <LENGTH>0</LENGTH> <PATH>/.reserved/.inodes/18230</PATH> <MTIME>-1</MTIME> <ATIME>1429908236392</ATIME> </DATA> </RECORD> {noformat} Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, therefore it eventually leads to an NPE when loading the edit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8283) DataStreamer cleanup and some minor improvement
[ https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519827#comment-14519827 ] Hudson commented on HDFS-8283: -- FAILURE: Integrated in Hadoop-trunk-Commit #7699 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7699/]) HDFS-8283. DataStreamer cleanup and some minor improvement. Contributed by Tsz Wo Nicholas Sze. (jing9: rev 7947e5b53b9ac9524b535b0384c1c355b74723ff) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/MultipleIOException.java DataStreamer cleanup and some minor improvement --- Key: HDFS-8283 URL: https://issues.apache.org/jira/browse/HDFS-8283 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.8.0 Attachments: h8283_20150428.patch
- When throwing an exception
-* always set lastException
-* always create a new exception so that it has the new stack trace
- Add LOG.
- Add final to isAppend and favoredNodes
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
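The "new exception" bullet reflects a general idiom worth a tiny illustration (a sketch of the idiom, not the actual DataStreamer code): rethrowing a cached exception directly reports only the stack of the original failure site, so the patch wraps it in a fresh exception that records the current call site while keeping the original as the cause.
{code}
import java.io.IOException;

class LastExceptionIdiom {
  // The fresh IOException carries the stack trace of whoever rethrows;
  // the cached one survives as the cause pointing at the original failure.
  static IOException freshStackTrace(IOException cached) {
    return new IOException("stream already failed", cached);
  }
}
{code}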
[jira] [Comment Edited] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518497#comment-14518497 ] Haohui Mai edited comment on HDFS-8273 at 4/29/15 6:14 PM: --- I've committed the patch to trunk, branch-2 and branch-2.7. Thanks Jing for the reviews. was (Author: wheat9): I've committed the patch to trunk and branch-2. Thanks Jing for the reviews. FSNamesystem#Delete() should not call logSync() when holding the lock - Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
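For context, the fix restores the usual ordering discipline, sketched here with a plain lock and placeholder methods rather than the actual FSNamesystem code: mutate the namespace and append the edit under the write lock, but sync the journal only after releasing it, since the sync can block on I/O.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LogSyncOrdering {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void delete() {
    lock.writeLock().lock();
    try {
      applyDeleteAndLogEdit(); // namespace change + edit append, under the lock
    } finally {
      lock.writeLock().unlock();
    }
    // the sync may wait on disk/journal I/O; doing it outside the write lock
    // keeps other namesystem operations from stalling behind this delete
    syncEditLog();
  }

  private void applyDeleteAndLogEdit() { /* placeholder for the real work */ }
  private void syncEditLog() { /* placeholder for logSync() */ }
}
{code}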
[jira] [Updated] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8269: - Resolution: Fixed Fix Version/s: 2.7.1 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk, branch-2 and branch-2.7. Thanks Jing for the reviews. getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime - Key: HDFS-8269 URL: https://issues.apache.org/jira/browse/HDFS-8269 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, HDFS-8269.002.patch, HDFS-8269.003.patch When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit log entries:
{noformat}
<RECORD>
  <OPCODE>OP_TIMES</OPCODE>
  <DATA>
    <TXID>5085</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/.reserved/.inodes/18230</PATH>
    <MTIME>-1</MTIME>
    <ATIME>1429908236392</ATIME>
  </DATA>
</RECORD>
{noformat}
Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, so it eventually leads to an NPE when loading the edit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519880#comment-14519880 ] Hudson commented on HDFS-8269: -- FAILURE: Integrated in Hadoop-trunk-Commit #7700 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7700/]) HDFS-8269. getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime. Contributed by Haohui Mai. (wheat9: rev 3dd6395bb2448e5b178a51c864e3c9a3d12e8bc9) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGetBlockLocations.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime - Key: HDFS-8269 URL: https://issues.apache.org/jira/browse/HDFS-8269 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, HDFS-8269.002.patch, HDFS-8269.003.patch When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit log entries:
{noformat}
<RECORD>
  <OPCODE>OP_TIMES</OPCODE>
  <DATA>
    <TXID>5085</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/.reserved/.inodes/18230</PATH>
    <MTIME>-1</MTIME>
    <ATIME>1429908236392</ATIME>
  </DATA>
</RECORD>
{noformat}
Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, so it eventually leads to an NPE when loading the edit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8286) Scaling out the namespace using KV store
[ https://issues.apache.org/jira/browse/HDFS-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8286: - Attachment: hdfs-kv-design.pdf The attachment outlines the architecture of the HDFS namespace over a KV store. It describes how to encode the current namespace into a KV schema, and how to implement existing features such as HA and snapshots under the proposed architecture. One thing worth noting is that in the proposed design HDFS still keeps the namespace in memory to smooth the migration, meaning that the implementation will be based on an in-memory KV store. Preliminary evaluations of our prototype show that the architecture has comparable memory usage and performance w.r.t. HDFS today. This jira can be seen as the Phase I implementation of HDFS-5389. In this jira we plan to focus on faithfully implementing the features that are available in HDFS today, and on migrating from this architecture toward HDFS-5389 in a later phase of implementation. Scaling out the namespace using KV store Key: HDFS-8286 URL: https://issues.apache.org/jira/browse/HDFS-8286 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: hdfs-kv-design.pdf Currently the NN keeps the namespace in memory. To improve the scalability of the namespace, users can scale up by using more RAM or scale out using Federation (i.e., statically partitioning the namespace). We would like to remove the limitation of scaling the global namespace. Our vision is that HDFS should adopt a scalable underlying architecture that allows the global namespace to scale linearly. We propose to implement the HDFS namespace on top of a key-value (KV) store. Adopting the KV store interfaces allows HDFS to leverage the capabilities of modern KV stores and to become much easier to scale. Going forward, the architecture allows distributing the namespace across multiple machines, or storing only the working set in memory (HDFS-5389), both of which allow HDFS to manage billions of files using the commodity hardware available today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
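As a concrete illustration of the encoding idea — a hypothetical schema assumed from this summary, not taken from the attached hdfs-kv-design.pdf — directory entries can map (parentInodeId, childName) to the child's inode id, with the inode's attributes stored under the inode id itself:
{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Assumed key layout:
//   (parentInodeId, childName) -> childInodeId   resolves one path component
//   inodeId                    -> attributes     permissions, blocks, etc.
class KvNamespaceSketch {
  static byte[] dirEntryKey(long parentId, String childName) {
    byte[] name = childName.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(8 + name.length).putLong(parentId).put(name).array();
  }

  static byte[] inodeKey(long inodeId) {
    return ByteBuffer.allocate(8).putLong(inodeId).array();
  }
}
{code}
Under such a layout, resolving a path becomes one get per component, which is also what would allow the namespace to be partitioned or paged out of memory in a later phase.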
[jira] [Created] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity
Tsz Wo Nicholas Sze created HDFS-8287: - Summary: DFSStripedOutputStream.writeChunk should not wait for writing parity Key: HDFS-8287 URL: https://issues.apache.org/jira/browse/HDFS-8287 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze When a striping cell is full, writeChunk computes and generates parity packets. It sequentially calls waitAndQueuePacket, so the user client cannot continue to write data until it finishes. We should allow the user client to continue writing instead of blocking it while parity is written. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
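A sketch of the proposed decoupling under stated assumptions (the helpers encode and queueParityPackets are invented; the actual DFSStripedOutputStream internals are not shown here): parity generation and queueing move to a background task so writeChunk() can return to the caller immediately.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncParitySketch {
  private final ExecutorService parityPool = Executors.newSingleThreadExecutor();
  private Future<?> pendingParity;

  // Called when a striping cell becomes full: instead of computing and
  // queueing parity inline (blocking the writer), hand it to the pool.
  void onCellFull(final byte[][] dataCells) {
    pendingParity = parityPool.submit(new Runnable() {
      @Override
      public void run() {
        byte[][] parity = encode(dataCells); // compute parity for the stripe
        queueParityPackets(parity);          // may block, but not the writer
      }
    });
  }

  private byte[][] encode(byte[][] cells) { return new byte[0][]; } // placeholder
  private void queueParityPackets(byte[][] parity) { }              // placeholder
}
{code}
The writer would still need to wait on {{pendingParity}} at stripe boundaries and at close so failures surface, but ordinary data writes no longer queue behind parity.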
[jira] [Commented] (HDFS-8249) Separate HdfsConstants into the client and the server side class
[ https://issues.apache.org/jira/browse/HDFS-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520261#comment-14520261 ] Hadoop QA commented on HDFS-8249: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 31s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 37 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 51s | The applied patch generated 22 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 12s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 165m 51s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 16s | Tests passed in hadoop-hdfs-client. | | {color:green}+1{color} | hdfs tests | 1m 41s | Tests passed in hadoop-hdfs-nfs. | | {color:green}+1{color} | hdfs tests | 4m 2s | Tests passed in bkjournal. | | | | 222m 2s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729230/HDFS-8249.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 8f82970 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | hadoop-hdfs-nfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/testrun_hadoop-hdfs-nfs.txt | | bkjournal test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/testrun_bkjournal.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10453/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10453/console | This message was automatically generated. 
Separate HdfsConstants into the client and the server side class Key: HDFS-8249 URL: https://issues.apache.org/jira/browse/HDFS-8249 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8249.000.patch, HDFS-8249.001.patch The constants in {{HdfsConstants}} are used by both the client side and the server side. There are three types of constants in the class: 1. Constants that are used internally by the servers or not part of the APIs. These constants are free to evolve without breaking compatibility. For example, {{MAX_PATH_LENGTH}} is used by the NN to enforce that the path does not grow too long. Developers are free to change the names of these constants and to move them around if necessary. 1. Constants that are used by the clients, but not parts of the APIs. For example, {{QUOTA_DONT_SET}} represents an unlimited quota. The value is part of the wire protocol but the name is not. Developers are free to rename the constants but are not allowed to change their values. 1. Constants that are parts of the APIs. For example, {{SafeModeAction}} is used in {{DistributedFileSystem}}. Changing the name / value of the constant will break binary compatibility, but not source code compatibility. This jira proposes to separate the above three types of constants into different classes: * Creating a new class {{HdfsConstantsServer}} to hold the first type of constants. * Move {{HdfsConstants}} into the {{hdfs-client}} package. The work of separating the second and the third types of constants will be postponed to a separate jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.
Chris Nauroth created HDFS-8290: --- Summary: WebHDFS calls before namesystem initialization can cause NullPointerException. Key: HDFS-8290 URL: https://issues.apache.org/jira/browse/HDFS-8290 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor The NameNode has a brief window of time when the HTTP server has been initialized, but the namesystem has not been initialized. During this window, a WebHDFS call can cause a {{NullPointerException}}. We can catch this condition and return a more meaningful error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6697) Make NN lease soft and hard limits configurable
[ https://issues.apache.org/jira/browse/HDFS-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520194#comment-14520194 ] Haohui Mai commented on HDFS-6697: -- What about changing the values through reflection? They should not be exposed in the configuration, which might lead to compatibility concerns. Make NN lease soft and hard limits configurable --- Key: HDFS-6697 URL: https://issues.apache.org/jira/browse/HDFS-6697 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma For testing, NameNodeAdapter allows test code to specify the lease soft and hard limits via setLeasePeriod directly on LeaseManager. But NamenodeProxies.java still uses the default values. It would be useful if we could make the NN lease soft and hard limits configurable via Configuration. That would allow NamenodeProxies.java to use the configured values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
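For tests, the hook the description mentions already suffices; a sketch of its use (the argument order — soft then hard, in milliseconds — is assumed here):
{code}
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.server.namenode.NameNodeAdapter;

class LeaseLimitTestUtil {
  // Test-only: shrink the lease limits so recovery paths trigger quickly,
  // without exposing a public configuration key.
  static void shortenLeaseLimits(MiniDFSCluster cluster) {
    NameNodeAdapter.setLeasePeriod(cluster.getNamesystem(), 1000L, 2000L);
  }
}
{code}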
[jira] [Created] (HDFS-8289) DFSStripedOutputStream uses an additional rpc call to getErasureCodingInfo
Tsz Wo Nicholas Sze created HDFS-8289: - Summary: DFSStripedOutputStream uses an additional rpc call to getErasureCodingInfo Key: HDFS-8289 URL: https://issues.apache.org/jira/browse/HDFS-8289 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze
{code}
// ECInfo is restored from NN just before writing striped files.
ecInfo = dfsClient.getErasureCodingInfo(src);
{code}
The rpc call above can be avoided by adding ECSchema to HdfsFileStatus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7559) Create unit test to automatically compare HDFS related classes and hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated HDFS-7559: - Labels: BB2015-05-TBR supportability (was: supportability) Create unit test to automatically compare HDFS related classes and hdfs-default.xml --- Key: HDFS-7559 URL: https://issues.apache.org/jira/browse/HDFS-7559 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: BB2015-05-TBR, supportability Attachments: HDFS-7559.001.patch, HDFS-7559.002.patch, HDFS-7559.003.patch, HDFS-7559.004.patch Create a unit test that will automatically compare the fields in the various HDFS related classes and hdfs-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7559) Create unit test to automatically compare HDFS related classes and hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520301#comment-14520301 ] Ray Chiang commented on HDFS-7559: -- Both failed tests pass in my tree. Create unit test to automatically compare HDFS related classes and hdfs-default.xml --- Key: HDFS-7559 URL: https://issues.apache.org/jira/browse/HDFS-7559 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: HDFS-7559.001.patch, HDFS-7559.002.patch, HDFS-7559.003.patch, HDFS-7559.004.patch Create a unit test that will automatically compare the fields in the various HDFS related classes and hdfs-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.
[ https://issues.apache.org/jira/browse/HDFS-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-8290: Attachment: HDFS-8290.001.patch I'm attaching a patch that adds a null check and a test. WebHDFS calls before namesystem initialization can cause NullPointerException. -- Key: HDFS-8290 URL: https://issues.apache.org/jira/browse/HDFS-8290 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: HDFS-8290.001.patch The NameNode has a brief window of time when the HTTP server has been initialized, but the namesystem has not been initialized. During this window, a WebHDFS call can cause a {{NullPointerException}}. We can catch this condition and return a more meaningful error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
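A minimal sketch of that kind of guard, with assumed names and exception type (the actual patch may differ): check for the uninitialized namesystem up front and fail with a descriptive message instead of letting the NPE escape.
{code}
import java.io.IOException;

import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;

class NamesystemGuard {
  // Called at the top of a WebHDFS handler; "namesystem" can still be null
  // in the window between HTTP server start and namesystem initialization.
  static FSNamesystem checkInitialized(FSNamesystem namesystem) throws IOException {
    if (namesystem == null) {
      throw new IOException("Namesystem has not been initialized yet; "
          + "retry the WebHDFS request once the NameNode finishes starting up.");
    }
    return namesystem;
  }
}
{code}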
[jira] [Updated] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.
[ https://issues.apache.org/jira/browse/HDFS-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-8290: Status: Patch Available (was: Open) WebHDFS calls before namesystem initialization can cause NullPointerException. -- Key: HDFS-8290 URL: https://issues.apache.org/jira/browse/HDFS-8290 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: HDFS-8290.001.patch The NameNode has a brief window of time when the HTTP server has been initialized, but the namesystem has not been initialized. During this window, a WebHDFS call can cause a {{NullPointerException}}. We can catch this condition and return a more meaningful error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.
[ https://issues.apache.org/jira/browse/HDFS-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520358#comment-14520358 ] Jakob Homan commented on HDFS-8290: --- +1. WebHDFS calls before namesystem initialization can cause NullPointerException. -- Key: HDFS-8290 URL: https://issues.apache.org/jira/browse/HDFS-8290 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: HDFS-8290.001.patch The NameNode has a brief window of time when the HTTP server has been initialized, but the namesystem has not been initialized. During this window, a WebHDFS call can cause a {{NullPointerException}}. We can catch this condition and return a more meaningful error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-8291) Modify NN WebUI to display correct unit
[ https://issues.apache.org/jira/browse/HDFS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-8291 started by Zhongyi Xie. - Modify NN WebUI to display correct unit Key: HDFS-8291 URL: https://issues.apache.org/jira/browse/HDFS-8291 Project: Hadoop HDFS Issue Type: Improvement Reporter: Zhongyi Xie Assignee: Zhongyi Xie Priority: Minor NN Web UI displays its capacity and usage in TB, but it is actually TiB. We should either change the unit name or the calculation to ensure it follows standards. http://en.wikipedia.org/wiki/Tebibyte http://en.wikipedia.org/wiki/Terabyte -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8174) Update replication count to live rep count in fsck report
[ https://issues.apache.org/jira/browse/HDFS-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520453#comment-14520453 ] Ming Ma commented on HDFS-8174: --- Thanks [~andreina]. LGTM. Update replication count to live rep count in fsck report - Key: HDFS-8174 URL: https://issues.apache.org/jira/browse/HDFS-8174 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Attachments: HDFS-8174.1.patch When one of the replicas is decommissioned, the fsck report shows a repl count that is one less than the number of replicas displayed:
{noformat}
blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
{noformat}
Update the description from rep to Live_rep. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7678: Attachment: (was: HDFS-7678-HDFS-7285.004.patch) Erasure coding: DFSInputStream with decode functionality Key: HDFS-7678 URL: https://issues.apache.org/jira/browse/HDFS-7678 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Li Bo Assignee: Zhe Zhang Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, HDFS-7678-HDFS-7285.003.patch, HDFS-7678.000.patch, HDFS-7678.001.patch A block group reader will read data from a BlockGroup in either striping or contiguous layout. Corrupt blocks can be known before reading (reported by the namenode) or discovered during reading. The block group reader needs to do decoding work when some blocks are found to be corrupt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7678: Attachment: HDFS-7678-HDFS-7285.004.patch Erasure coding: DFSInputStream with decode functionality Key: HDFS-7678 URL: https://issues.apache.org/jira/browse/HDFS-7678 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Li Bo Assignee: Zhe Zhang Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, HDFS-7678-HDFS-7285.003.patch, HDFS-7678-HDFS-7285.004.patch, HDFS-7678.000.patch, HDFS-7678.001.patch A block group reader will read data from a BlockGroup in either striping or contiguous layout. Corrupt blocks can be known before reading (reported by the namenode) or discovered during reading. The block group reader needs to do decoding work when some blocks are found to be corrupt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8249) Separate HdfsConstants into the client and the server side class
[ https://issues.apache.org/jira/browse/HDFS-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8249: - Attachment: HDFS-8249.002.patch Separate HdfsConstants into the client and the server side class Key: HDFS-8249 URL: https://issues.apache.org/jira/browse/HDFS-8249 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8249.000.patch, HDFS-8249.001.patch, HDFS-8249.002.patch The constants in {{HdfsConstants}} are used by both the client side and the server side. There are three types of constants in the class: 1. Constants that are used internally by the servers or not part of the APIs. These constants are free to evolve without breaking compatibility. For example, {{MAX_PATH_LENGTH}} is used by the NN to enforce that the path does not grow too long. Developers are free to change the names of these constants and to move them around if necessary. 1. Constants that are used by the clients, but not parts of the APIs. For example, {{QUOTA_DONT_SET}} represents an unlimited quota. The value is part of the wire protocol but the name is not. Developers are free to rename the constants but are not allowed to change their values. 1. Constants that are parts of the APIs. For example, {{SafeModeAction}} is used in {{DistributedFileSystem}}. Changing the name / value of the constant will break binary compatibility, but not source code compatibility. This jira proposes to separate the above three types of constants into different classes: * Creating a new class {{HdfsConstantsServer}} to hold the first type of constants. * Move {{HdfsConstants}} into the {{hdfs-client}} package. The work of separating the second and the third types of constants will be postponed to a separate jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520589#comment-14520589 ] Zhe Zhang commented on HDFS-7348: - Thanks for the discussion, Yi and Bo. On the write path:
# I wonder if we should have a fast track for the most common case, where the DN receiving the EC command is the final destination? In this case, this DN should just create a local block and write to it.
# If we decide to have such a fast track, then it seems natural to use that code to store a copy of all reconstructed blocks first. Then we can use the existing {{DataNode#DataTransfer}} to push them out. Yi mentioned several drawbacks of storing a reconstructed block on disk before sending it out: i) performance; ii) disk space; iii) management; iv) crc calculation. The performance and disk usage overheads are still valid concerns even if we have the fast track code mentioned above. So how about splitting out the current logic of transferring to remote targets (e.g., {{transferCells2Targets}}) as a separate JIRA (recovering multiple missing blocks)? Of course that's assuming we do want to have a fast track for recovering a single block locally.
On the read path:
# bq. (read entire blocks and then decode) It's big issue for memory, especially there may be multiple stripe block recovery at the same time.
Yes, I agree. So the block size is too large as the sync-and-decode unit, and I think the cell size is too small for that purpose. I think it's reasonable to use a few hundred MBs of memory for recovery. So how about setting the default to 32MB or 64MB? Assuming a 6+3 schema, that will be 300~600MB of memory usage, and we only need to create block readers 2~4 times for each source.
# Sequential vs. parallel reading is a hard decision. Since the current code is in parallel mode, we should probably keep it that way at this stage and add the other mode (like Bo suggested, Fast and Slow modes) later if needed.
Erasure Coding: striped block recovery -- Key: HDFS-7348 URL: https://issues.apache.org/jira/browse/HDFS-7348 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Kai Zheng Assignee: Yi Liu Attachments: ECWorker.java, HDFS-7348.001.patch This JIRA is to recover one or more missing striped blocks in a striped block group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
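To pin down the sync-and-decode unit being discussed, a sketch under stated assumptions (readFromSources, decodeMissing, and writeOrTransfer are invented helpers; 64MB is the unit size proposed above): recovery reads one bounded unit from each live source, decodes the missing portion, and advances, so memory stays near the unit size times the number of sources rather than a full block per source.
{code}
class StripedRecoverySketch {
  static final int UNIT = 64 * 1024 * 1024; // proposed sync-and-decode unit

  void recover(long blockSize) {
    for (long off = 0; off < blockSize; off += UNIT) {
      int len = (int) Math.min(UNIT, blockSize - off);
      byte[][] inputs = readFromSources(off, len); // parallel reads, bounded memory
      byte[] recovered = decodeMissing(inputs);    // reconstruct the missing cells
      writeOrTransfer(recovered, off);             // local fast track or push to target
    }
  }

  private byte[][] readFromSources(long off, int len) { return new byte[0][]; } // placeholder
  private byte[] decodeMissing(byte[][] inputs) { return new byte[0]; }         // placeholder
  private void writeOrTransfer(byte[] data, long off) { }                       // placeholder
}
{code}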
[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520594#comment-14520594 ] Hadoop QA commented on HDFS-7281: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 26s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 7s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 226m 22s | Tests failed in hadoop-hdfs. | | | | 272m 36s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | | Class org.apache.hadoop.hdfs.DataStreamer$LastException is not derived from an Exception, even though it is named as such At DataStreamer.java:from an Exception, even though it is named as such At DataStreamer.java:[lines 177-201] | | Failed unit tests | hadoop.hdfs.TestCrcCorruption | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.server.namenode.TestDeleteRace | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation | | | hadoop.hdfs.TestFileLengthOnClusterRestart | | | hadoop.cli.TestHDFSCLI | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestQuota | | | hadoop.hdfs.TestClose | | | hadoop.hdfs.TestMultiThreadedHflush | | | hadoop.hdfs.server.datanode.TestBlockRecovery | | | hadoop.hdfs.TestDFSOutputStream | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | org.apache.hadoop.hdfs.TestDataTransferProtocol | | | org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer | | | org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729274/HDFS-7281-6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3dd6395 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10454/artifact/patchprocess/checkstyle-result-diff.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/10454/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10454/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10454/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 
GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10454/console | This message was automatically generated. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Labels: supportability Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, HDFS-7281-5.patch, HDFS-7281-6.patch, HDFS-7281.patch In the situation where the block has lost all its replicas, fsck shows the block as missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is that numCorruptNodes == numNodes == 0 in the following code:
{noformat}
BlockManager
    final boolean isCorrupt = numCorruptNodes == numNodes;
{noformat}
Would like to clarify whether it is the intent to mark a missing block as corrupted, or it is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520401#comment-14520401 ] Ravi Prakash commented on HDFS-7342: I found another(?) instance in which the lease is not recovered. This is easily reproducible on a pseudo-distributed single-node cluster.
# Before you start, it helps if you set the following. This is not necessary, but it simply reduces how long you have to wait:
{code}
public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;
{code}
# Client starts to write a file. (It could be less than 1 block, but it is hflushed, so some of the data has landed on the datanodes.) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar)
# Client crashes. (I simulate this by kill -9 on the $(hadoop jar TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
# Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1.)
I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. I am going to check what happens when only the primary datanode is shot. {color:red}Please let me know if I shouldn't hijack this JIRA. By default I will.{color}
{code:title=TestHadoop.java|borderStyle=solid}
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestHadoop {
  public static void main(String args[]) throws IOException, InterruptedException {
    Path path = new Path("/tmp/testHadoop");
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    System.out.println("DefaultFS: " + conf.get("fs.defaultFS"));
    System.out.flush();
    FSDataOutputStream hdfsout = fs.create(path, true);
    BufferedWriter br = new BufferedWriter(new OutputStreamWriter(hdfsout));
    System.out.println("Created the bufferedWriter");
    System.out.flush();
    br.write("Some string");
    br.flush();
    hdfsout.hflush();
    System.out.println("Wrote to the bufferedWriter");
    System.out.flush();
    Thread.sleep(12); // KILL THE PROCESS DURING THIS SLEEP
    br.close();
    System.out.println("Closed the bufferedWriter");
    System.out.flush();
  }
}
{code}
Lease Recovery doesn't happen some times Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, LeaseManager tries to recover a lease but is not able to. HDFS-4882 describes a possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7342) Lease Recovery doesn't happen some times
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-7342: --- Target Version/s: 2.8.0 (was: 2.6.1) Lease Recovery doesn't happen some times Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes a possibility of that. We should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8291) Modify NN WebUI to display correct unit
Zhongyi Xie created HDFS-8291: - Summary: Modify NN WebUI to display correct unit Key: HDFS-8291 URL: https://issues.apache.org/jira/browse/HDFS-8291 Project: Hadoop HDFS Issue Type: Improvement Reporter: Zhongyi Xie Assignee: Zhongyi Xie Priority: Minor NN Web UI displays its capacity and usage in TB, but it is actually TiB. We should either change the unit name or the calculation to ensure it follows standards. http://en.wikipedia.org/wiki/Tebibyte http://en.wikipedia.org/wiki/Terabyte -- This message was sent by Atlassian JIRA (v6.3.4#6332)
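The discrepancy in one worked line (plain arithmetic, not the actual WebUI code): dividing by 2^40 yields TiB, while a "TB" label implies dividing by 10^12, so the two readings differ by roughly 10% at this scale.
{code}
class UnitSketch {
  public static void main(String[] args) {
    long bytes = 100L * (1L << 40);       // 100 TiB of raw capacity
    double tib = bytes / Math.pow(2, 40); // 100.00 -> what the UI computes
    double tb = bytes / 1e12;             // 109.95 -> what the "TB" label implies
    System.out.printf("%.2f TiB vs %.2f TB%n", tib, tb);
  }
}
{code}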
[jira] [Updated] (HDFS-8291) Modify NN WebUI to display correct unit
[ https://issues.apache.org/jira/browse/HDFS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongyi Xie updated HDFS-8291: -- Status: Open (was: Patch Available) Modify NN WebUI to display correct unit Key: HDFS-8291 URL: https://issues.apache.org/jira/browse/HDFS-8291 Project: Hadoop HDFS Issue Type: Improvement Reporter: Zhongyi Xie Assignee: Zhongyi Xie Priority: Minor NN Web UI displays its capacity and usage in TB, but it is actually TiB. We should either change the unit name or the calculation to ensure it follows standards. http://en.wikipedia.org/wiki/Tebibyte http://en.wikipedia.org/wiki/Terabyte -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7678: Attachment: HDFS-7678-HDFS-7285.003.patch Attaching a new patch based on Andrew's comments:
# An overall timeout is enforced
# All data fetching happens in a single loop, leveraging Yi's idea under HDFS-7348
# It also refactors shared striped reading logic (among client and DN) to the util class.
[~andrew.wang] / [~hitliuyi] could you take a look at the changes in {{StripedBlockUtil}}? If that part looks OK I'll split it to HDFS-8282 and get it in first, so this client decode JIRA doesn't block HDFS-7348. Erasure coding: DFSInputStream with decode functionality Key: HDFS-7678 URL: https://issues.apache.org/jira/browse/HDFS-7678 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Li Bo Assignee: Zhe Zhang Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, HDFS-7678-HDFS-7285.003.patch, HDFS-7678.000.patch, HDFS-7678.001.patch A block group reader will read data from a BlockGroup in either striping or contiguous layout. Corrupt blocks can be known before reading (reported by the namenode) or discovered during reading. The block group reader needs to do decoding work when some blocks are found to be corrupt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read)
[ https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520458#comment-14520458 ] Zhe Zhang commented on HDFS-8272: - The rest of the patch LGTM. +1, and thanks Jing for the contribution! I just committed it to the branch. Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read) - Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8272.002.patch, h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch Currently in DFSStripedInputStream the retry logic is still the same as in DFSInputStream. More specifically, every failed read will try to search for another source node, and an exception is thrown when no new source node can be identified. This logic is not appropriate for the EC input stream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)