[jira] [Commented] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504428#comment-14504428 ] Yi Liu commented on HDFS-8033: -- Thanks [~zhz] for working on this. The patch is good; my comments: *1.* In DFSInputStream, the stateful read does not read fully into the output *buf*; {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read one striped block, so the returned result would be something like (cell_0, cell_3, ...). This is not incorrect; in the test you have tested stateful read, but you do a full read and the data size is *BLOCK_GROUP_SIZE*, so the result is only coincidentally correct. I suggest we try to read fully in {{readBuffer}} of {{DFSStripedInputStream}} unless we hit the end of file; of course, the final read length can be less than the input buf length if we get EOF. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. For other IOExceptions, we can throw them. *3.* For the test, do a stateful read: read once and read fully (please make the data size larger than groupSize * cellSize), as I said in #1. *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why do you modify {{SimulatedFSDataset}}? Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
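For reference, a minimal sketch of the read-fully loop suggested in comment *1* above; {{readOneChunk}} is a hypothetical helper (not from the patch) that reads one cell-aligned chunk and returns the byte count, or -1 at EOF:
{code:java}
// Sketch only: keep reading until the caller's buffer is full or EOF is hit.
private int readBufferFully(byte[] buf, int off, int len) throws IOException {
  int total = 0;
  while (total < len) {
    int n = readOneChunk(buf, off + total, len - total); // hypothetical helper
    if (n < 0) {
      break; // end of file reached
    }
    total += n;
  }
  // Shorter than len only when EOF was reached, matching the comment above.
  return (total == 0 && len > 0) ? -1 : total;
}
{code}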
[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504468#comment-14504468 ] Kai Zheng commented on HDFS-8201: - I'm not sure. This work would rather end with a unit test, focusing on stripping writing and reading. I thought HDFS-8197 is good for system integration tests. Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin According to an off-line discussion with [~zhz] and [~xinwei], we need to implement an end to end test for stripping file support:
* Create an EC zone;
* Create a file in the zone;
* Write various typical sizes of content to the file; each size may be a test method;
* Read the written content back;
* Compare the written content and the read content to ensure it's good.
The test facility is subject to adding more steps for erasure encoding and recovery. Will open a separate issue for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
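As a rough illustration of the steps listed above, a hedged JUnit-style sketch (assuming a {{MiniDFSCluster}}-backed {{DistributedFileSystem}} {{dfs}} in scope; the {{createErasureCodingZone}} call name is an assumption about the HDFS-7285 branch API):
{code:java}
Path zone = new Path("/ecZone");
dfs.mkdirs(zone);
dfs.createErasureCodingZone(zone, null); // assumed branch API; null = default schema
Path file = new Path(zone, "file-" + size);
byte[] expected = new byte[size];
new Random(0xBAD1DEA).nextBytes(expected); // deterministic test content
try (FSDataOutputStream out = dfs.create(file)) {
  out.write(expected);
}
byte[] actual = new byte[size];
try (FSDataInputStream in = dfs.open(file)) {
  in.readFully(actual); // read the written content back
}
Assert.assertArrayEquals(expected, actual); // compare written vs. read
{code}
Each typical size (empty, sub-cell, one cell, one stripe, one block group, plus partial remainders) could drive this body from its own test method, as the list suggests.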
[jira] [Commented] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504485#comment-14504485 ] Yi Liu commented on HDFS-8033: -- BTW, I find we also need to handle {{seek}} and zero-copy read for {{DFSStripedInputStream}}; I filed HDFS-8203 to handle them. Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8200) Refactor FSDirStatAndListingOp
[ https://issues.apache.org/jira/browse/HDFS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504487#comment-14504487 ] Hadoop QA commented on HDFS-8200: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726772/HDFS-8200.000.patch against trunk revision d52de61.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestLeaseRecovery2
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10327//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10327//console
This message is automatically generated. Refactor FSDirStatAndListingOp -- Key: HDFS-8200 URL: https://issues.apache.org/jira/browse/HDFS-8200 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8200.000.patch After HDFS-6826 several functions in {{FSDirStatAndListingOp}} are dead. This jira proposes to clean them up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504428#comment-14504428 ] Yi Liu edited comment on HDFS-8033 at 4/21/15 6:25 AM: --- Thanks [~zhz] for working on this. The patch is good; my comments: *1.* In DFSInputStream, the stateful read does not read fully into the output *buf*; {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read one striped block, so the returned result would be something like (cell_0, cell_3, ...). This is not incorrect; in the test you have tested stateful read, but you do a full read and the data size is *BLOCK_GROUP_SIZE*, so the result is only coincidentally correct. I suggest we try to read fully in {{readBuffer}} of {{DFSStripedInputStream}} unless we hit the end of file; of course, the final read length can be less than the input buf length if we get EOF. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. For other IOExceptions, we can throw them. *3.* For the test, do a stateful read: read once and read fully (please make the data size larger than groupSize * cellSize), as I said in #1. *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why do you modify {{SimulatedFSDataset}}? was (Author: hitliuyi): Thanks [~zhz] for working on this. The patch is good; my comments: *1.* In DFSInputStream, the stateful read does not read fully into the output *buf*; {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read one striped block, so the returned result would be something like (cell_0, cell_3, ...). This is not incorrect; in the test you have tested stateful read, but you do a full read and the data size is *BLOCK_GROUP_SIZE*, so the result is only coincidentally correct. I suggest we try to read fully in {{readBuffer}} of {{DFSStripedInputStream}} unless we hit the end of file; of course, the final read length can be less than the input buf length if we get EOF. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. For other IOExceptions, we can throw them. *3.* For the test, do a stateful read: read once and read fully (please make the data size larger than groupSize * cellSize), as I said in #1. *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why do you modify {{SimulatedFSDataset}}? Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
Yi Liu created HDFS-8203: Summary: Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu In HDFS-7782 and HDFS-8033 we handle pread and stateful read for {{DFSStripedInputStream}}; we also need to handle other operations, such as {{seek}}, zero-copy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504476#comment-14504476 ] Kai Sasaki commented on HDFS-8201: -- [~drankye] I see. If the purpose of this JIRA is as you mentioned, please keep it. Thank you for clarifying! Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin According to an off-line discussion with [~zhz] and [~xinwei], we need to implement an end to end test for stripping file support:
* Create an EC zone;
* Create a file in the zone;
* Write various typical sizes of content to the file; each size may be a test method;
* Read the written content back;
* Compare the written content and the read content to ensure it's good.
The test facility is subject to adding more steps for erasure encoding and recovery. Will open a separate issue for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8191) Fix byte to integer casting in SimulatedFSDataset#simulatedByte
[ https://issues.apache.org/jira/browse/HDFS-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8191: Attachment: HDFS-8191.001.patch Thanks Andrew for the review! Yes, a unit test is a good idea. It turns out I need to refactor {{TestSimulatedFSDataset}} quite a bit to inject simulated blocks with negative block IDs. But I think the added {{negativeBlkID}} will be useful in the future as well. Both Jenkins failures pass locally. Fix byte to integer casting in SimulatedFSDataset#simulatedByte --- Key: HDFS-8191 URL: https://issues.apache.org/jira/browse/HDFS-8191 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-8191.000.patch, HDFS-8191.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
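As background on the casting pitfall (a generic illustration, not the actual patch): Java's remainder operator is signed, so deriving a simulated byte from a negative block ID without masking yields negative values, while masking first keeps the byte deterministic and non-negative:
{code:java}
long blockId = -10L, offsetInBlk = 3L;
long raw = (blockId + offsetInBlk) % Byte.MAX_VALUE;    // -7: signed remainder
long masked = (blockId + offsetInBlk) & Byte.MAX_VALUE; // 121: non-negative
System.out.println(raw + " vs " + masked);
{code}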
[jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504404#comment-14504404 ] J.Andreina commented on HDFS-7993: -- Thanks [~mingma] and [~vinayrpet] for reviewing and correcting me. I have updated the patch to address all the comments. Please review. Incorrect descriptions in fsck when nodes are decommissioned Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like the following if one of the replicas is decommissioned.
{noformat}
blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
{noformat}
That is because in NamenodeFsck the repl count comes from the live replicas count, while the actual nodes come from the LocatedBlock, which includes decommissioned nodes. Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement verifies a LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification, just like how fsck excludes decommissioned nodes when it checks for under-replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-7993: - Attachment: HDFS-7993.6.patch Incorrect descriptions in fsck when nodes are decommissioned Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like the following if one of the replicas is decommissioned.
{noformat}
blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
{noformat}
That is because in NamenodeFsck the repl count comes from the live replicas count, while the actual nodes come from the LocatedBlock, which includes decommissioned nodes. Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement verifies a LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification, just like how fsck excludes decommissioned nodes when it checks for under-replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504436#comment-14504436 ] Vinayakumar B commented on HDFS-7993: - Thanks [~andreina] for the latest patch. +1. Waiting for Jenkins. Incorrect descriptions in fsck when nodes are decommissioned Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like the following if one of the replicas is decommissioned.
{noformat}
blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
{noformat}
That is because in NamenodeFsck the repl count comes from the live replicas count, while the actual nodes come from the LocatedBlock, which includes decommissioned nodes. Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement verifies a LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification, just like how fsck excludes decommissioned nodes when it checks for under-replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8182) Implement topology-aware CDN-style caching
[ https://issues.apache.org/jira/browse/HDFS-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1450#comment-1450 ] Gera Shegalov commented on HDFS-8182: - Hi Andrew, I think the said block placement policy works fine for data whose usage we know a priori, such as binaries in the YARN-1492 Shared Cache (a few terabytes in our case), MR/Spark staging directories, etc. For such cases we/frameworks already set a high replication factor, and the solution with rf=#racks is already good enough, except for the replication speed vs. YARN scheduling race, which would be eliminated by the approach proposed in this JIRA. In some cases we have no a priori knowledge. The most prominent ones are when some primary or temporary files are used as the build input of a hash join in an ad-hoc manner. Having a solution that works transparently, irrespective of the specified replication factor, is a win. Another drawback of a block-placement-based solution (besides currently being global, not per file) is that it's not elastic and is oblivious to the data temperature. I think this JIRA would cover both families of cases above well. Implement topology-aware CDN-style caching -- Key: HDFS-8182 URL: https://issues.apache.org/jira/browse/HDFS-8182 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.6.0 Reporter: Gera Shegalov To scale reads of hot blocks in large clusters, it would be beneficial if we could read a block across the ToR switches only once. Example scenarios are localization of binaries, MR distributed cache files for map-side joins, and similar. There are multiple layers where this could be implemented (a YARN service or individual apps such as MR), but I believe it is best done in HDFS or even common FileSystem to support as many use cases as possible. The life cycle could look like this, e.g. for the YARN localization scenario:
1. inputStream = fs.open(path, ..., CACHE_IN_RACK)
2. Instead of reading from a remote DN directly, the NN tells the client to read via the local DN1, and DN1 creates a replica of each block. When the next localizer on DN2 in the same rack starts, it will learn from the NN about the replica on DN1 and will read from DN1 using the conventional path. When the application ends, the AMs or NMs can instruct the NN, in a fadvise-DONTNEED style, to start telling DNs to discard the extraneous replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504428#comment-14504428 ] Yi Liu edited comment on HDFS-8033 at 4/21/15 6:33 AM: --- Thanks [~zhz] for working on this. The patch is good; my comments: *1.* In DFSInputStream, the stateful read does not read fully into the output *buf*; {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read one striped block, so the returned result would be something like (cell_0, cell_3, ...) and it only contains part of the expected data. This is not incorrect; in the test you have tested stateful read, but you do a full read and the data size is *BLOCK_GROUP_SIZE*, so the result is only coincidentally correct. I suggest we try to read fully in {{readBuffer}} of {{DFSStripedInputStream}} unless we hit the end of file; of course, the final read length can be less than the input buf length if we get EOF. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. For other IOExceptions, we can throw them. *3.* For the test, do a stateful read: read once and read fully (please make the data size larger than groupSize * cellSize), as I said in #1. *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why do you modify {{SimulatedFSDataset}}? was (Author: hitliuyi): Thanks [~zhz] for working on this. The patch is good; my comments: *1.* In DFSInputStream, the stateful read does not read fully into the output *buf*; {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read one striped block, so the returned result would be something like (cell_0, cell_3, ...). This is not incorrect; in the test you have tested stateful read, but you do a full read and the data size is *BLOCK_GROUP_SIZE*, so the result is only coincidentally correct. I suggest we try to read fully in {{readBuffer}} of {{DFSStripedInputStream}} unless we hit the end of file; of course, the final read length can be less than the input buf length if we get EOF. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. For other IOExceptions, we can throw them. *3.* For the test, do a stateful read: read once and read fully (please make the data size larger than groupSize * cellSize), as I said in #1. *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why do you modify {{SimulatedFSDataset}}? Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504451#comment-14504451 ] Kai Sasaki commented on HDFS-8201: -- [~drankye] I wonder whether this JIRA might be a duplicate of [HDFS-8197|https://issues.apache.org/jira/browse/HDFS-8197]. Can I file this JIRA under HDFS-8197? Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin According to an off-line discussion with [~zhz] and [~xinwei], we need to implement an end to end test for stripping file support:
* Create an EC zone;
* Create a file in the zone;
* Write various typical sizes of content to the file; each size may be a test method;
* Read the written content back;
* Compare the written content and the read content to ensure it's good.
The test facility is subject to adding more steps for erasure encoding and recovery. Will open a separate issue for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505467#comment-14505467 ] Tsz Wo Nicholas Sze commented on HDFS-7687: --- For #1, see if you want to create a JIRA for trunk to do some refactoring first. For #2, you may include the test here or in a separate JIRA. Both are fine. Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under-replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8133) Improve readability of deleted block check
[ https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505512#comment-14505512 ] Hudson commented on HDFS-8133: -- FAILURE: Integrated in Hadoop-trunk-Commit #7626 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7626/]) HDFS-8133. Improve readability of deleted block check (Daryn Sharp via Colin P. McCabe) (cmccabe: rev 997408eaaceef20b053ee7344468e28cb9a1379b)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoContiguous.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
Improve readability of deleted block check -- Key: HDFS-8133 URL: https://issues.apache.org/jira/browse/HDFS-8133 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-8133.patch The current means of checking if a block is deleted is checking if its block collection is null. A more readable approach is an isDeleted method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time
[ https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505364#comment-14505364 ] Chris Nauroth commented on HDFS-8193: - Thank you for the response. That clarifies it for me. If possible, would you please see if there is a way to make the delay visible through metrics and the web UI? Perhaps you could even just populate the same fields that were added in HDFS-5986 and HDFS-6385. Add the ability to delay replica deletion for a period of time -- Key: HDFS-8193 URL: https://issues.apache.org/jira/browse/HDFS-8193 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Aaron T. Myers Assignee: Zhe Zhang When doing maintenance on an HDFS cluster, users may be concerned about the possibility of administrative mistakes or software bugs deleting replicas of blocks that cannot easily be restored. It would be handy if HDFS could be made to optionally not delete any replicas for a configurable period of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8163) Using monotonicNow for block report scheduling causes test failures on recently restarted systems
[ https://issues.apache.org/jira/browse/HDFS-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505432#comment-14505432 ] Hudson commented on HDFS-8163: -- FAILURE: Integrated in Hadoop-trunk-Commit #7624 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7624/]) HDFS-8163. Using monotonicNow for block report scheduling causes test failures on recently restarted systems. (Arpit Agarwal) (arp: rev dfc1c4c303cf15afc6c3361ed9d3238562f73cbd)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Time.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
Using monotonicNow for block report scheduling causes test failures on recently restarted systems - Key: HDFS-8163 URL: https://issues.apache.org/jira/browse/HDFS-8163 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8163.01.patch, HDFS-8163.02.patch, HDFS-8163.03.patch {{BPServiceActor#blockReport}} has the following check:
{code}
List<DatanodeCommand> blockReport() throws IOException {
  // send block report if timer has expired.
  final long startTime = monotonicNow();
  if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
    return null;
  }
{code}
Many tests trigger an immediate block report via {{BPServiceActor#triggerBlockReportForTests}}, which sets {{lastBlockReport = 0}}. However, if the machine was restarted recently then startTime may be less than {{dnConf.blockReportInterval}} and the block report is not sent. {{Time#monotonicNow}} uses {{System#nanoTime}}, which represents time elapsed since an arbitrary origin. The time should be used only for comparison with other values returned by {{System#nanoTime}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8078: Attachment: HDFS-8078.4.patch HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 Attachments: HDFS-8078.4.patch 1st exception, on put:
15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
This appears to stem from code in DatanodeID which assumes it's safe to append together (ipaddr + ":" + port), which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky), but we could also use our own parsing. (From logging this, it seems like a low-enough-frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address, and thus triggering an external DNS lookup, is outweighed by getting the address normalized and avoiding rewriting the parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode):
15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
at java.lang.Thread.run(Thread.java:745)
This also surfaces as a client error: -get: 2401 is not an IP string literal. This one has existing parsing logic which needs to shift to the last colon rather than the first; it should also be a tiny bit faster by using lastIndexOf rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
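To illustrate the two parsing points above with a hedged sketch (the helper names are illustrative, not the patch's): split host:port on the last colon, and bracket raw IPv6 literals before building a URI-style authority:
{code:java}
// Host part of "host:port", split on the LAST colon so IPv6 literals like
// 2401:db00:11:d010:face:0:2f:0 survive intact.
static String hostOf(String hostPort) {
  int i = hostPort.lastIndexOf(':');
  return i < 0 ? hostPort : hostPort.substring(0, i);
}

// Authority string, bracketing raw IPv6 literals as [addr]:port so that
// URI-based parsing such as NetUtils.createSocketAddr() accepts it.
static String toAuthority(String host, int port) {
  boolean rawIpv6 = host.indexOf(':') >= 0 && !host.startsWith("[");
  return (rawIpv6 ? "[" + host + "]" : host) + ":" + port;
}
{code}
For example, toAuthority("2401:db00:1010:70ba:face:0:8:0", 50010) yields [2401:db00:1010:70ba:face:0:8:0]:50010, which the URI parser accepts.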
[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time
[ https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505386#comment-14505386 ] Zhe Zhang commented on HDFS-8193: -
bq. If possible, would you please see if there is a way to make the delay visible through metrics and the web UI?
That's a great point. I believe admins will want to monitor both the delay and the number of pending deletions. Will either add in this JIRA or a follow-on.
bq. Perhaps you could even just populate the same fields that were added in HDFS-5986 and HDFS-6385.
Seems to me these metrics differ for each DN. Maybe we should add them to the DN web UI / metrics? We could sum up the number of pending-deletion replicas and show it on the NN, but the per-DN delays are hard to summarize. Add the ability to delay replica deletion for a period of time -- Key: HDFS-8193 URL: https://issues.apache.org/jira/browse/HDFS-8193 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Aaron T. Myers Assignee: Zhe Zhang When doing maintenance on an HDFS cluster, users may be concerned about the possibility of administrative mistakes or software bugs deleting replicas of blocks that cannot easily be restored. It would be handy if HDFS could be made to optionally not delete any replicas for a configurable period of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time
[ https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505449#comment-14505449 ] Zhe Zhang commented on HDFS-8193: - Thanks for the pointers, Chris! A mock-up is a very good idea; HDFS-5986 and HDFS-6385 are good examples to follow. Add the ability to delay replica deletion for a period of time -- Key: HDFS-8193 URL: https://issues.apache.org/jira/browse/HDFS-8193 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Aaron T. Myers Assignee: Zhe Zhang When doing maintenance on an HDFS cluster, users may be concerned about the possibility of administrative mistakes or software bugs deleting replicas of blocks that cannot easily be restored. It would be handy if HDFS could be made to optionally not delete any replicas for a configurable period of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505463#comment-14505463 ] Tsz Wo Nicholas Sze commented on HDFS-7687: --- The items look good. Just a minor point:
bq. A Corrupt EC block group could have >= 6 blocks but some of the blocks are corrupted. ... in (6,3)-Reed-Solomon, these groups have more than 9 blocks. (Are there such cases?)
Yes, it is possible. E.g. a datanode D0 dies and an EC block in D0 is reconstructed on another datanode D1. Later on, D0 comes back. Then both D0 and D1 have the same EC block and the block group could have more than 9 blocks. Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under-replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time
[ https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505336#comment-14505336 ] Zhe Zhang commented on HDFS-8193: - Thanks Chris for bringing up the questions.
bq. HDFS-6186 only applies at NameNode startup. Is the new feature something that could be triggered at any time on a running NameNode, such as right before a manual HA failover?
The short answer is yes. One can imagine it as a trash for block replicas, fully controlled by the DN hosting them. This should shelter block replicas from most admin mis-operations and NN bugs (more likely than DN bugs, given the complexity) for a period of time. To answer the question from [~sureshms] under HDFS-6186:
bq. One problem with not deleting the blocks for a deleted file is, how does one restore it? Can we address in this jira pausing deletion after startup and address the suggestion you have made, along with other changes that might be necessary, in another jira.
First, NN bugs could cause block replicas to be deleted without deleting the file. Second, it's rather easy to back up NN metadata before performing maintenance, but extremely difficult to back up actual DN data. This JIRA aims to address that deficiency/discrepancy. As future work, we plan to investigate an even more radical retention policy, where block replicas are never deleted before the DN actually runs out of space. At that moment, victims are selected among pending-deletion replicas using a smart algorithm, and are overwritten by incoming replicas. We'll file a separate JIRA for that, after this JIRA builds the basic DN-side replica retention machinery. Add the ability to delay replica deletion for a period of time -- Key: HDFS-8193 URL: https://issues.apache.org/jira/browse/HDFS-8193 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Aaron T. Myers Assignee: Zhe Zhang When doing maintenance on an HDFS cluster, users may be concerned about the possibility of administrative mistakes or software bugs deleting replicas of blocks that cannot easily be restored. It would be handy if HDFS could be made to optionally not delete any replicas for a configurable period of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time
[ https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505402#comment-14505402 ] Chris Nauroth commented on HDFS-8193: -
bq. Seems to me these metrics differ for each DN.
Ah yes, I missed the point that you were aiming for per-DN granularity. In that case, yes, DN metrics would make sense. You also could potentially take the approach done in HDFS-7604 to publish the counters back to the NN in heartbeats, and that would enable the NameNode to display per-DN stats on the Datanodes tab. It's probably worth doing a quick UI mock-up to check if that really makes sense though. Those tables can get crowded quickly. :-) Thanks again. Add the ability to delay replica deletion for a period of time -- Key: HDFS-8193 URL: https://issues.apache.org/jira/browse/HDFS-8193 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Aaron T. Myers Assignee: Zhe Zhang When doing maintenance on an HDFS cluster, users may be concerned about the possibility of administrative mistakes or software bugs deleting replicas of blocks that cannot easily be restored. It would be handy if HDFS could be made to optionally not delete any replicas for a configurable period of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505424#comment-14505424 ] Tsz Wo Nicholas Sze commented on HDFS-8204: --- This seems a duplicate of HDFS-8147. Balancer: 2 replicas ends in same node after running balance. - Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8204.001.patch The Balancer moves blocks between Datanodes in old versions (Ver. < 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new versions (Ver. >= 2.6). The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed and may cause 2 replicas to end up on the same node after running the balancer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
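A hedged sketch of the node-level check implied by the description ({{getLocations}}/{{getDatanodeInfo}} approximate the balancer's types; this is not the committed fix):
{code:java}
// isLocatedOn(StorageGroup) treats two storage groups on the SAME datanode
// as different locations; comparing at the datanode level closes that hole.
boolean isLocatedOnDatanode(DatanodeInfo node) {
  for (StorageGroup g : getLocations()) {
    if (g.getDatanodeInfo().equals(node)) {
      return true;
    }
  }
  return false;
}
{code}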
[jira] [Moved] (HDFS-8209) ArrayIndexOutOfBoundsException in MiniDFSCluster.
[ https://issues.apache.org/jira/browse/HDFS-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] surendra singh lilhore moved HADOOP-11856 to HDFS-8209: --- Component/s: (was: test) test Affects Version/s: (was: 2.6.0) 2.6.0 Key: HDFS-8209 (was: HADOOP-11856) Project: Hadoop HDFS (was: Hadoop Common) ArrayIndexOutOfBoundsException in MiniDFSCluster. - Key: HDFS-8209 URL: https://issues.apache.org/jira/browse/HDFS-8209 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore I want to create a MiniDFSCluster with 2 datanodes, and for each datanode I want to set a different number of StorageTypes, but in this case I get an ArrayIndexOutOfBoundsException. My cluster schema is like this:
{code}
final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(2)
    .storageTypes(new StorageType[][] {
        { StorageType.DISK, StorageType.ARCHIVE },
        { StorageType.DISK } })
    .build();
{code}
*Exception*:
{code}
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hdfs.MiniDFSCluster.makeDataNodeDirs(MiniDFSCluster.java:1218)
at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1402)
at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:832)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
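A plausible shape of the fix, sketched under the assumption that the dir-creation loop indexes a fixed number of storage dirs per datanode (names paraphrased, not verbatim MiniDFSCluster code): tolerate StorageType rows of different lengths instead of assuming a rectangular array.
{code:java}
// Fall back to the default type when datanode i's row is shorter than j+1.
for (int j = 0; j < storagesPerDatanode; ++j) {
  StorageType type =
      (storageTypes == null || storageTypes[i] == null
          || j >= storageTypes[i].length)
      ? StorageType.DEFAULT
      : storageTypes[i][j];
  // ... create storage dir j of datanode i with this type
}
{code}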
[jira] [Commented] (HDFS-8200) Refactor FSDirStatAndListingOp
[ https://issues.apache.org/jira/browse/HDFS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505416#comment-14505416 ] Jing Zhao commented on HDFS-8200: - The patch looks good to me. One minor comment is that we can also pass the INodeAttributes into {{createFileStatus(..., needLocation, ...)}} to make the style more consistent. +1 after addressing the comment. Refactor FSDirStatAndListingOp -- Key: HDFS-8200 URL: https://issues.apache.org/jira/browse/HDFS-8200 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8200.000.patch After HDFS-6826 several functions in {{FSDirStatAndListingOp}} are dead. This jira proposes to clean them up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8133) Improve readability of deleted block check
[ https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505479#comment-14505479 ] Colin Patrick McCabe commented on HDFS-8133: +1. Thanks, Daryn. Test failures are unrelated. I ran the tests locally and they passed. Improve readability of deleted block check -- Key: HDFS-8133 URL: https://issues.apache.org/jira/browse/HDFS-8133 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-8133.patch The current means of checking if a block is deleted is checking if its block collection is null. A more readable approach is an isDeleted method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8163) Using monotonicNow for block report scheduling causes test failures on recently restarted systems
[ https://issues.apache.org/jira/browse/HDFS-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505350#comment-14505350 ] Jing Zhao commented on HDFS-8163: - Thanks for working on this, [~arpitagarwal]! The patch looks pretty good to me. The only nit is that the following code can be reformatted:
{code}
+@VisibleForTesting volatile long nextBlockReportTime = monotonicNow();
+@VisibleForTesting volatile long nextHeartbeatTime = monotonicNow();
+@VisibleForTesting boolean resetBlockReportTime = true;
{code}
I think you can address this while committing the patch. +1. Using monotonicNow for block report scheduling causes test failures on recently restarted systems - Key: HDFS-8163 URL: https://issues.apache.org/jira/browse/HDFS-8163 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Blocker Attachments: HDFS-8163.01.patch, HDFS-8163.02.patch, HDFS-8163.03.patch {{BPServiceActor#blockReport}} has the following check:
{code}
List<DatanodeCommand> blockReport() throws IOException {
  // send block report if timer has expired.
  final long startTime = monotonicNow();
  if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
    return null;
  }
{code}
Many tests trigger an immediate block report via {{BPServiceActor#triggerBlockReportForTests}}, which sets {{lastBlockReport = 0}}. However, if the machine was restarted recently then startTime may be less than {{dnConf.blockReportInterval}} and the block report is not sent. {{Time#monotonicNow}} uses {{System#nanoTime}}, which represents time elapsed since an arbitrary origin. The time should be used only for comparison with other values returned by {{System#nanoTime}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8163) Using monotonicNow for block report scheduling causes test failures on recently restarted systems
[ https://issues.apache.org/jira/browse/HDFS-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-8163: Resolution: Fixed Fix Version/s: 2.7.1 Target Version/s: (was: 2.7.1) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Jing. Fixed the formatting and committed to trunk, branch-2 and branch-2.7. Here is the delta:
{code:java}
-// assigned/read by the actor thread. Thus they should be declared as volatile
-// to make sure the happens-before consistency.
-@VisibleForTesting volatile long nextBlockReportTime = monotonicNow();
-@VisibleForTesting volatile long nextHeartbeatTime = monotonicNow();
-@VisibleForTesting boolean resetBlockReportTime = true;
+// assigned/read by the actor thread.
+@VisibleForTesting
+volatile long nextBlockReportTime = monotonicNow();
+
+@VisibleForTesting
+volatile long nextHeartbeatTime = monotonicNow();
+
+@VisibleForTesting
+boolean resetBlockReportTime = true;
{code}
Using monotonicNow for block report scheduling causes test failures on recently restarted systems - Key: HDFS-8163 URL: https://issues.apache.org/jira/browse/HDFS-8163 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8163.01.patch, HDFS-8163.02.patch, HDFS-8163.03.patch {{BPServiceActor#blockReport}} has the following check:
{code}
List<DatanodeCommand> blockReport() throws IOException {
  // send block report if timer has expired.
  final long startTime = monotonicNow();
  if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
    return null;
  }
{code}
Many tests trigger an immediate block report via {{BPServiceActor#triggerBlockReportForTests}}, which sets {{lastBlockReport = 0}}. However, if the machine was restarted recently then startTime may be less than {{dnConf.blockReportInterval}} and the block report is not sent. {{Time#monotonicNow}} uses {{System#nanoTime}}, which represents time elapsed since an arbitrary origin. The time should be used only for comparison with other values returned by {{System#nanoTime}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
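The underlying pitfall, restated as a small sketch (not the committed code): {{monotonicNow()}} counts from an arbitrary origin, e.g. machine boot, so comparing it against a zeroed {{lastBlockReport}} breaks on freshly booted machines; scheduling by an absolute next-due time and comparing only deltas avoids the origin entirely:
{code:java}
private volatile long nextBlockReportTime = monotonicNow();

boolean blockReportDue() {
  // Only a delta of two monotonic readings is compared, so the clock's
  // origin never matters.
  return monotonicNow() - nextBlockReportTime >= 0;
}

void triggerBlockReportForTests() {
  // Make the next check fire immediately instead of zeroing a timestamp.
  nextBlockReportTime = monotonicNow();
}
{code}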
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8078: Attachment: (was: HDFS-8078.4.patch) HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 1st exception, on put:
15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
This appears to stem from code in DatanodeID which assumes it's safe to append together (ipaddr + ":" + port), which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky), but we could also use our own parsing. (From logging this, it seems like a low-enough-frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address, and thus triggering an external DNS lookup, is outweighed by getting the address normalized and avoiding rewriting the parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode):
15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
at java.lang.Thread.run(Thread.java:745)
This also surfaces as a client error: -get: 2401 is not an IP string literal. This one has existing parsing logic which needs to shift to the last colon rather than the first; it should also be a tiny bit faster by using lastIndexOf rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8133) Improve readability of deleted block check
[ https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8133: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Improve readability of deleted block check -- Key: HDFS-8133 URL: https://issues.apache.org/jira/browse/HDFS-8133 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 2.8.0 Attachments: HDFS-8133.patch The current means of checking if a block is deleted is checking if its block collection is null. A more readable approach is an isDeleted method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504550#comment-14504550 ] Jitendra Nath Pandey commented on HDFS-7240: Thanks for the feedback and comments. I will try to answer the questions over my next few comments, and I will also update the document to reflect the discussion here. The stated limits in the document are more design goals and parameters we have in mind while designing the first phase of the project. These are not hard limits, and most of them will be configurable. First I will state a few technical limits, and then describe some back-of-the-envelope calculations and heuristics behind these numbers. The technical limitations are the following:
# The memory in the storage container manager limits the number of storage containers. From the namenode experience, I believe we can go up to a few hundred million storage containers. In later phases of the project we can have a federated architecture with multiple storage container managers for further scale-up.
# The size of a storage container is limited by how quickly we want to replicate the containers when a datanode goes down. The advantage of a large container size is that it reduces the metadata needed to track container locations, which is proportional to the number of containers. However, a very large container will reduce the parallelism the cluster can achieve when replicating after a node failure. The container size will be configurable. A default size of 10G seems like a good choice: much larger than HDFS block sizes, but still allowing hundreds of containers on datanodes with a few terabytes of disk.
The maximum size of an object is stated as 5G. In the future we would like to increase this limit when we can support multi-part writes similar to S3. However, the average object size is expected to be much smaller; the most common range is expected to be a few hundred KBs to a few hundred MBs. Assuming 100 million containers, a 1MB average object size, and a 10G storage container size, it amounts to 10 trillion objects. I think 10 trillion is a lofty goal to have : ). The division of 10 trillion into 10 million buckets with a million objects in each bucket is somewhat arbitrary, but we believe users will prefer smaller buckets for better organization. We will keep these configurable. The storage volume settings give admins control over the usage of the storage. In a private cloud, a cluster shared by many tenants can have a storage volume dedicated to each tenant; a tenant can be a user, a project, or a group of users. Therefore, a limit of 1000 buckets, implying around 1PB of storage per tenant, seems reasonable. But I do agree that when we have a quota on the storage volume size, an additional limit on the number of buckets is not really needed. We plan to carry out the project in several phases. I would like to propose the following phases:
Phase 1
# Basic API as covered in the document.
# Storage container machinery, reliability, replication.
Phase 2
# High availability
# Security
# Secondary index for object listing with prefixes.
Phase 3
# Caching to improve latency.
# Further scalability in terms of number of objects and object sizes.
# Cross-geo replication.
I have created branch HDFS-7240 for this work. We will start filing jiras and posting patches.
Object store in HDFS Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Ozone-architecture-v1.pdf This jira proposes to add object store capabilities into HDFS. As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer i.e. datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8136) Client gets and uses EC schema when reads and writes a stripping file
[ https://issues.apache.org/jira/browse/HDFS-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504624#comment-14504624 ] Li Bo commented on HDFS-8136: - The patch also looks good to me. When I apply the patch to branch HDFS-7285, it shows {{Reversed (or previously applied) patch detected}}. Your changes to {{TestDFSStripedOutputStream}} seem to have already been committed to the branch by another patch. Could you update your patch against the current branch code? Client gets and uses EC schema when reads and writes a stripping file - Key: HDFS-8136 URL: https://issues.apache.org/jira/browse/HDFS-8136 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Kai Zheng Assignee: Kai Sasaki Attachments: HDFS-8136.1.patch, HDFS-8136.2.patch, HDFS-8136.3.patch As discussed with [~umamaheswararao] and [~vinayrpet]: in the client, when reading and writing a stripping file, we can invoke a separate call to the NameNode to request the EC schema associated with the EC zone the file is in. The schema can then be used to guide the reading and writing. Currently it uses hard-coded values. Optionally, as an optimization, the client may cache schema info per file, per zone, or per schema name. We could add the schema name to {{HdfsFileStatus}} for that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8160) Long delays when calling hdfsOpenFile()
[ https://issues.apache.org/jira/browse/HDFS-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504623#comment-14504623 ] Steve Loughran commented on HDFS-8160: -- It ultimately worked because, after timing out, the DFS client tried a different host. What may be happening is that the datanodes are reporting in as healthy, but the address they publish for clients to get the data isn't accessible. Wrong hostnames or firewalls are the common causes; network routing problems are another. Try a telnet to the hostname and port listed, from the machine that isn't able to connect, and see what happens. Long delays when calling hdfsOpenFile() --- Key: HDFS-8160 URL: https://issues.apache.org/jira/browse/HDFS-8160 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.5.2 Environment: 3-node Apache Hadoop 2.5.2 cluster running on Ubuntu 14.04.
dfshealth overview: Security is off. Safemode is off. 8 files and directories, 9 blocks = 17 total filesystem object(s). Heap Memory used 45.78 MB of 90.5 MB Heap Memory. Max Heap Memory is 889 MB. Non Heap Memory used 36.3 MB of 70.44 MB Commited Non Heap Memory. Max Non Heap Memory is 130 MB.
Configured Capacity: 118.02 GB
DFS Used: 2.77 GB
Non DFS Used: 12.19 GB
DFS Remaining: 103.06 GB
DFS Used%: 2.35%
DFS Remaining%: 87.32%
Block Pool Used: 2.77 GB
Block Pool Used%: 2.35%
DataNodes usages% (Min/Median/Max/stdDev): 2.35% / 2.35% / 2.35% / 0.00%
Live Nodes: 3 (Decommissioned: 0)
Dead Nodes: 0 (Decommissioned: 0)
Decommissioning Nodes: 0
Number of Under-Replicated Blocks: 0
Number of Blocks Pending Deletion: 0
Datanode Information (In operation):
Node | Last contact | Admin State | Capacity | Used | Non DFS Used | Remaining | Blocks | Block pool used | Failed Volumes | Version
hadoop252-3 (x.x.x.10:50010) | 1 | In Service | 39.34 GB | 944.85 MB | 3.63 GB | 34.79 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
hadoop252-1 (x.x.x.8:50010) | 0 | In Service | 39.34 GB | 944.85 MB | 4.94 GB | 33.48 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
hadoop252-2 (x.x.x.9:50010) | 1 | In Service | 39.34 GB | 944.85 MB | 3.63 GB | 34.79 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
java version 1.7.0_76 Java(TM) SE Runtime Environment (build 1.7.0_76-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode) Reporter: Rod Calling hdfsOpenFile() on a file residing on the target 3-node Hadoop cluster (described in detail in the Environment section) blocks for a long time (several minutes). I've noticed that the delay is related to the size of the target file. For example, attempting hdfsOpenFile() on a file of size 852483361 took 121 seconds, but a file of size 15458 took less than a second. Also, during the long delay, the following stack trace is routed to standard out:
2015-04-16 10:32:13,943 WARN [main] hdfs.BlockReaderFactory (BlockReaderFactory.java:getRemoteBlockReaderFromTcp(693)) - I/O error constructing remote block reader. org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect.
ch : java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533) at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101) at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755) at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670) at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337) at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854) at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143) 2015-04-16 10:32:13,946 WARN [main] hdfs.DFSClient (DFSInputStream.java:blockSeekTo(612)) - Failed to connect to /10.40.8.10:50010 for block, add to deadNodes and continue. org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010] org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533) at
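As a programmatic equivalent of the suggested telnet check, here is a minimal sketch that probes a DataNode transfer port with a short timeout. The host and port defaults are the address from the log above and would be replaced with whatever address fails.
{code}
import java.net.InetSocketAddress;
import java.net.Socket;

public class DataNodePortProbe {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "10.40.8.10"; // address from the log above
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010; // DataNode transfer port
        try (Socket socket = new Socket()) {
            // A short timeout makes an unreachable address fail fast instead of
            // hanging for the full 60-second DFS client connect timeout.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Connected to " + host + ":" + port);
        } catch (Exception e) {
            System.out.println("Cannot reach " + host + ":" + port + ": " + e);
        }
    }
}
{code}
Run it from the machine that cannot connect; a success here but a timeout from the client machine points at firewalls or routing, while a failure everywhere points at a wrong published hostname.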
[jira] [Commented] (HDFS-8179) DFSClient#getServerDefaults returns null within 1 hour of system start
[ https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504811#comment-14504811 ] Hudson commented on HDFS-8179: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #161 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/161/]) HDFS-8179. DFSClient#getServerDefaults returns null within 1 hour of system start. (Contributed by Xiaoyu Yao) (arp: rev c92f6f360515cc21ecb9b9f49b3e59537ef0cb05) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java DFSClient#getServerDefaults returns null within 1 hour of system start -- Key: HDFS-8179 URL: https://issues.apache.org/jira/browse/HDFS-8179 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8179.00.patch, HDFS-8179.01.patch We recently hit NPE during Ambari Oozie service check. The failed hdfs command is below. It repros sometimes and then go away after the cluster runs for a while. {code} [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r /user/ambari-qa/mapredsmokeoutput rm: Failed to get server trash configuration: null. Consider using -skipTrash option {code} With additional tracing, the failure was located to the following stack. {code} 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration java.lang.NullPointerException at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86) at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117) at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205) at org.apache.hadoop.fs.shell.Command.run(Command.java:166) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) rm: Failed to get server trash configuration: null. Consider using -skipTrash option {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
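Per the title, getServerDefaults() can return null shortly after startup, and Trash.moveToAppropriateTrash dereferences the result. Purely as an illustration of the failure mode (the committed patch is in the attachments and may differ), a hedged sketch of a defensive caller that falls back to client-side configuration:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsServerDefaults;
import org.apache.hadoop.fs.Path;

public class TrashIntervalGuard {
    /**
     * Returns the server-side trash interval, falling back to the client
     * configuration when getServerDefaults() yields null (the symptom in
     * this issue) instead of throwing a NullPointerException.
     */
    static long trashIntervalMinutes(FileSystem fs, Path path, Configuration conf)
            throws IOException {
        FsServerDefaults defaults = fs.getServerDefaults(path);
        return defaults != null
            ? defaults.getTrashInterval()
            : conf.getLong("fs.trash.interval", 0L);
    }
}
{code}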
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504810#comment-14504810 ] Hudson commented on HDFS-7916: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #161 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/161/]) HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev ed4137cebf27717e9c79eae515b0b83ab6676465) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Fix For: 2.7.1 Attachments: HDFS-7916-01.patch if any badblock found, then BPSA for StandbyNode will go for infinite times to report it. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
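The loop arises because a failed action is re-queued unconditionally while the standby NameNode keeps rejecting it. Below is a hedged, self-contained sketch of the drain-and-drop idea with simplified stand-in types; it is not the BPServiceActor code and not necessarily the committed fix.
{code}
import java.util.LinkedList;
import java.util.Queue;

public class ActionQueueSketch {
    interface Action {
        void reportTo() throws Exception;
    }

    private final Queue<Action> queue = new LinkedList<>();

    void add(Action action) {
        queue.add(action);
    }

    /**
     * Drain the queue once per heartbeat. A failed action is logged and
     * dropped rather than blindly re-queued: a standby NameNode would
     * reject the same report on every retry, looping forever.
     */
    void processQueueMessages() {
        int pending = queue.size();
        for (int i = 0; i < pending; i++) {
            Action action = queue.poll();
            try {
                action.reportTo();
            } catch (Exception e) {
                System.err.println("Dropping failed action: " + e);
            }
        }
    }
}
{code}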
[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504812#comment-14504812 ] Hudson commented on HDFS-7993: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #161 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/161/]) HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) (vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Provide each Replica details in fsck Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
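To make the repl-count mismatch concrete, here is a small self-contained sketch (hypothetical types, not the NamenodeFsck code) that keeps the live-replica count but annotates every located node, which is the spirit of the change:
{code}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReplicaDetailSketch {
    static class Node {
        final String name;
        final boolean decommissioned;
        Node(String name, boolean decommissioned) {
            this.name = name;
            this.decommissioned = decommissioned;
        }
    }

    /**
     * repl= stays the live-replica count, but each located node is
     * annotated, so "repl=3" followed by four nodes is no longer confusing.
     */
    static String describe(List<Node> located) {
        long live = located.stream().filter(n -> !n.decommissioned).count();
        String nodes = located.stream()
            .map(n -> n.decommissioned ? n.name + "(DECOMMISSIONED)" : n.name)
            .collect(Collectors.joining(", "));
        return "repl=" + live + " [" + nodes + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe(Arrays.asList(
            new Node("dn1", false), new Node("dn2", false),
            new Node("dn3", false), new Node("dn4", true))));
        // prints: repl=3 [dn1, dn2, dn3, dn4(DECOMMISSIONED)]
    }
}
{code}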
[jira] [Created] (HDFS-8205) fs -count -q -t -v -h displays wrong information
Peter Shi created HDFS-8205: --- Summary: fs -count -q -t -v -h displays wrong information Key: HDFS-8205 URL: https://issues.apache.org/jira/browse/HDFS-8205 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Peter Shi Priority: Minor {code}./hadoop fs -count -q -t -h -v / QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0 9223372036854775807 9223372036854775763none inf 31 13 1230 /{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504633#comment-14504633 ] Hadoop QA commented on HDFS-7281: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726794/HDFS-7281-4.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10329//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10329//console This message is automatically generated. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Labels: supportability Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, HDFS-7281.patch In the situation where the block lost all its replicas, fsck shows the block is missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is numCorruptNodes == numNodes == 0 in the following code. {noformat} BlockManager final boolean isCorrupt = numCorruptNodes == numNodes; {noformat} Would like to clarify if it is the intent to mark missing block as corrupted or it is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
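The quoted condition evaluates to true when numCorruptNodes == numNodes == 0, which is exactly the missing-block case. One hedged way to express the intended semantics (a sketch of the idea, not necessarily the committed fix):
{code}
public class BlockCorruptionCheck {
    /** A block with zero located replicas is missing, not corrupt. */
    static boolean isCorrupt(int numCorruptNodes, int numNodes) {
        return numCorruptNodes != 0 && numCorruptNodes == numNodes;
    }

    public static void main(String[] args) {
        System.out.println(isCorrupt(0, 0)); // false: all replicas lost, missing only
        System.out.println(isCorrupt(3, 3)); // true: every located replica is corrupt
    }
}
{code}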
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504914#comment-14504914 ] Takanobu Asanuma commented on HDFS-7687: Sorry for my late work, [~szetszwo]. I'm mainly changing code in {{NamenodeFsck.check}} to handle EC, and I'm going to add some metrics for EC, modeled on the replication metrics. Would you please check these metrics? {{Total EC block groups}}: The number of all EC block groups in the HDFS. {{Minimally stored block groups}}: The number of EC block groups which have enough blocks to recover. For example, in (6,3)-Reed-Solomon, these groups have at least 6 blocks. {{Over EC block groups}}: The number of EC block groups which have excess blocks for some reason. For example, in (6,3)-Reed-Solomon, these groups have more than 9 blocks. (Do such cases exist?) {{Under EC block groups}}: The number of EC block groups which have lost blocks. {{Mis EC block groups}}: The number of EC block groups whose rack locations are invalid. {{Default EC schema}}: This is usually SYS-DEFAULT-RS-6-3. I think this will be set by a configuration file later. {{Corrupt EC block groups}}: The number of EC block groups which don't have enough blocks to recover. For example, in (6,3)-Reed-Solomon, these groups have fewer than 6 blocks, so they can't be recovered. Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under-replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
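As a compact restatement of the proposed metrics, here is a hedged sketch of the counting predicates for a single block group under the default RS 6+3 schema. The names are illustrative, not taken from any patch; note that "minimally stored" and "under" deliberately overlap (a group with 6 to 8 live blocks is both).
{code}
public class EcGroupMetrics {
    // RS-6-3 schema referenced in the comment: 6 data + 3 parity blocks per group.
    static final int DATA = 6;
    static final int PARITY = 3;

    /** Fewer live blocks than data blocks: the group cannot be recovered. */
    static boolean isCorrupt(int liveBlocks) {
        return liveBlocks < DATA;
    }

    /** At least the data-block count survives, so recovery is possible. */
    static boolean isMinimallyStored(int liveBlocks) {
        return liveBlocks >= DATA;
    }

    /** Some blocks are lost, though the group is not necessarily unrecoverable. */
    static boolean isUnder(int liveBlocks) {
        return liveBlocks < DATA + PARITY;
    }

    /** More blocks than the schema calls for (the "do such cases exist?" bucket). */
    static boolean isOver(int liveBlocks) {
        return liveBlocks > DATA + PARITY;
    }

    public static void main(String[] args) {
        System.out.println(isCorrupt(5));         // true: 5 < 6, unrecoverable
        System.out.println(isMinimallyStored(6)); // true: recoverable
        System.out.println(isUnder(8));           // true: 8 < 9, one block lost
    }
}
{code}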
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504916#comment-14504916 ] Takanobu Asanuma commented on HDFS-7687: And I have some other thoughts. Should I create separate tickets for the items below? # {{NamenodeFsck.check}} is a large method. If I add the code to handle EC in this method, it will become even larger and more complicated, so we should refactor it later. # We should add some tests for fsck on EC files. Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under-replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8204: Component/s: balancer mover Balancer: 2 replicas ends in same node after running balance. - Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8204.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8204: Attachment: HDFS-8204.001.patch Balancer: 2 replicas ends in same node after running balance. - Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8204.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.
Walter Su created HDFS-8204: --- Summary: Balancer: 2 replicas ends in same node after running balance. Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Bug Reporter: Walter Su Assignee: Walter Su -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8204: Description: The Balancer moves blocks between Datanodes (Ver. < 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} is flawed and may cause 2 replicas to end up on the same node after running the balancer. was: The Balancer moves blocks between Datanodes (Ver. < 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} is flawed and may cause Balancer: 2 replicas ends in same node after running balance. - Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8204.001.patch The Balancer moves blocks between Datanodes (Ver. < 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} is flawed and may cause 2 replicas to end up on the same node after running the balancer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
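To illustrate why comparing StorageGroups can mislead the balancer: two storage groups on the same node are distinct objects, so a contains() check can miss a replica that node already holds. A hedged, self-contained sketch contrasting the two checks, using simplified stand-ins rather than the real Dispatcher classes:
{code}
import java.util.ArrayList;
import java.util.List;

public class DBlockLocationSketch {
    /** Simplified stand-in for the real DatanodeInfo. */
    static class DatanodeInfo {
        final String host;
        DatanodeInfo(String host) { this.host = host; }
        @Override public boolean equals(Object o) {
            return o instanceof DatanodeInfo && ((DatanodeInfo) o).host.equals(host);
        }
        @Override public int hashCode() { return host.hashCode(); }
    }

    /** Simplified stand-in: one node can expose several storage groups. */
    static class StorageGroup {
        final DatanodeInfo datanode;
        StorageGroup(DatanodeInfo datanode) { this.datanode = datanode; }
        DatanodeInfo getDatanodeInfo() { return datanode; }
    }

    final List<StorageGroup> locations = new ArrayList<>();

    /** Flawed shape: misses a replica held by a different group on the same node. */
    boolean isLocatedOnGroup(StorageGroup loc) {
        return locations.contains(loc);
    }

    /** Hedged fix: compare the owning datanode, not the storage group object. */
    boolean isLocatedOnDatanode(StorageGroup loc) {
        for (StorageGroup g : locations) {
            if (g.getDatanodeInfo().equals(loc.getDatanodeInfo())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DBlockLocationSketch block = new DBlockLocationSketch();
        DatanodeInfo node = new DatanodeInfo("dn1");
        block.locations.add(new StorageGroup(node));
        StorageGroup otherGroupSameNode = new StorageGroup(node);
        System.out.println(block.isLocatedOnGroup(otherGroupSameNode));    // false
        System.out.println(block.isLocatedOnDatanode(otherGroupSameNode)); // true
    }
}
{code}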
[jira] [Updated] (HDFS-7993) Provide each Replica details in fsck
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7993: Summary: Provide each Replica details in fsck (was: Incorrect descriptions in fsck when nodes are decommissioned) Provide each Replica details in fsck Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8205) fs -count -q -t -v -h displays wrong information
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Shi updated HDFS-8205: Assignee: Peter Shi Status: Patch Available (was: Open) fs -count -q -t -v -h displays wrong information Key: HDFS-8205 URL: https://issues.apache.org/jira/browse/HDFS-8205 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Peter Shi Assignee: Peter Shi Priority: Minor Attachments: HDFS-8205.patch {code}./hadoop fs -count -q -t -h -v / QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0 9223372036854775807 9223372036854775763none inf 31 13 1230 /{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8154) Extract WebHDFS protocol out as a specification to allow easier clients and servers
[ https://issues.apache.org/jira/browse/HDFS-8154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504618#comment-14504618 ] Steve Loughran commented on HDFS-8154: -- ..no opinions; I think this would make a good experiment to see which worked best when integrated with both the development and build processes. Anything where the build could at least verify that the specification was well-formed and consistent would be nice. Extract WebHDFS protocol out as a specification to allow easier clients and servers --- Key: HDFS-8154 URL: https://issues.apache.org/jira/browse/HDFS-8154 Project: Hadoop HDFS Issue Type: New Feature Components: webhdfs Reporter: Jakob Homan Assignee: Jakob Homan WebHDFS would be more useful if there were a programmatic description of its interface, which would allow one to more easily create servers and clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
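As one small example of the build-time verification idea: if the extracted specification were kept as XML (only one of several candidate formats under discussion), a check like the following could fail the build on a malformed file. The file name is hypothetical.
{code}
import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

public class SpecWellFormednessCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical spec location; a parse failure throws and fails the build step.
        File spec = new File(args.length > 0 ? args[0] : "webhdfs-spec.xml");
        DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(spec);
        System.out.println("Well-formed: " + spec);
    }
}
{code}
Consistency checks (every operation documented, every parameter typed) would need schema validation on top of this, but even a well-formedness gate catches the most common drift.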
[jira] [Commented] (HDFS-8205) fs -count -q -t -v -h displays wrong information
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504685#comment-14504685 ] Hadoop QA commented on HDFS-8205: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726825/HDFS-8205.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common: org.apache.hadoop.crypto.key.TestValueQueue Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10332//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10332//console This message is automatically generated. fs -count -q -t -v -h displays wrong information Key: HDFS-8205 URL: https://issues.apache.org/jira/browse/HDFS-8205 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Peter Shi Assignee: Peter Shi Priority: Minor Attachments: HDFS-8205.patch {code}./hadoop fs -count -q -t -h -v / QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0 9223372036854775807 9223372036854775763none inf 31 13 1230 /{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504713#comment-14504713 ] Hudson commented on HDFS-7993: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7623 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7623/]) HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) (vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java Provide each Replica details in fsck Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8204: Description: The Balancer moves blocks between Datanodes (Ver. < 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} is flawed and may cause Balancer: 2 replicas ends in same node after running balance. - Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8204.001.patch The Balancer moves blocks between Datanodes (Ver. < 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} is flawed and may cause -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8176) Provide information about the snapshots compared in audit log
[ https://issues.apache.org/jira/browse/HDFS-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504533#comment-14504533 ] Hadoop QA commented on HDFS-8176: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726157/HDFS-8176.1.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFileTruncate Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10328//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10328//console This message is automatically generated. Provide information about the snapshots compared in audit log - Key: HDFS-8176 URL: https://issues.apache.org/jira/browse/HDFS-8176 Project: Hadoop HDFS Issue Type: Improvement Reporter: J.Andreina Assignee: J.Andreina Attachments: HDFS-8176.1.patch Provide information about the snapshots compared in the audit log. In the current code the value null is being passed. {code} logAuditEvent(diffs != null, "computeSnapshotDiff", null, null, null); {code} {noformat} 2015-04-15 09:56:49,328 INFO FSNamesystem.audit: allowed=true ugi=Rex (auth:SIMPLE) ip=/X cmd=computeSnapshotDiff src=null dst=null perm=null proto=rpc {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
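A hedged illustration of the requested change (not the attached patch): fill the src/dst audit fields with the snapshot root and the two snapshot names, so the audit line identifies what was compared. The helper below is a self-contained stand-in for FSNamesystem#logAuditEvent.
{code}
public class SnapshotDiffAudit {
    /** Stand-in for FSNamesystem#logAuditEvent with the same field layout. */
    static void logAuditEvent(boolean allowed, String cmd, String src, String dst) {
        System.out.printf("allowed=%b cmd=%s src=%s dst=%s proto=rpc%n",
            allowed, cmd, src, dst);
    }

    static void auditSnapshotDiff(boolean allowed, String snapshotRoot,
            String fromSnapshot, String toSnapshot) {
        logAuditEvent(allowed, "computeSnapshotDiff",
            snapshotRoot + "@" + fromSnapshot,   // e.g. /dir@s1
            snapshotRoot + "@" + toSnapshot);    // e.g. /dir@s2
    }

    public static void main(String[] args) {
        auditSnapshotDiff(true, "/dir", "s1", "s2");
        // prints: allowed=true cmd=computeSnapshotDiff src=/dir@s1 dst=/dir@s2 proto=rpc
    }
}
{code}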
[jira] [Commented] (HDFS-8205) fs -count -q -t -v -h displays wrong information
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504570#comment-14504570 ] Peter Shi commented on HDFS-8205: - This bug was introduced by HDFS-7701; I will attach a patch to fix it. fs -count -q -t -v -h displays wrong information Key: HDFS-8205 URL: https://issues.apache.org/jira/browse/HDFS-8205 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Peter Shi Priority: Minor {code}./hadoop fs -count -q -t -h -v / QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0 9223372036854775807 9223372036854775763none inf 31 13 1230 /{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
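The wrong output comes from the INFO line (added by HDFS-7701) landing between the header row and the data row of the count table. A hedged sketch of the general remedy, demoting such a message below the default level; plain JDK logging is used here for a runnable example, not the Hadoop logging framework.
{code}
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietClientLog {
    private static final Logger LOG = Logger.getLogger("hdfs.DFSClient");

    public static void main(String[] args) {
        // At FINE (debug) the message is suppressed under the default INFO
        // level, so it can no longer interleave with fs -count table output.
        LOG.log(Level.FINE,
            "Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0");
        System.out.println("no INFO line between the header and data rows");
    }
}
{code}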
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504576#comment-14504576 ] Jitendra Nath Pandey commented on HDFS-7240: [~steve_l], thanks for the review. bq. is there a limit on the #of storage volumes in a cluster? does GET/ return all of them? Please see my discussion on limits above. The storage volumes are created by admins and are therefore not expected to be too many. bq. any way to enum users? e.g. GET /admin/user/ ? We don't plan to manage users in ozone. In this respect we deviate from popular public object stores. This is because, in a private cluster deployment, user management is usually tied to corporate user accounts. Instead we choose the storage volume abstraction for certain administrative settings like quota. However, admins can choose to allocate a storage volume for each user. bq. what if I want to GET the 1001st entry in an object store? I'm not sure I understand the use case. Do you mean the users would like to query using some sort of entry number or index? bq. GET on object must support ranges Agreed; we plan to take up this feature in the 2nd phase. bq. HEAD should supply content-length This should be easily doable. We will keep it in mind for the container implementation. Object store in HDFS Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Ozone-architecture-v1.pdf This jira proposes to add object store capabilities into HDFS. As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer i.e. datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8136) Client gets and uses EC schema when reads and writes a stripping file
[ https://issues.apache.org/jira/browse/HDFS-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated HDFS-8136: - Attachment: HDFS-8136.4.patch Client gets and uses EC schema when reads and writes a stripping file - Key: HDFS-8136 URL: https://issues.apache.org/jira/browse/HDFS-8136 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Kai Zheng Assignee: Kai Sasaki Attachments: HDFS-8136.1.patch, HDFS-8136.2.patch, HDFS-8136.3.patch, HDFS-8136.4.patch Discussed with [~umamaheswararao] and [~vinayrpet], in client when reading and writing a stripping file, it can invoke a separate call to NameNode to request the EC schema associated with the EC zone where the file is in. Then the schema can be used to guide the reading and writing. Currently it uses hard-coded values. Optionally, as an optimization consideration, client may cache schema info per file or per zone or per schema name. We could add schema name in {{HdfsFileStatus}} for that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8205) fs -count -q -t -v -h displays wrong information
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Shi updated HDFS-8205: Attachment: HDFS-8205.patch fs -count -q -t -v -h displays wrong information Key: HDFS-8205 URL: https://issues.apache.org/jira/browse/HDFS-8205 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Peter Shi Priority: Minor Attachments: HDFS-8205.patch {code}./hadoop fs -count -q -t -h -v / QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0 9223372036854775807 9223372036854775763none inf 31 13 1230 /{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7993) Provide each Replica details in fsck
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7993: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks [~andreina] for the great contribution. Thanks [~mingma] and [~cmccabe] for the great suggestions and reviews. Provide each Replica details in fsck Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7621) Erasure Coding: update the Balancer/Mover data migration logic
[ https://issues.apache.org/jira/browse/HDFS-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504588#comment-14504588 ] Walter Su commented on HDFS-7621: - I'm still reading the code and thinking about how to do it. By the way, I found a bug in the balancer: HDFS-8204. Erasure Coding: update the Balancer/Mover data migration logic -- Key: HDFS-7621 URL: https://issues.apache.org/jira/browse/HDFS-7621 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Walter Su Currently the Balancer/Mover only considers the distribution of replicas of the same block during data migration: the migration cannot decrease the number of racks. With EC the Balancer and Mover should also take into account the distribution of blocks belonging to the same block group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8191) Fix byte to integer casting in SimulatedFSDataset#simulatedByte
[ https://issues.apache.org/jira/browse/HDFS-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504672#comment-14504672 ] Hadoop QA commented on HDFS-8191: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726801/HDFS-8191.001.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10330//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10330//console This message is automatically generated. Fix byte to integer casting in SimulatedFSDataset#simulatedByte --- Key: HDFS-8191 URL: https://issues.apache.org/jira/browse/HDFS-8191 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-8191.000.patch, HDFS-8191.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504674#comment-14504674 ] Hadoop QA commented on HDFS-7993: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726802/HDFS-7993.6.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10331//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10331//console This message is automatically generated. Incorrect descriptions in fsck when nodes are decommissioned Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504691#comment-14504691 ] Jitendra Nath Pandey commented on HDFS-7240: [~clamb], thanks for the detailed review and feedback. Some of the answers are below; for the others I will post the updated document with details, as you have pointed out. bq. Is the 1KB key size limit a hard limit or just a design/implementation target It is a design target. Amazon's S3 limits keys to 1KB. I doubt there would be many use cases that need to go beyond it. I see the point about allowing for graceful degradation instead of a hard limit, but at this point in the project I would prefer to have stricter limits and relax them later, instead of setting user expectations too high to begin with. bq. Caching to reduce network traffic I agree that a good caching layer will significantly help performance. The Ozone handler seems like a natural place for caching. However, a thick client can do its own caching without overloading datanodes. The focus of phase 1 is to get the semantics right and lay down the basic architecture; we plan to attack performance improvements in a later phase of the project. bq. Security mechanisms Frankly, I haven't thought about anything other than Kerberos. I agree we should evaluate it against what other popular object stores use. bq. Hot spots in hash partitioning These are possible for a pathological sequence of keys, but in practice hash partitioning has been used successfully to avoid hot spots, e.g. hash-partitioned indexes in databases. We would need to pick hash functions with good distribution properties. bq. Secondary indexing consistency The secondary index need not be strictly consistent with the bucket. That means a listing operation with a prefix or key range may not reflect the latest state of the bucket. We will have a more concrete proposal in the second phase of the project. bq. Storage volume GET for admin I believe it is not a security concern to allow users to see all storage volume names. However, it is possible to conceive a use case where an admin would want to restrict that. We can probably support both modes. bq. no guarantees on partially written objects The object will not be visible until completely written. Also, no recovery is planned for the first phase if the write fails. In the future, we would like to support multi-part uploads. bq. Re-using block management implementation for container management We intend to reuse the DatanodeProtocol that the datanode uses to talk to the namenode. I will add more details to the document and on the corresponding jira. bq. storage container prototype using leveldbjni We will add a lot more details on this in its own jira. The idea is to use leveldbjni in the storage container on the datanodes. We plan to prototype a storage container that stores objects as individual files within the container; however, that would need an index within the container to map a key to a file, and we will use leveldbjni for that index. Another possible prototype is to put the entire object in leveldbjni itself. It will take some experimentation to zero in on the right approach. We will also try to make the storage container implementation pluggable, to make it easy to try different implementations. bq. How are quotas enabled and set? who enforces them All the Ozone APIs are implemented in the Ozone handler, and quota will also be enforced by the Ozone handler. I will update the document with the APIs. 
Object store in HDFS Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Ozone-architecture-v1.pdf This jira proposes to add object store capabilities into HDFS. As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer i.e. datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
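To ground the hot-spot argument above, here is a minimal sketch of hash partitioning with a well-distributed digest: sequential keys that would pile onto one range partition spread nearly uniformly across buckets. MD5, the partition count, and the volume/bucket/key names are arbitrary illustrative choices, not anything specified by the Ozone design.
{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class KeyPartitioner {
    /** Maps an object key to one of numPartitions buckets via a uniform digest. */
    static int partition(String key, int numPartitions) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
            .digest(key.getBytes(StandardCharsets.UTF_8));
        int h = ((digest[0] & 0xff) << 24) | ((digest[1] & 0xff) << 16)
              | ((digest[2] & 0xff) << 8) | (digest[3] & 0xff);
        return Math.floorMod(h, numPartitions);
    }

    public static void main(String[] args) throws Exception {
        // A pathological sequential key series still spreads across buckets.
        for (int i = 1; i <= 5; i++) {
            String key = "volume/bucket/key-" + i;
            System.out.println(key + " -> " + partition(key, 16));
        }
    }
}
{code}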
[jira] [Commented] (HDFS-8179) DFSClient#getServerDefaults returns null within 1 hour of system start
[ https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505075#comment-14505075 ] Hudson commented on HDFS-8179: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #171 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/171/]) HDFS-8179. DFSClient#getServerDefaults returns null within 1 hour of system start. (Contributed by Xiaoyu Yao) (arp: rev c92f6f360515cc21ecb9b9f49b3e59537ef0cb05) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java DFSClient#getServerDefaults returns null within 1 hour of system start -- Key: HDFS-8179 URL: https://issues.apache.org/jira/browse/HDFS-8179 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8179.00.patch, HDFS-8179.01.patch We recently hit NPE during Ambari Oozie service check. The failed hdfs command is below. It repros sometimes and then go away after the cluster runs for a while. {code} [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r /user/ambari-qa/mapredsmokeoutput rm: Failed to get server trash configuration: null. Consider using -skipTrash option {code} With additional tracing, the failure was located to the following stack. {code} 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration java.lang.NullPointerException at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86) at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117) at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205) at org.apache.hadoop.fs.shell.Command.run(Command.java:166) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) rm: Failed to get server trash configuration: null. Consider using -skipTrash option {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505074#comment-14505074 ] Hudson commented on HDFS-7916: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #171 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/171/]) HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev ed4137cebf27717e9c79eae515b0b83ab6676465) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Fix For: 2.7.1 Attachments: HDFS-7916-01.patch if any badblock found, then BPSA for StandbyNode will go for infinite times to report it. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505076#comment-14505076 ] Hudson commented on HDFS-7993: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #171 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/171/]) HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) (vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java Provide each Replica details in fsck Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504745#comment-14504745 ] Hudson commented on HDFS-7993: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #170 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/170/]) HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) (vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java Provide each Replica details in fsck Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504743#comment-14504743 ] Hudson commented on HDFS-7916: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #170 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/170/]) HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev ed4137cebf27717e9c79eae515b0b83ab6676465) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Fix For: 2.7.1 Attachments: HDFS-7916-01.patch if any badblock found, then BPSA for StandbyNode will go for infinite times to report it. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8179) DFSClient#getServerDefaults returns null within 1 hour of system start
[ https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504744#comment-14504744 ] Hudson commented on HDFS-8179: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #170 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/170/]) HDFS-8179. DFSClient#getServerDefaults returns null within 1 hour of system start. (Contributed by Xiaoyu Yao) (arp: rev c92f6f360515cc21ecb9b9f49b3e59537ef0cb05) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java DFSClient#getServerDefaults returns null within 1 hour of system start -- Key: HDFS-8179 URL: https://issues.apache.org/jira/browse/HDFS-8179 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8179.00.patch, HDFS-8179.01.patch We recently hit NPE during Ambari Oozie service check. The failed hdfs command is below. It repros sometimes and then go away after the cluster runs for a while. {code} [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r /user/ambari-qa/mapredsmokeoutput rm: Failed to get server trash configuration: null. Consider using -skipTrash option {code} With additional tracing, the failure was located to the following stack. {code} 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration java.lang.NullPointerException at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86) at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117) at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205) at org.apache.hadoop.fs.shell.Command.run(Command.java:166) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) rm: Failed to get server trash configuration: null. Consider using -skipTrash option {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505128#comment-14505128 ] Hudson commented on HDFS-7993: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2120 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2120/]) HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) (vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java Provide each Replica details in fsck Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from the live replicas count, while the actual nodes come from the LocatedBlock, which includes decommissioned nodes. Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement verifies a LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification, just like how fsck excludes decommissioned nodes when it checks for under-replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8179) DFSClient#getServerDefaults returns null within 1 hour of system start
[ https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505127#comment-14505127 ] Hudson commented on HDFS-8179: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2120 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2120/]) HDFS-8179. DFSClient#getServerDefaults returns null within 1 hour of system start. (Contributed by Xiaoyu Yao) (arp: rev c92f6f360515cc21ecb9b9f49b3e59537ef0cb05) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java DFSClient#getServerDefaults returns null within 1 hour of system start -- Key: HDFS-8179 URL: https://issues.apache.org/jira/browse/HDFS-8179 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8179.00.patch, HDFS-8179.01.patch We recently hit an NPE during the Ambari Oozie service check. The failed hdfs command is below. It repros sometimes and then goes away after the cluster runs for a while. {code} [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r /user/ambari-qa/mapredsmokeoutput rm: Failed to get server trash configuration: null. Consider using -skipTrash option {code} With additional tracing, the failure was traced to the following stack. {code} 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration java.lang.NullPointerException at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86) at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117) at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205) at org.apache.hadoop.fs.shell.Command.run(Command.java:166) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) rm: Failed to get server trash configuration: null. Consider using -skipTrash option {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505126#comment-14505126 ] Hudson commented on HDFS-7916: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2120 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2120/]) HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev ed4137cebf27717e9c79eae515b0b83ab6676465) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Fix For: 2.7.1 Attachments: HDFS-7916-01.patch If any bad block is found, the BPServiceActor (BPSA) for the standby NameNode will retry reporting it indefinitely. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8200) Refactor FSDirStatAndListingOp
[ https://issues.apache.org/jira/browse/HDFS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8200: - Attachment: HDFS-8200.001.patch Refactor FSDirStatAndListingOp -- Key: HDFS-8200 URL: https://issues.apache.org/jira/browse/HDFS-8200 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8200.000.patch, HDFS-8200.001.patch After HDFS-6826 several functions in {{FSDirStatAndListingOp}} are dead. This jira proposes to clean them up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505797#comment-14505797 ] Colin Patrick McCabe commented on HDFS-8213: Hi Billie, {{DFSClient}} needs to instantiate {{SpanReceiverHost}} in order to implement tracing, in the case where the process using the {{DFSClient}} doesn't configure its own span receivers. If you are concerned about multiple span receivers being instantiated, simply set {{hadoop.htrace.span.receiver.classes}} to the empty string, and Hadoop won't instantiate any span receivers. That should be its default anyway. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
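For reference, a minimal sketch of the workaround described above, using the standard Hadoop Configuration API; the property name is from the comment, the surrounding code is illustrative:
{code}
import org.apache.hadoop.conf.Configuration;

// Processes (like Accumulo) that manage their own receivers can disable
// Hadoop's: an empty class list means no span receivers get instantiated.
Configuration conf = new Configuration();
conf.set("hadoop.htrace.span.receiver.classes", "");
// A DFSClient built from this conf will not instantiate span receivers.
{code}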
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505854#comment-14505854 ] Billie Rinaldi commented on HDFS-8213: -- If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance only when the sampler is set to something other than NeverSampler. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
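A sketch of that compromise; only {{SpanReceiverHost.getInstance}} and the {{hadoop.htrace.sampler}} property come from the discussion, the rest is assumed:
{code}
// Only pay for span receiver setup when tracing could actually sample.
String sampler = conf.get("hadoop.htrace.sampler", "NeverSampler");
if (!"NeverSampler".equals(sampler)) {
  spanReceiverHost = SpanReceiverHost.getInstance(conf);
}
{code}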
[jira] [Updated] (HDFS-8156) Add/implement necessary APIs even though we just have the system default schema
[ https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8156: Attachment: HDFS-8156-v7.patch How about this one? All of your comments have been addressed. Add/implement necessary APIs even though we just have the system default schema Key: HDFS-8156 URL: https://issues.apache.org/jira/browse/HDFS-8156 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch, HDFS-8156-v3.patch, HDFS-8156-v4.patch, HDFS-8156-v5.patch, HDFS-8156-v6.patch, HDFS-8156-v7.patch According to the discussion here, this issue was repurposed and modified. This is to add and implement some necessary APIs even though we just have the system default schema, to resolve some TODOs left for HDFS-7859 and HDFS-7866 as they're still subject to further discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505896#comment-14505896 ] Tsz Wo Nicholas Sze commented on HDFS-7687: --- Yes, refactoring in trunk first. Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8156) Add/implement necessary APIs even though we just have the system default schema
[ https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505663#comment-14505663 ] Tsz Wo Nicholas Sze commented on HDFS-8156: --- - Please do not add extractChunkSize(). Similar to initWith(..), it is unnecessary. - The other fields numDataUnits, numParityUnits and chunkSize should also be final. - The javadoc is still incorrect. Add/implement necessary APIs even though we just have the system default schema Key: HDFS-8156 URL: https://issues.apache.org/jira/browse/HDFS-8156 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch, HDFS-8156-v3.patch, HDFS-8156-v4.patch, HDFS-8156-v5.patch, HDFS-8156-v6.patch According to the discussion here, this issue was repurposed and modified. This is to add and implement some necessary APIs even though we just have the system default schema, to resolve some TODOs left for HDFS-7859 and HDFS-7866 as they're still subject to further discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8194) Add administrative tool to be able to examine the NN's view of DN storages
[ https://issues.apache.org/jira/browse/HDFS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505748#comment-14505748 ] Chris Nauroth commented on HDFS-8194: - There could be some potential overlap here with the work done in HDFS-7604, although that feature specifically reported only on volumes/storages that had failed, not all volumes. Add administrative tool to be able to examine the NN's view of DN storages -- Key: HDFS-8194 URL: https://issues.apache.org/jira/browse/HDFS-8194 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Aaron T. Myers Assignee: Colin Patrick McCabe The NN has long had facilities to be able to list all of the DNs that are registered with it. It would be great if there were an administrative tool be able to list all of the individual storages that the NN is tracking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7005) DFS input streams do not timeout
[ https://issues.apache.org/jira/browse/HDFS-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505636#comment-14505636 ] Nick Dimiduk commented on HDFS-7005: Any chance of bringing this to a 2.5.x patch release? Over on HBASE-13339 we're trying to work out how best to support users with minimal impact on dependencies for our next minor release (1.1). Bumping Hadoop minor versions (I think) will break our semantic versioning compatibility guidelines. FYI [~eclark], [~busbey], [~cnauroth] DFS input streams do not timeout Key: HDFS-7005 URL: https://issues.apache.org/jira/browse/HDFS-7005 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.5.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 2.6.0 Attachments: HDFS-7005.patch Input streams lost their timeout. The problem appears to be {{DFSClient#newConnectedPeer}} does not set the read timeout. During a temporary network interruption the server will close the socket, unbeknownst to the client host, which blocks on a read forever. The results are dire. Services such as the RM, JHS, NMs, oozie servers, etc all need to be restarted to recover - unless you want to wait many hours for the tcp stack keepalive to detect the broken socket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
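The description pinpoints {{DFSClient#newConnectedPeer}} not setting a read timeout. A sketch of the missing step, based on the {{Peer}} API; this is an assumed shape, not a claim about the committed patch:
{code}
// Hypothetical: apply the client's socket timeout when the peer is created,
// so a connection the server has silently closed cannot block a read forever.
Peer peer = newConnectedPeer(addr, blockToken, datanodeId);
peer.setReadTimeout(socketTimeout);   // e.g. dfs.client.socket-timeout
{code}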
[jira] [Commented] (HDFS-7005) DFS input streams do not timeout
[ https://issues.apache.org/jira/browse/HDFS-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505722#comment-14505722 ] Chris Nauroth commented on HDFS-7005: - Hi [~ndimiduk]. I'm not aware of any plans for a 2.5.3 patch release. To do so, we'd need someone to volunteer as release manager and conduct a vote on a release candidate. [~kasha], I'm notifying you just FYI, since you had been release manager previously on the 2.5.x release line. DFS input streams do not timeout Key: HDFS-7005 URL: https://issues.apache.org/jira/browse/HDFS-7005 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.5.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 2.6.0 Attachments: HDFS-7005.patch Input streams lost their timeout. The problem appears to be {{DFSClient#newConnectedPeer}} does not set the read timeout. During a temporary network interruption the server will close the socket, unbeknownst to the client host, which blocks on a read forever. The results are dire. Services such as the RM, JHS, NMs, oozie servers, etc all need to be restarted to recover - unless you want to wait many hours for the tcp stack keepalive to detect the broken socket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
Billie Rinaldi created HDFS-8213: Summary: DFSClient should not instantiate SpanReceiverHost Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8078: Status: Open (was: Patch Available) HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 Attachments: HDFS-8078.4.patch 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + : + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) Which also comes as client error -get: 2401 is not an IP string literal. This one has existing parsing logic which needs to shift to the last colon rather than the first. Should also be a tiny bit faster by using lastIndexOf rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
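The bracketed form required by {{NetUtils.createSocketAddr}} is shown in the description above; a hypothetical helper making the append-safe rule explicit (illustrative, not code from the attached patch):
{code}
// IPv6 literals must be wrapped in [] before appending ":port", otherwise
// the URI-based parsing rejects the authority.
static String toHostPortString(String ipAddr, int port) {
  if (ipAddr.indexOf(':') >= 0) {          // contains colons => IPv6 literal
    return "[" + ipAddr + "]:" + port;     // e.g. [2401:db00:1010:70ba:face:0:8:0]:50010
  }
  return ipAddr + ":" + port;              // IPv4 address or hostname
}
{code}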
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505877#comment-14505877 ] Takanobu Asanuma commented on HDFS-7687: Thanks for your review, Nicholas! bq. A Corrupt EC block group could have >= 6 blocks but some of the blocks are corrupted. bq. Yes, it is possible. E.g. a datanode D0 dies and an EC block in D0 is reconstructed in another datanode D1. Later on, D0 comes back. Then, both D0 and D1 have the same EC block and the block group could have more than 9 blocks. OK, I understand. bq. For #1, see if you want to create a JIRA for trunk to do some refactoring first. You mean, if some refactoring is needed, we should do it in the trunk branch first before we add the logic to handle EC? Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505847#comment-14505847 ] Billie Rinaldi commented on HDFS-8213: -- As documented, each process must configure its own span receivers if it wants to use tracing. If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode will not do any tracing. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505860#comment-14505860 ] Nick Dimiduk commented on HDFS-8213: I think [~billie.rinaldi] is correct here; the client should not instantiate its own SpanReceiverHost, but instead depend on the process in which it resides to provide one. This is how the HBase client works as well. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505925#comment-14505925 ] Takanobu Asanuma commented on HDFS-7687: I understand. Thank you. Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8185) Separate client related routines in HAUtil into a new class
[ https://issues.apache.org/jira/browse/HDFS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505867#comment-14505867 ] Tsz Wo Nicholas Sze commented on HDFS-8185: --- - Both DFSUtilClient.locatedBlocks2Locations methods are currently not used. Please move them later. - DFS_NAMENODE_HTTP_ADDRESS_KEY, DFS_NAMENODE_HTTP_ADDRESS_KEY, etc. are namenode confs. -* In DFSConfigKeys, set DFS_NAMENODE_HTTP_ADDRESS_KEY = HdfsClientConfigKeys.DFS_NAMENODE_HTTP_PORT_DEFAULT but do not deprecate them -* Namenode, datanode, etc. should keep using the DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY -* Client will use HdfsClientConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY. Separate client related routines in HAUtil into a new class --- Key: HDFS-8185 URL: https://issues.apache.org/jira/browse/HDFS-8185 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8185.000.patch, HDFS-8185.001.patch This jira proposes to move the routines used by the client implementation in HAUtil to a separate class and to move them into the hdfs-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
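A sketch of the aliasing Nicholas describes; the constant names are from the comment (which mixes address and port constants, so the sketch assumes the address key on both sides), and the exact wiring is an assumption:
{code}
// Server-side keys keep their DFSConfigKeys home but point at the
// client-module constants instead of being deprecated.
public class DFSConfigKeys {
  public static final String DFS_NAMENODE_HTTP_ADDRESS_KEY =
      HdfsClientConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY;
}
{code}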
[jira] [Commented] (HDFS-7005) DFS input streams do not timeout
[ https://issues.apache.org/jira/browse/HDFS-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505739#comment-14505739 ] Karthik Kambatla commented on HDFS-7005: Thanks for the ping, [~cnauroth]. [~ndimiduk] - there are no active plans for 2.5.3. If HDFS committers think this issue is serious enough to warrant a point release, I don't mind creating the RC and putting it through a vote. DFS input streams do not timeout Key: HDFS-7005 URL: https://issues.apache.org/jira/browse/HDFS-7005 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.5.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 2.6.0 Attachments: HDFS-7005.patch Input streams lost their timeout. The problem appears to be {{DFSClient#newConnectedPeer}} does not set the read timeout. During a temporary network interruption the server will close the socket, unbeknownst to the client host, which blocks on a read forever. The results are dire. Services such as the RM, JHS, NMs, oozie servers, etc all need to be restarted to recover - unless you want to wait many hours for the tcp stack keepalive to detect the broken socket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8118) Delay in checkpointing Trash can leave trash for 2 intervals before deleting
[ https://issues.apache.org/jira/browse/HDFS-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505791#comment-14505791 ] Harsh J commented on HDFS-8118: --- Thanks for explaining that, Casey. It makes sense to constant-ise the checkpoint date for uniformity - and the fix for this looks alright to me. It also may make sense that people want to set checkpoint intervals equal to the trash intervals. I think we can remove the change in the patch that caps it to 1/2 the interval value, and instead just add a small doc note in hdfs-default.xml on the trash checkpoint period property about what the behaviour could end up being if it's set equal to the trash clearing interval. Would it also be possible to come up with a test case for this? For example, load some files into trash such that multiple dirs need to be checkpointed, then issue a checkpoint (or await its lowered interval) and ensure only one date is observed before clearing occurs? It would help avoid regressions in the future, just in case. Delay in checkpointing Trash can leave trash for 2 intervals before deleting Key: HDFS-8118 URL: https://issues.apache.org/jira/browse/HDFS-8118 Project: Hadoop HDFS Issue Type: Bug Reporter: Casey Brotherton Assignee: Casey Brotherton Priority: Trivial Attachments: HDFS-8118.patch When the fs.trash.checkpoint.interval and the fs.trash.interval are set non-zero and the same, it is possible for trash to be left for two intervals. The TrashPolicyDefault will use a floor and ceiling function to ensure that the Trash will be checkpointed every interval of minutes. Each user's trash is checkpointed individually. The time resolution of the checkpoint timestamp is to the second. If the seconds switch while one user is checkpointing, then the next user's timestamp will be later. This will cause the next user's checkpoint to not be deleted at the next interval. I have recreated this in a lab cluster. I also have a suggestion for a patch that I can upload later tonight after testing it further. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
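A sketch of the "constant-ise the checkpoint date" idea: compute the timestamp once per checkpoint run so every user's trash gets the same checkpoint name regardless of how long the loop takes. This is hypothetical (the {{getTrashRoots()}} helper and the exact date format are assumptions, not the actual patch):
{code}
// One timestamp for the whole run, instead of a fresh Time.now() per user.
final long now = Time.now();
final String checkpointName =
    new SimpleDateFormat("yyMMddHHmmss").format(new Date(now));
for (Path userTrashRoot : getTrashRoots()) {   // assumed helper
  // "Current" is the live trash dir; renaming it creates the checkpoint.
  fs.rename(new Path(userTrashRoot, "Current"),
            new Path(userTrashRoot, checkpointName));
}
{code}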
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505884#comment-14505884 ] Hadoop QA commented on HDFS-8078: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726941/HDFS-8078.4.patch against trunk revision 424a00d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-client. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10334//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10334//console This message is automatically generated. HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 Attachments: HDFS-8078.4.patch 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + : + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) 
Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) Which also comes as client error -get: 2401 is not an IP string literal. This one has existing parsing logic which needs to shift to the last colon rather than the first. Should also be a tiny bit faster by using lastIndexOf rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
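A sketch of the last-colon parsing mentioned in the description; the helper is hypothetical, but addresses of the shape it handles (e.g. {{/2401:db00:20:7013:face:0:7:0:54152}}) appear in the logs above:
{code}
import java.net.InetSocketAddress;

// Split host:port on the LAST colon, so IPv6 literals with embedded colons
// parse correctly; lastIndexOf also avoids the cost of String.split.
static InetSocketAddress parseHostPort(String target) {
  int idx = target.lastIndexOf(':');
  String host = target.substring(0, idx);
  int port = Integer.parseInt(target.substring(idx + 1));
  return new InetSocketAddress(host, port);
}
{code}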
[jira] [Commented] (HDFS-8211) DataNode UUID is always null in the JMX counter
[ https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505968#comment-14505968 ] Anu Engineer commented on HDFS-8211: [~aw], would you like to take a look at this to see if it is related to the new build changes? DataNode UUID is always null in the JMX counter --- Key: HDFS-8211 URL: https://issues.apache.org/jira/browse/HDFS-8211 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Priority: Minor The DataNode JMX counters are tagged with the DataNode UUID, but it always gets a null value instead of the UUID. {code} Hadoop:service=DataNode,name=FSDatasetState*-null*. {code} This null is supposed to be the datanode UUID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
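For context, the bean name in the description is assembled roughly like the sketch below. The pattern is inferred from the symptom, not copied from the patch; the fix presumably ensures the UUID is initialized before registration:
{code}
// If datanode.getDatanodeUuid() is still null at registration time, the
// resulting bean is "Hadoop:service=DataNode,name=FSDatasetState-null".
mbeanName = MBeans.register("DataNode",
    "FSDatasetState-" + datanode.getDatanodeUuid(), this);
{code}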
[jira] [Updated] (HDFS-8211) DataNode UUID is always null in the JMX counter
[ https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8211: --- Status: Patch Available (was: Open) DataNode UUID is always null in the JMX counter --- Key: HDFS-8211 URL: https://issues.apache.org/jira/browse/HDFS-8211 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Priority: Minor Attachments: hdfs-8211.001.patch The DataNode JMX counters are tagged with the DataNode UUID, but it always gets a null value instead of the UUID. {code} Hadoop:service=DataNode,name=FSDatasetState*-null*. {code} This null is supposed to be the datanode UUID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8211) DataNode UUID is always null in the JMX counter
[ https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505982#comment-14505982 ] Anu Engineer commented on HDFS-8211: Also verified that FSDatasetState-UUID appears correctly using jconsole. DataNode UUID is always null in the JMX counter --- Key: HDFS-8211 URL: https://issues.apache.org/jira/browse/HDFS-8211 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Priority: Minor Attachments: hdfs-8211.001.patch The DataNode JMX counters are tagged with the DataNode UUID, but it always gets a null value instead of the UUID. {code} Hadoop:service=DataNode,name=FSDatasetState*-null*. {code} This null is supposed to be the datanode UUID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8211) DataNode UUID is always null in the JMX counter
[ https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506017#comment-14506017 ] Brahma Reddy Battula commented on HDFS-8211: Nice catch. The patch LGTM, +1 (non-binding). DataNode UUID is always null in the JMX counter --- Key: HDFS-8211 URL: https://issues.apache.org/jira/browse/HDFS-8211 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Priority: Minor Attachments: hdfs-8211.001.patch The DataNode JMX counters are tagged with the DataNode UUID, but it always gets a null value instead of the UUID. {code} Hadoop:service=DataNode,name=FSDatasetState*-null*. {code} This null is supposed to be the datanode UUID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8185) Separate client related routines in HAUtil into a new class
[ https://issues.apache.org/jira/browse/HDFS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8185: -- Component/s: hdfs-client Hadoop Flags: Reviewed +1 the new patch looks good. Thanks, Haohui. Separate client related routines in HAUtil into a new class --- Key: HDFS-8185 URL: https://issues.apache.org/jira/browse/HDFS-8185 Project: Hadoop HDFS Issue Type: Sub-task Components: build, hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8185.000.patch, HDFS-8185.001.patch, HDFS-8185.002.patch This jira proposes to move the routines used by the client implementation in HAUtil to a separate class and to move them into the hdfs-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8216) TestDFSStripedOutputStream should use BlockReaderTestUtil to create BlockReader
[ https://issues.apache.org/jira/browse/HDFS-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8216: -- Attachment: h8216_20150421.patch h8216_20150421.patch: use BlockReaderTestUtil.getBlockReader. TestDFSStripedOutputStream should use BlockReaderTestUtil to create BlockReader --- Key: HDFS-8216 URL: https://issues.apache.org/jira/browse/HDFS-8216 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8216_20150421.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8204) Balancer: 2 replicas end up in the same node after running balance.
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su resolved HDFS-8204. - Resolution: Duplicate Balancer: 2 replicas end up in the same node after running balance. - Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8204.001.patch Balancer moves blocks between Datanodes (Ver. < 2.6). Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} is flawed and may cause 2 replicas to end up in the same node after running balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
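A sketch of why {{isLocatedOn}} needs to compare at the datanode level rather than the storage-group level; this is a hypothetical fix for illustration (the duplicate JIRA carries the real one):
{code}
// Two storage groups on one node (e.g. DISK and ARCHIVE) must count as the
// same location, or the balancer can place a second replica on that node.
boolean isLocatedOn(StorageGroup loc) {
  for (StorageGroup sg : getLocations()) {
    if (sg.getDatanodeInfo().equals(loc.getDatanodeInfo())) {
      return true;
    }
  }
  return false;
}
{code}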
[jira] [Updated] (HDFS-8156) Add/implement necessary APIs even though we just have the system default schema
[ https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8156: Attachment: HDFS-8156-v8.patch Thanks for the review and comments. bq. Should the DEFAULT_CODEC_NAME be RS? Currently we have no occasion to use the codec name yet. In the codec framework we have {{RSErasureCodec}} for the code, and the codec name RS would help us locate it in the map of maintained codecs. Updated the patch, adding an extractIntOption method to handle the repeated logic and error handling together. One more review? Thanks! Add/implement necessary APIs even though we just have the system default schema Key: HDFS-8156 URL: https://issues.apache.org/jira/browse/HDFS-8156 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch, HDFS-8156-v3.patch, HDFS-8156-v4.patch, HDFS-8156-v5.patch, HDFS-8156-v6.patch, HDFS-8156-v7.patch, HDFS-8156-v8.patch According to the discussion here, this issue was repurposed and modified. This is to add and implement some necessary APIs even though we just have the system default schema, to resolve some TODOs left for HDFS-7859 and HDFS-7866 as they're still subject to further discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
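The extractIntOption helper mentioned above might look roughly like this; only the method name comes from the comment, the signature and error messages are assumptions:
{code}
// Centralizes the repeated parse-an-int-option logic and its error handling.
private static int extractIntOption(Map<String, String> options, String key) {
  String value = options.get(key);
  if (value == null) {
    throw new IllegalArgumentException("Option not found: " + key);
  }
  try {
    return Integer.parseInt(value);
  } catch (NumberFormatException e) {
    throw new IllegalArgumentException(
        "Option value " + value + " for " + key + " is not an integer", e);
  }
}
{code}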
[jira] [Updated] (HDFS-8147) Mover should not select the DN storage as target where the same replica already exists.
[ https://issues.apache.org/jira/browse/HDFS-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8147: Attachment: HDFS-8147_2.patch Mover should not select the DN storage as target where the same replica already exists. --- Key: HDFS-8147 URL: https://issues.apache.org/jira/browse/HDFS-8147 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Attachments: HDFS-8147.patch, HDFS-8147_1.patch, HDFS-8147_2.patch *Scenario:* 1. Three-DN cluster, with DN storage types as follows. DN1 : DISK,ARCHIVE DN2 : DISK DN3 : DISK,ARCHIVE (All DNs are in the same rack) 2. One file with two replicas (in DN1 and DN2) 3. Set the file storage policy to COLD 4. Now execute Mover. *Expected Result:* File blocks should move to DN1:ARCHIVE and DN3:ARCHIVE *Actual Result:* {{chooseTargetInSameNode()}} moves the DN1:DISK block to DN1:ARCHIVE, but in the next iteration {{chooseTarget()}} for the same rack again selects DN1:ARCHIVE as the target, where the same block already exists. {{chooseTargetInSameNode()}} and {{chooseTarget()}} should not select a node as target where the same replica already exists. *Logs* {code} 15/04/15 10:47:17 WARN balancer.Dispatcher: Failed to move blk_1073741852_1028 with size=11990 from 10.19.92.74:50010:DISK to 10.19.92.73:50010:ARCHIVE through 10.19.92.73:50010: Got error, status message opReplaceBlock BP-1258709199-10.19.92.74-1428292615636:blk_1073741852_1028 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Replica FinalizedReplica, blk_1073741852_1028, FINALIZED {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
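If {{isLocatedOn}} compares at the node level (see the HDFS-8204 sketch above), the guard the reporter asks for reduces to a short check in both selection paths. The surrounding method name {{isGoodTarget}} is illustrative, not from the attached patch:
{code}
// Reject a candidate target whose datanode already holds the replica, no
// matter which storage type (DISK vs. ARCHIVE) it sits on.
private boolean isGoodTarget(DBlock block, StorageGroup target) {
  if (block.isLocatedOn(target)) {
    return false;  // would recreate the ReplicaAlreadyExistsException above
  }
  // other checks (capacity, storage type, rack) would follow here
  return true;
}
{code}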