[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8838: Attachment: HDFS-8838-HDFS-7285-20150809-test.patch A test patch HDFS-8838-HDFS-7285-20150809-test.patch to trigger Jenkins to test if HDFS-8896 works. Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-8838-HDFS-7285-000.patch, HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696880#comment-14696880 ] Vinayakumar B commented on HDFS-7285: - I have tried to rebase the current {{HDFS-7285}} branch against the current trunk using {{git rebase trunk}}. It was not as smooth as expected. Since I did not want to push the rebase directly onto {{HDFS-7285}}, I created one more branch, {{HDFS-7285-REBASE}}. This branch is just for reference purposes. The advantage of this is that it retained all the commits along with message, date and author details, even after resolving conflicts. I purposefully skipped one commit, HDFS-8787, to stay in sync with trunk; it was just a rename of files. Other than this, no commits got squashed. There were 192 commits to be rebased against trunk, including the intermediate merge-conflict-resolution commits. Since I couldn't edit each and every commit to resolve compilation errors after each commit, I resolved the remaining compilation errors at the end, with one more commit. If anyone wants to verify, please check out the branch HDFS-7285-REBASE and compare it against the consolidated patch. Since this was only to check the feasibility of a rebase, I am not saying it should be considered the final branch. If everyone feels good about this approach, I could do a more detailed rebase next week (maybe verify the compilation after each commit?
I am not sure whether it is possible to stop at each commit during the rebase) -Thanks Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: Consolidated-20150707.patch, Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFS-7285-merge-consolidated-01.patch, HDFS-7285-merge-consolidated-trunk-01.patch, HDFS-7285-merge-consolidated.trunk.03.patch, HDFS-7285-merge-consolidated.trunk.04.patch, HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed-Solomon coding, we can tolerate the loss of 4 blocks, with a storage overhead of only 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contributed packages in HDFS but was removed in Hadoop 2.0 for maintenance reasons. The drawbacks are: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are not intended to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. For these reasons, it might not be a good idea to simply bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, making it self-contained and independently maintained. 
This design builds the EC feature on top of the storage type support and is designed to be compatible with existing HDFS features like caching, snapshots, encryption, and high availability. The design will also support different EC coding schemes, implementations and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
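The storage-overhead claim above can be checked with a little arithmetic. The following sketch (class name and layout are illustrative, not from the design document) compares 3-way replication against the 10+4 Reed-Solomon scheme named in the description:

```java
// Compares storage overhead of 3-way replication vs. a 10+4 Reed-Solomon
// layout. Replication tolerates 2 lost copies per block; RS(10,4)
// tolerates the loss of any 4 blocks in a 14-block group.
public class EcOverheadDemo {
    public static void main(String[] args) {
        // 3-replica: two extra full copies per data block.
        double replicationOverhead = 2.0 / 1.0;   // 200%
        // RS(10,4): 4 parity blocks per 10 data blocks.
        double rsOverhead = 4.0 / 10.0;           // 40%
        System.out.printf("replication: %.0f%%, RS(10,4): %.0f%%%n",
                replicationOverhead * 100, rsOverhead * 100);
    }
}
```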
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696547#comment-14696547 ] Hudson commented on HDFS-7213: -- FAILURE: Integrated in Hadoop-trunk-Commit #8298 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8298/]) HDFS-7213. processIncrementalBlockReport performance degradation. Contributed by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check, and the write lock is held while it runs. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
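The fix is the standard guard pattern: only build the debug message when debug logging is enabled. This sketch uses {{java.util.logging}} purely to stay self-contained (the actual Hadoop class uses commons-logging and {{LOG.isDebugEnabled()}}); the class and counter names are illustrative:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedDebugLog {
    private static final Logger LOG = Logger.getLogger("BlockManagerDemo");

    static int messageBuilds = 0;

    static String expensiveMessage() {
        messageBuilds++;  // count how often the message string is constructed
        return "block report details: " + System.nanoTime();
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);  // debug-level (FINE) logging disabled

        // Unguarded: the message is built even though it is never logged.
        LOG.fine(expensiveMessage());

        // Guarded: no work is done while the (write) lock would be held.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveMessage());
        }

        System.out.println(messageBuilds);  // the guarded call skipped the build
    }
}
```

With debug disabled, only the unguarded call pays the string-construction cost; the guarded call is a cheap level check.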
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696548#comment-14696548 ] Hudson commented on HDFS-7235: -- FAILURE: Integrated in Hadoop-trunk-Commit #8298 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8298/]) HDFS-7235. DataNode#transferBlock should report blocks that don't exist using reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason this method returns false (detecting an invalid block) is that the block file doesn't exist due to the bad disk in this case. 
The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN; thus the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696558#comment-14696558 ] Hudson commented on HDFS-7263: -- FAILURE: Integrated in Hadoop-trunk-Commit #8299 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8299/]) HDFS-7263. Snapshot read can reveal future bytes for appended files. Contributed by Tao Luo. (vinayakumarb: rev fa2641143c0d74c4fef122d79f27791e15d3b43f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Snapshot read can reveal future bytes for appended files. - Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
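The steps above can be modeled in miniature. This is a toy in-memory sketch of the invariant the fix restores (hypothetical names, not the actual HDFS code paths): a read through a snapshot must be capped at the file length recorded when the snapshot was taken.

```java
import java.util.Arrays;

public class SnapshotReadDemo {
    static byte[] file = new byte[0];
    static int snapshotLength;  // length L captured at snapshot time

    static void append(byte[] extra) {
        byte[] grown = Arrays.copyOf(file, file.length + extra.length);
        System.arraycopy(extra, 0, grown, file.length, extra.length);
        file = grown;
    }

    // Buggy behavior: a snapshot read returns current bytes past L.
    static byte[] readSnapshotBuggy() {
        return file.clone();
    }

    // Fixed behavior: the read is capped at the snapshot-time length.
    static byte[] readSnapshotFixed() {
        return Arrays.copyOf(file, Math.min(file.length, snapshotLength));
    }

    public static void main(String[] args) {
        append(new byte[]{1, 2, 3});   // create a file of size L = 3
        snapshotLength = file.length;  // take the snapshot
        append(new byte[]{4, 5});      // append after the snapshot

        System.out.println(readSnapshotBuggy().length);  // 5: future bytes leak
        System.out.println(readSnapshotFixed().length);  // 3: capped at L
    }
}
```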
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696524#comment-14696524 ] Vinayakumar B commented on HDFS-7213: - Cherry-picked to 2.6.1. processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check, and the write lock is held while it runs. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7213: Fix Version/s: (was: 2.7.0) 2.6.1 processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check, and the write lock is held while it runs. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7213: Fix Version/s: 2.7.0 processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check, and the write lock is held while it runs. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7235: Labels: (was: 2.6.1-candidate) DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason this method returns false (detecting an invalid block) is that the block file doesn't exist due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN; thus the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7235: Fix Version/s: 2.6.1 DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason this method returns false (detecting an invalid block) is that the block file doesn't exist due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN; thus the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696535#comment-14696535 ] Vinayakumar B commented on HDFS-7235: - Cherry-picked to 2.6.1 DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason this method returns false (detecting an invalid block) is that the block file doesn't exist due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN; thus the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. 
Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7263: Labels: (was: 2.6.1-candidate) Snapshot read can reveal future bytes for appended files. - Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696550#comment-14696550 ] Vinayakumar B commented on HDFS-7263: - Cherry-picked to 2.6.1 Snapshot read can reveal future bytes for appended files. - Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Labels: 2.6.1-candidate Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7263: Fix Version/s: 2.6.1 Snapshot read can reveal future bytes for appended files. - Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Labels: 2.6.1-candidate Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7225: Labels: (was: 2.6.1-candidate) Remove stale block invalidation work when DN re-registers with different UUID - Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, which will use it to look up an entry in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
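The crash mode described above can be reproduced in miniature: a {{TreeMap}} lookup compares against the key, so a null key throws an NPE as soon as the map is non-empty. This is a simplified String-keyed model (the names and values are illustrative, not the actual {{DatanodeDescriptor}} map):

```java
import java.util.TreeMap;

public class NullKeyNpeDemo {
    // Models the TreeMap lookup when resolving the datanodeUuid failed and
    // returned null; plain Strings stand in for DatanodeDescriptor keys.
    static String lookup(TreeMap<String, String> work, String key) {
        try {
            return work.get(key);    // TreeMap compares against the key
        } catch (NullPointerException e) {
            return "NPE";            // what crashed the NameNode before the fix
        }
    }

    public static void main(String[] args) {
        TreeMap<String, String> work = new TreeMap<>();
        work.put("10.0.0.1", "blocks-to-invalidate");

        System.out.println(lookup(work, "10.0.0.1"));  // normal lookup succeeds
        System.out.println(lookup(work, null));        // null key throws NPE
    }
}
```

The fix therefore has to drop the stale invalidation work instead of performing the lookup with a null descriptor.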
[jira] [Updated] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7225: Fix Version/s: 2.6.1 Remove stale block invalidation work when DN re-registers with different UUID - Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, which will use it to look up an entry in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696620#comment-14696620 ] Vinayakumar B commented on HDFS-7225: - Cherry-picked to 2.6.1 Remove stale block invalidation work when DN re-registers with different UUID - Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, which will use it to look up an entry in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696624#comment-14696624 ] Hudson commented on HDFS-7225: -- FAILURE: Integrated in Hadoop-trunk-Commit #8302 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8302/]) HDFS-7225. Remove stale block invalidation work when DN re-registers with different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 08bd4edf4092901273da0d73a5cc760fdc11052b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove stale block invalidation work when DN re-registers with different UUID - Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, which will use it to look up an entry in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696642#comment-14696642 ] Hadoop QA commented on HDFS-8891: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 42s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 1s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 3s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 177m 6s | Tests failed in hadoop-hdfs. 
| | | | 219m 23s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation | | Timed out tests | org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer | | | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750423/HDFS-8891.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0a03054 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11994/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11994/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11994/console | This message was automatically generated. HDFS concat should keep srcs order -- Key: HDFS-8891 URL: https://issues.apache.org/jira/browse/HDFS-8891 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yong Zhang Assignee: Yong Zhang Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch FSDirConcatOp.verifySrcFiles may change the src files' order, but it should keep their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696639#comment-14696639 ] Yi Liu commented on HDFS-8859: -- The two test failures are not related. Improve DataNode ReplicaMap memory footprint to save about 45% -- Key: HDFS-8859 URL: https://issues.apache.org/jira/browse/HDFS-8859 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, HDFS-8859.003.patch, HDFS-8859.004.patch By using the following approach we can save about *45%* of the memory footprint for each block replica in DataNode memory (this JIRA only talks about *ReplicaMap* in the DataNode). The details are: In ReplicaMap, {code} private final Map<String, Map<Long, ReplicaInfo>> map = new HashMap<String, Map<Long, ReplicaInfo>>(); {code} Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas in memory. The key is the block id of the block replica, which is already included in {{ReplicaInfo}}, so this memory can be saved. Also, each HashMap entry has an object overhead. We can implement a lightweight set similar to {{LightWeightGSet}}, but not of fixed size ({{LightWeightGSet}} uses a fixed size for the entries array, usually a big value; an example is {{BlocksMap}}; this avoids full GC since there is no need to resize), and we should still be able to get an element by its key. 
Following is a comparison of the memory footprint if we implement a lightweight set as described. We can save:
{noformat}
SIZE (bytes)   ITEM
20             The Key: Long (12 bytes object overhead + 8 bytes long)
12             HashMap Entry object overhead
4              reference to the key in Entry
4              reference to the value in Entry
4              hash in Entry
{noformat}
Total: -44 bytes
We need to add:
{noformat}
SIZE (bytes)   ITEM
4              a reference to the next element in ReplicaInfo
{noformat}
Total: +4 bytes
So in total we can save 40 bytes for each block replica. Currently one finalized replica needs around 46 bytes (note: we ignore memory alignment here). We can save 1 - (4 + 46) / (44 + 46) = *45%* of the memory for each block replica in the DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
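The intrusive-set idea behind the saving can be sketched in plain Java. This is a hypothetical simplification, not the actual HDFS-8859 patch: elements embed their own chain pointer (as {{ReplicaInfo}} would) and the primitive {{long}} key lives inside the element, so there is no boxed {{Long}} key and no per-entry wrapper object; class and method names here are illustrative.

```java
/** Minimal sketch of an intrusive hash set: elements carry their own
 *  "next" pointer, so no HashMap.Entry wrapper or boxed key is needed. */
class LightweightSet {
    /** Stand-in for ReplicaInfo: the key and the chain link are embedded. */
    static class Element {
        final long blockId;   // key stored inside the element itself
        Element next;         // the single extra reference (+4 bytes)
        Element(long blockId) { this.blockId = blockId; }
    }

    private final Element[] buckets = new Element[16]; // fixed here; the JIRA
                                                       // proposes a resizable variant
    private int size;

    private int index(long key) {
        return (int) (key & (buckets.length - 1)); // power-of-two mask
    }

    void add(Element e) {
        int i = index(e.blockId);
        e.next = buckets[i];
        buckets[i] = e;
        size++;
    }

    /** Lookup by primitive key: no Long boxing, no Entry allocation. */
    Element get(long blockId) {
        for (Element e = buckets[index(blockId)]; e != null; e = e.next) {
            if (e.blockId == blockId) return e;
        }
        return null;
    }

    int size() { return size; }
}
```

The per-element cost over a bare {{Element}} is exactly the one {{next}} reference, matching the "+4 bytes" line in the table above.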
[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7213: Labels: (was: 2.6.1-candidate) processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing a {{isDebugEnabled}} check. The write lock is being held. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
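The fix pattern HDFS-7213 describes, guarding a debug line so the log message is not built while the write lock is held, can be sketched as follows. This is illustrative only: the class, message text, and counter are placeholders, not the actual {{BlockManager}} code, and {{java.util.logging}} stands in for Hadoop's logging facade.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class BlockReportLogger {
    private static final Logger LOG = Logger.getLogger("BlockReportLogger");

    // Counts how many times the (potentially expensive) message was built.
    static int messagesBuilt = 0;

    static void logProcessedBlock(String blockName, String nodeName) {
        // Guard first: without this check the string concatenation runs on
        // every call even when debug logging is off -- costly when the
        // caller holds the namesystem write lock.
        if (LOG.isLoggable(Level.FINE)) {
            messagesBuilt++;
            LOG.fine("BLOCK* processIncrementalBlockReport: " + blockName
                + " from " + nodeName);
        }
    }
}
```

With the default log level (INFO), the message is never constructed, which is the whole point of the guard.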
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696635#comment-14696635 ] Hadoop QA commented on HDFS-8859: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 45s | The applied patch generated 6 new checkstyle issues (total was 12, now 16). | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 22m 22s | Tests failed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 173m 14s | Tests failed in hadoop-hdfs. 
| | | | 240m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.net.TestNetUtils | | | hadoop.ha.TestZKFailoverController | | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750254/HDFS-8859.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0a03054 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/console | This message was automatically generated. 
Improve DataNode ReplicaMap memory footprint to save about 45% -- Key: HDFS-8859 URL: https://issues.apache.org/jira/browse/HDFS-8859 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, HDFS-8859.003.patch, HDFS-8859.004.patch By using the following approach we can save about *45%* of the memory footprint for each block replica in DataNode memory (this JIRA only talks about the *ReplicaMap* in the DataNode). The details are: In ReplicaMap, {code} private final Map<String, Map<Long, ReplicaInfo>> map = new HashMap<String, Map<Long, ReplicaInfo>>(); {code} Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas in memory. The key is the block id of the block replica, which is already included in {{ReplicaInfo}}, so this memory can be saved. Also, each HashMap Entry has an object overhead. We can implement a lightweight set similar to {{LightWeightGSet}}, but not of a fixed size ({{LightWeightGSet}} uses a fixed size for the entries array, usually a big value; an example is {{BlocksMap}}, and this avoids full GC since there is no need to resize). We should also still be able to get an element through its key. Following is a comparison of the memory footprint if we implement a lightweight set as described. We can save: {noformat} SIZE (bytes) ITEM 20 The Key: Long (12 bytes object overhead + 8 bytes long) 12 HashMap Entry object overhead 4 reference to the key in Entry 4 reference to the value in Entry 4 hash in Entry {noformat} Total: -44 bytes We need to add: {noformat} SIZE (bytes) ITEM 4
[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease
[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697031#comment-14697031 ] Hudson commented on HDFS-8270: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/]) HDFS-8270. create() always retried with hardcoded timeout when file already exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 84bf71295a5e52b2a7bb69440a885a25bc75f544) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt create() always retried with hardcoded timeout when file already exists with open lease --- Key: HDFS-8270 URL: https://issues.apache.org/jira/browse/HDFS-8270 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Andrey Stepachev Assignee: J.Andreina Fix For: 2.6.1, 2.7.1 Attachments: HDFS-8270-branch-2.6-v3.patch, HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, HDFS-8270.3.patch In HBase we stumbled on unexpected behaviour, which could break things. HDFS-6478 fixed wrong exception translation, but that apparently led to unexpected behaviour: clients trying to create a file without override=true will be forced to retry for a hardcoded amount of time (60 seconds). That could break or slow down systems that use the filesystem for locks (like hbase fsck did, and we got it broken: HBASE-13574). We should make this behaviour configurable: does the client really need to wait for the lease timeout to be sure that the file doesn't exist, or should it be enough to fail fast. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697033#comment-14697033 ] Hudson commented on HDFS-7235: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/]) HDFS-7235. DataNode#transferBlock should report blocks that don't exist using reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalid block with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason that this method returns false (detecting an invalid block) is that the block file doesn't exist due to the bad disk in this case.
The key issue we found here is, after DN detects an invalid block for the above reason, it doesn't report the invalid block back to NN, thus NN doesn't know that the block is corrupted, and keeps sending the data transfer request to the same DN to be decommissioned, again and again. This caused an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
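The direction of the fix described above, reporting the invalid replica back to the NameNode instead of silently dropping the transfer request, can be modeled in a few lines. All names here are illustrative stand-ins, not the actual DataNode code: the real fix involves {{DataNode#transferBlock}} and the NN's {{reportBadBlocks}} RPC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a DN asked to transfer blocks it may no longer have. */
class TransferBlockSketch {
    interface NameNodeClient { void reportBadBlock(long blockId); }

    private final Map<Long, Boolean> localReplicas = new HashMap<>(); // id -> file exists
    private final NameNodeClient nn;
    final List<Long> transferred = new ArrayList<>();

    TransferBlockSketch(NameNodeClient nn) { this.nn = nn; }

    void addReplica(long id, boolean fileExists) { localReplicas.put(id, fileExists); }

    /** Instead of silently skipping an invalid replica (which leaves the NN
     *  retrying the same DN forever), tell the NN the block is bad so it
     *  can pick another source. */
    void transferBlock(long blockId) {
        Boolean exists = localReplicas.get(blockId);
        if (exists == null || !exists) {
            nn.reportBadBlock(blockId);
            return;
        }
        transferred.add(blockId);
    }
}
```

The silent-skip variant is exactly the infinite loop in the bug report: the NN never learns the replica is gone and keeps choosing the same source.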
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697034#comment-14697034 ] Hudson commented on HDFS-7213: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/]) HDFS-7213. processIncrementalBlockReport performance degradation. Contributed by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing a {{isDebugEnabled}} check. The write lock is being held. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697036#comment-14697036 ] Hudson commented on HDFS-7225: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/]) HDFS-7225. Remove stale block invalidation work when DN re-registers with different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 08bd4edf4092901273da0d73a5cc760fdc11052b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove stale block invalidation work when DN re-registers with different UUID - Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}} which will use it to lookup in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
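The failure mode in HDFS-7225 is easy to reproduce in isolation: a {{TreeMap}} whose comparator dereferences its argument throws an NPE on a null key, so the result of the UUID lookup must be null-checked before it is used as a key. The sketch below is a simplified stand-in (types and names are not the actual {{BlockManager}}/{{InvalidateBlocks}} code):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class StaleUuidLookup {
    /** Stand-in for DatanodeDescriptor, compared by address as the NN does. */
    static class Node {
        final String addr;
        Node(String addr) { this.addr = addr; }
    }

    private final Map<String, Node> byUuid = new TreeMap<>();
    // The comparator dereferences its argument, so a null key would NPE here.
    private final Map<Node, String> pendingWork =
        new TreeMap<>(Comparator.comparing((Node n) -> n.addr));

    void register(String uuid, Node n, String work) {
        byUuid.put(uuid, n);
        pendingWork.put(n, work);
    }

    /** Safe variant: treat an unknown or stale UUID as "no work" rather than
     *  passing the resulting null into the TreeMap lookup. */
    String workFor(String uuid) {
        Node n = byUuid.get(uuid);
        if (n == null) {
            return null; // stale UUID after re-registration: nothing to do
        }
        return pendingWork.get(n);
    }
}
```

Without the null check, {{workFor("stale-uuid")}} would crash inside the comparator, which is the NameNode NPE the issue describes.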
[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697037#comment-14697037 ] Hudson commented on HDFS-7263: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/]) HDFS-7263. Snapshot read can reveal future bytes for appended files. Contributed by Tao Luo. (vinayakumarb: rev fa2641143c0d74c4fef122d79f27791e15d3b43f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Snapshot read can reveal future bytes for appended files. - Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java The following sequence of steps will produce extra bytes, which should not be visible because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
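The bug amounts to the reader honoring the current file length instead of the length recorded at snapshot time; clamping reads at the snapshot length is the invariant the fix must enforce. A minimal in-memory model of that rule (not the actual DFSInputStream code; all names are illustrative):

```java
/** Toy model: a snapshot view over a file that may have grown via append. */
class SnapshotView {
    private final byte[] current;      // live file contents, may be longer
    private final long snapshotLength; // length recorded when snapshot was taken

    SnapshotView(byte[] current, long snapshotLength) {
        this.current = current;
        this.snapshotLength = snapshotLength;
    }

    /** Read through the snapshot: bytes appended later must stay invisible. */
    int read(long pos, byte[] buf, int off, int len) {
        if (pos >= snapshotLength) {
            return -1; // EOF at the snapshot boundary, not the live length
        }
        int n = (int) Math.min(len, snapshotLength - pos);
        System.arraycopy(current, (int) pos, buf, off, n);
        return n;
    }
}
```

The bug report corresponds to using {{current.length}} in place of {{snapshotLength}} above, which leaks the appended bytes to snapshot readers.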
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697057#comment-14697057 ] Hudson commented on HDFS-7213: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/]) HDFS-7213. processIncrementalBlockReport performance degradation. Contributed by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing a {{isDebugEnabled}} check. The write lock is being held. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697059#comment-14697059 ] Hudson commented on HDFS-7225: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/]) HDFS-7225. Remove stale block invalidation work when DN re-registers with different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 08bd4edf4092901273da0d73a5cc760fdc11052b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove stale block invalidation work when DN re-registers with different UUID - Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}} which will use it to lookup in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697065#comment-14697065 ] Hudson commented on HDFS-7649: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/]) HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. (Contributed by Brahma Reddy Battula) (arp: rev ae57d60d8239916312bca7149e2285b2ed3b123a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md Multihoming docs should emphasize using hostnames in configurations --- Key: HDFS-7649 URL: https://issues.apache.org/jira/browse/HDFS-7649 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Arpit Agarwal Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7649.patch The docs should emphasize that master and slave configurations should use hostnames wherever possible. Link to current docs: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697042#comment-14697042 ] Hudson commented on HDFS-7649: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/]) HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. (Contributed by Brahma Reddy Battula) (arp: rev ae57d60d8239916312bca7149e2285b2ed3b123a) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Multihoming docs should emphasize using hostnames in configurations --- Key: HDFS-7649 URL: https://issues.apache.org/jira/browse/HDFS-7649 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Arpit Agarwal Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7649.patch The docs should emphasize that master and slave configurations should use hostnames wherever possible. Link to current docs: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697060#comment-14697060 ] Hudson commented on HDFS-7263: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/]) HDFS-7263. Snapshot read can reveal future bytes for appended files. Contributed by Tao Luo. (vinayakumarb: rev fa2641143c0d74c4fef122d79f27791e15d3b43f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Snapshot read can reveal future bytes for appended files. - Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java The following sequence of steps will produce extra bytes, which should not be visible because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease
[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697054#comment-14697054 ] Hudson commented on HDFS-8270: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/]) HDFS-8270. create() always retried with hardcoded timeout when file already exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 84bf71295a5e52b2a7bb69440a885a25bc75f544) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt create() always retried with hardcoded timeout when file already exists with open lease --- Key: HDFS-8270 URL: https://issues.apache.org/jira/browse/HDFS-8270 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Andrey Stepachev Assignee: J.Andreina Fix For: 2.6.1, 2.7.1 Attachments: HDFS-8270-branch-2.6-v3.patch, HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, HDFS-8270.3.patch In HBase we stumbled on unexpected behaviour, which could break things. HDFS-6478 fixed wrong exception translation, but that apparently led to unexpected behaviour: clients trying to create a file without override=true will be forced to retry for a hardcoded amount of time (60 seconds). That could break or slow down systems that use the filesystem for locks (like hbase fsck did, and we got it broken: HBASE-13574). We should make this behaviour configurable: does the client really need to wait for the lease timeout to be sure that the file doesn't exist, or should it be enough to fail fast. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697056#comment-14697056 ] Hudson commented on HDFS-7235: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/]) HDFS-7235. DataNode#transferBlock should report blocks that don't exist using reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalid block with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason that this method returns false (detecting an invalid block) is that the block file doesn't exist due to the bad disk in this case.
The key issue we found here is, after DN detects an invalid block for the above reason, it doesn't report the invalid block back to NN, thus NN doesn't know that the block is corrupted, and keeps sending the data transfer request to the same DN to be decommissioned, again and again. This caused an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8093) BP does not exist or is not under Constructionnull
[ https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696757#comment-14696757 ] Felix Borchers commented on HDFS-8093: -- I have a very similar problem while running the balancer. {{hdfs fsck /}} returned HEALTHY and the block, causing the balancer to throw an exception is not in the HDFS anymore. {{hdfs fsck / -files -blocks | grep blk_1074256920_516292}} - returned nothing Digging in the logs of the DataNode shows, that the block was deleted on the node. (see below for log file excerpt) Digging in the logs of the NameNode shows, something like block does not belong to any file (see below for log file excerpt) It seems, there is a problem with removed/deleted blocks ?! DataNode Logs = only lines with: blk_1074256920_516292 displayed: {code} 2015-08-14 00:30:03,893 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292 src: /10.13.53.16:37605 dest: /10.13.53.19:50010 2015-08-14 00:30:07,841 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1074256920_516292 file /data/is24/hadoop/1/dfs/dataNode/current/BP-322804774-10.13.54.1-1412684451669/current/rbw/blk_1074256920 for deletion 2015-08-14 00:30:09,092 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-322804774-10.13.54.1-1412684451669 blk_1074256920_516292 file /data/is24/hadoop/1/dfs/dataNode/current/BP-322804774-10.13.54.1-1412684451669/current/rbw/blk_1074256920 org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append to a non-existent replica BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292 2015-08-14 00:46:44,916 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292, type=LAST_IN_PIPELINE, downstreams=0:[] 
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append to a non-existent replica BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292 2015-08-14 00:46:44,916 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292, type=LAST_IN_PIPELINE, downstreams=0:[] terminating {code} NameNode Logs = only lines with: blk_1074256920_516292 displayed: {code} 2015-08-14 00:30:03,843 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /system/balancer.id. BP-322804774-10.13.54.1-1412684451669 blk_1074256920_516292{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-4db312aa-bc23-47dc-b768-52a2d72b09d3:NORMAL:10.13.53.30:50010|RBW], ReplicaUnderConstruction[[DISK]DS-c7db1b58-8e25-435f-8af8-08b6754c021c:NORMAL:10.13.53.16:50010|RBW], ReplicaUnderConstruction[[DISK]DS-4457ae11-7684-4187-b4ad-56466d79fba2:NORMAL:10.13.53.19:50010|RBW]]} 2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* addBlock: c blk_1074256920_516292 on node 10.13.53.16:50010 size 134217728 does not belong to any file 2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1074256920_516292 to 10.13.53.16:50010 2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* BlockManager: ask 10.13.53.16:50010 to delete [blk_1074256920_516292] 2015-08-14 00:30:04,840 INFO BlockStateChange: BLOCK* addBlock: block blk_1074256920_516292 on node 10.13.53.19:50010 size 134217728 does not belong to any file 2015-08-14 00:30:04,840 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1074256920_516292 to 10.13.53.19:50010 2015-08-14 00:30:05,925 INFO BlockStateChange: BLOCK* addBlock: block blk_1074256920_516292 on node 10.13.53.30:50010 size 134217728 does not belong to any file 2015-08-14 00:30:05,925 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1074256920_516292 to 10.13.53.30:50010 2015-08-14 00:30:07,000 INFO BlockStateChange: 
BLOCK* BlockManager: ask 10.13.53.19:50010 to delete [blk_1074208004_467362, blk_1074224392_483753, blk_1074093070_352362, blk_1074240530_499900, blk_1074256920_516292, blk_1074224154_483515, blk_1074240554_499924, blk_1074240556_499926, blk_1074240561_499931, blk_1074224178_483539, blk_1074240563_499933, blk_1074207795_467153, blk_1074093108_352429, blk_1074207797_467155, blk_1073798197_57374, blk_1074224182_483543, blk_1074240569_499939, blk_1074207802_467160, blk_1074224187_483548, blk_1074224188_483549, blk_1074207805_467163, blk_1074158653_418001, blk_1074207806_467164, blk_1074224191_483552, blk_1074207809_467167, blk_1074207817_467175, blk_1074207818_467176, blk_1074207820_467178, blk_1074207822_467180, blk_1074207830_467188, blk_1074224216_483577, blk_1074224217_483578, blk_1073798237_57414, blk_1073929310_188502, blk_1074207843_467201, blk_1073847400_106577, blk_1074207852_467210,
[jira] [Commented] (HDFS-7116) Add a command to get the bandwidth of balancer
[ https://issues.apache.org/jira/browse/HDFS-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696698#comment-14696698 ] Rakesh R commented on HDFS-7116: Hi All, In the proposed patch, Datanode is sending {{balancerBandwidth}} value to the Namenode through heartbeats. As we know this is done for consistency discussed earlier in this jira. On a second look, I have another idea which will have less overhead. - We already have a set of datanode metrics exposed which can be used by admins/monitoring tools. How about exposing {{balancerBandwidth}} value as a Datanode metric? Here, admin/monitoring tool has to individually collect the metrics from every Datanode. Add a command to get the bandwidth of balancer -- Key: HDFS-7116 URL: https://issues.apache.org/jira/browse/HDFS-7116 Project: Hadoop HDFS Issue Type: New Feature Components: balancer mover Reporter: Akira AJISAKA Assignee: Rakesh R Attachments: HDFS-7116-00.patch, HDFS-7116-01.patch Now reading logs is the only way to check how the balancer bandwidth is set. It would be useful for administrators if they can get the parameter via CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-8850: -- Labels: (was: 2.6.1-candidate) Removing the 2.6.1-candidate label, as this is an issue for a class that exists only in 2.7 and as such I don't think it applies to 2.6. Let me know if you think this issue exists in 2.6. VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks - Key: HDFS-8850 URL: https://issues.apache.org/jira/browse/HDFS-8850 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-8850.001.patch The VolumeScanner threads inside the BlockScanner exit with an exception if there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696691#comment-14696691 ] Yong Zhang commented on HDFS-8891: -- Failed test cases are not related to this patch. HDFS concat should keep srcs order -- Key: HDFS-8891 URL: https://issues.apache.org/jira/browse/HDFS-8891 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yong Zhang Assignee: Yong Zhang Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch FSDirConcatOp.verifySrcFiles may change src files order, but it should keep their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7116) Add a command to get the bandwidth of balancer
[ https://issues.apache.org/jira/browse/HDFS-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696692#comment-14696692 ] Rakesh R commented on HDFS-7116: Hi All, Add a command to get the bandwidth of balancer -- Key: HDFS-7116 URL: https://issues.apache.org/jira/browse/HDFS-7116 Project: Hadoop HDFS Issue Type: New Feature Components: balancer mover Reporter: Akira AJISAKA Assignee: Rakesh R Attachments: HDFS-7116-00.patch, HDFS-7116-01.patch Now reading logs is the only way to check how the balancer bandwidth is set. It would be useful for administrators if they can get the parameter via CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697117#comment-14697117 ] Hudson commented on HDFS-7263: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/]) HDFS-7263. Snapshot read can reveal future bytes for appended files. Contributed by Tao Luo. (vinayakumarb: rev fa2641143c0d74c4fef122d79f27791e15d3b43f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Snapshot read can reveal future bytes for appended files. - Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
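The invariant the fix must restore can be illustrated with a small sketch: a read through a snapshot path has to be clamped to the file length recorded when the snapshot was taken, even though the underlying last block now holds appended bytes. This is a hypothetical, self-contained model of the invariant, not the actual DFSInputStream change in the patch.

```java
import java.util.Arrays;

public class SnapshotReadSketch {
    // Current (possibly appended) file contents, and the length recorded
    // at snapshot-creation time.
    private final byte[] current;
    private final long snapshotLength;

    public SnapshotReadSketch(byte[] current, long snapshotLength) {
        this.current = current;
        this.snapshotLength = snapshotLength;
    }

    // A read via the snapshot path must stop at the snapshot-time length;
    // returning bytes beyond it is exactly the bug described above.
    public byte[] readViaSnapshot() {
        int visible = (int) Math.min(snapshotLength, current.length);
        return Arrays.copyOf(current, visible);
    }
}
```

With a file of size L = 5 (so L % blockSize != 0 for any larger block size) and 3 appended bytes, a snapshot read should return exactly 5 bytes.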
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697113#comment-14697113 ] Hudson commented on HDFS-7235: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/]) HDFS-7235. DataNode#transferBlock should report blocks that don't exist using reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalidBlock with the following logic in FsDatasetImpl.java:
{code}
/** Does the block exist and have the given state? */
private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
  final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
  return replicaInfo != null
      && replicaInfo.getState() == state
      && replicaInfo.getBlockFile().exists();
}
{code}
The reason this method returns false (detecting an invalid block) is that the block file doesn't exist, due to the bad disk in this case.
The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and the initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
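A toy simulation of the failure mode, assuming nothing about the real BlockManager internals: the NN keeps re-picking its favored (decommissioning) source, and only removing the replica from consideration when the DN reports it bad lets the retry loop terminate. All class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DecommissionLoopSketch {
    // NN's view: candidate source replicas for the block being re-replicated,
    // in preference order (the decommissioning DN comes first).
    private final List<String> sources = new ArrayList<>();

    public DecommissionLoopSketch(String... dns) {
        for (String d : dns) sources.add(d);
    }

    // DN side: the transfer fails because the block file is missing (bad disk).
    // With reportBadBlock the NN drops the bad replica and tries the next
    // source; without that report it would re-pick the same DN forever.
    public String scheduleUntilTransferSucceeds(String badDn, int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            String src = sources.get(0);        // NN favors the decommissioning DN
            if (!src.equals(badDn)) {
                return src;                      // transfer succeeds from this source
            }
            sources.remove(src);                 // effect of reportBadBlock on the NN
        }
        return null; // would spin forever if the bad replica were never reported
    }
}
```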
[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697116#comment-14697116 ] Hudson commented on HDFS-7225: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/]) HDFS-7225. Remove stale block invalidation work when DN re-registers with different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 08bd4edf4092901273da0d73a5cc760fdc11052b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove stale block invalidation work when DN re-registers with different UUID - Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, which will use it to look up an entry in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697114#comment-14697114 ] Hudson commented on HDFS-7213: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/]) HDFS-7213. processIncrementalBlockReport performance degradation. Contributed by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check. The write lock is being held. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
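The cost being described (building the log message under the write lock even when debug logging is off) and the standard guard that fixes it can be sketched as below. This is a generic illustration of the {{isDebugEnabled}} pattern with a hand-rolled counter, not the actual BlockManager code.

```java
public class GuardedLogSketch {
    private final boolean debugEnabled;
    int messagesBuilt = 0; // counts how often the expensive string was constructed

    public GuardedLogSketch(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // Stands in for the string concatenation done while the write lock is held.
    private String expensiveMessage(String block) {
        messagesBuilt++;
        return "processed incremental block report for " + block;
    }

    // Unguarded: the message is built on every call, even with debug off.
    public void logUnguarded(String block) {
        expensiveMessage(block);
    }

    // Guarded, as the fix adds: skip construction unless debug is enabled.
    public void logGuarded(String block) {
        if (debugEnabled) {
            expensiveMessage(block);
        }
    }
}
```

Under heavy incremental-report load, the difference between "always build" and "build only when enabled" is exactly the overhead the JIRA measures.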
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697122#comment-14697122 ] Hudson commented on HDFS-7649: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/]) HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. (Contributed by Brahma Reddy Battula) (arp: rev ae57d60d8239916312bca7149e2285b2ed3b123a) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Multihoming docs should emphasize using hostnames in configurations --- Key: HDFS-7649 URL: https://issues.apache.org/jira/browse/HDFS-7649 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Arpit Agarwal Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7649.patch The docs should emphasize that master and slave configurations should use hostnames wherever possible. Link to current docs: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease
[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697111#comment-14697111 ] Hudson commented on HDFS-8270: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/]) HDFS-8270. create() always retried with hardcoded timeout when file already exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 84bf71295a5e52b2a7bb69440a885a25bc75f544) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt create() always retried with hardcoded timeout when file already exists with open lease --- Key: HDFS-8270 URL: https://issues.apache.org/jira/browse/HDFS-8270 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Andrey Stepachev Assignee: J.Andreina Fix For: 2.6.1, 2.7.1 Attachments: HDFS-8270-branch-2.6-v3.patch, HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, HDFS-8270.3.patch In HBase we stumbled on unexpected behaviour, which could break things. HDFS-6478 fixed wrong exception translation, but that apparently led to unexpected behaviour: clients trying to create a file without overwrite=true will be forced to retry for a hardcoded amount of time (60 seconds). That could break or slow down systems that use the filesystem for locks (like hbase fsck did, and we got it broken, HBASE-13574). We should make this behaviour configurable: does the client really need to wait for the lease timeout to be sure that the file doesn't exist, or should it be enough to fail fast? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697138#comment-14697138 ] Hudson commented on HDFS-7225: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/]) HDFS-7225. Remove stale block invalidation work when DN re-registers with different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 08bd4edf4092901273da0d73a5cc760fdc11052b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove stale block invalidation work when DN re-registers with different UUID - Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, which will use it to look up an entry in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease
[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697133#comment-14697133 ] Hudson commented on HDFS-8270: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/]) HDFS-8270. create() always retried with hardcoded timeout when file already exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 84bf71295a5e52b2a7bb69440a885a25bc75f544) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt create() always retried with hardcoded timeout when file already exists with open lease --- Key: HDFS-8270 URL: https://issues.apache.org/jira/browse/HDFS-8270 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Andrey Stepachev Assignee: J.Andreina Fix For: 2.6.1, 2.7.1 Attachments: HDFS-8270-branch-2.6-v3.patch, HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, HDFS-8270.3.patch In HBase we stumbled on unexpected behaviour, which could break things. HDFS-6478 fixed wrong exception translation, but that apparently led to unexpected behaviour: clients trying to create a file without overwrite=true will be forced to retry for a hardcoded amount of time (60 seconds). That could break or slow down systems that use the filesystem for locks (like hbase fsck did, and we got it broken, HBASE-13574). We should make this behaviour configurable: does the client really need to wait for the lease timeout to be sure that the file doesn't exist, or should it be enough to fail fast? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697135#comment-14697135 ] Hudson commented on HDFS-7235: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/]) HDFS-7235. DataNode#transferBlock should report blocks that don't exist using reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt DataNode#transferBlock should report blocks that don't exist using reportBadBlock - Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalidBlock with the following logic in FsDatasetImpl.java:
{code}
/** Does the block exist and have the given state? */
private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
  final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
  return replicaInfo != null
      && replicaInfo.getState() == state
      && replicaInfo.getBlockFile().exists();
}
{code}
The reason this method returns false (detecting an invalid block) is that the block file doesn't exist, due to the bad disk in this case.
The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and the initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation
[ https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697136#comment-14697136 ] Hudson commented on HDFS-7213: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/]) HDFS-7213. processIncrementalBlockReport performance degradation. Contributed by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt processIncrementalBlockReport performance degradation - Key: HDFS-7213 URL: https://issues.apache.org/jira/browse/HDFS-7213 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Daryn Sharp Assignee: Eric Payne Priority: Critical Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt {{BlockManager#processIncrementalBlockReport}} has a debug line that is missing an {{isDebugEnabled}} check. The write lock is being held. Coupled with the increase in incremental block reports from receiving blocks, under heavy load this log line noticeably degrades performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697139#comment-14697139 ] Hudson commented on HDFS-7263: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/]) HDFS-7263. Snapshot read can reveal future bytes for appended files. Contributed by Tao Luo. (vinayakumarb: rev fa2641143c0d74c4fef122d79f27791e15d3b43f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Snapshot read can reveal future bytes for appended files. - Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 2.7.0, 2.6.1 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697144#comment-14697144 ] Hudson commented on HDFS-7649: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/]) HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. (Contributed by Brahma Reddy Battula) (arp: rev ae57d60d8239916312bca7149e2285b2ed3b123a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md Multihoming docs should emphasize using hostnames in configurations --- Key: HDFS-7649 URL: https://issues.apache.org/jira/browse/HDFS-7649 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Arpit Agarwal Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7649.patch The docs should emphasize that master and slave configurations should use hostnames wherever possible. Link to current docs: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8897) Loadbalancer
LINTE created HDFS-8897: --- Summary: Loadbalancer Key: HDFS-8897 URL: https://issues.apache.org/jira/browse/HDFS-8897 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Affects Versions: 2.7.1 Environment: Centos 6.6 Reporter: LINTE When the balancer is launched, it should test if there is already a /system/balancer.id file in HDFS. Even when the file doesn't exist, the balancer refuses to run:
15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
Looking at the audit log file when trying to run the balancer, the balancer creates /system/balancer.id and then deletes it on exiting ...
2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r- proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc
The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. The function checkAndMarkRunning returns null even if /system/balancer.id doesn't exist before entering this function; if it exists, then it is deleted and the balancer exits with the same error.
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    final FSDataOutputStream fsout = fs.create(idPath, false);
    // mark balancer idPath to be deleted during filesystem closure
    fs.deleteOnExit(idPath);
    if (write2IdFile) {
      fsout.writeBytes(InetAddress.getLocalHost().getHostName());
      fsout.hflush();
    }
    return fsout;
  } catch (RemoteException e) {
    if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
      return null;
    } else {
      throw e;
    }
  }
}
Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332)
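The intended fail-fast semantics of {{fs.create(idPath, false)}} (exactly one balancer instance may create the id file; a concurrent second attempt must fail immediately rather than find the file deleted) can be illustrated on a local filesystem. This is an analogy using java.nio, not HDFS client code; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LockFileSketch {
    // Local-FS analogue of NameNodeConnector#checkAndMarkRunning: atomic
    // create-if-absent means exactly one balancer instance can hold the id
    // file, and a second attempt fails immediately instead of racing on
    // an exists()/delete() sequence.
    public static boolean tryMarkRunning(Path idPath) throws IOException {
        try {
            Files.createFile(idPath);   // throws if the file already exists
            return true;                // we are the running balancer
        } catch (FileAlreadyExistsException e) {
            return false;               // another balancer holds the lock
        }
    }
}
```

The bug report above is consistent with that create/exists/delete sequence going wrong; the atomic create-only path avoids the window entirely.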
[jira] [Updated] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...
[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LINTE updated HDFS-8897: Summary: Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ... (was: Loadbalancer ) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ... Key: HDFS-8897 URL: https://issues.apache.org/jira/browse/HDFS-8897 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Affects Versions: 2.7.1 Environment: Centos 6.6 Reporter: LINTE When the balancer is launched, it should test if there is already a /system/balancer.id file in HDFS. Even when the file doesn't exist, the balancer refuses to run:
15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
Looking at the audit log file when trying to run the balancer, the balancer creates /system/balancer.id and then deletes it on exiting ...
2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r- proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc
The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. The function checkAndMarkRunning returns null even if /system/balancer.id doesn't exist before entering this function; if it exists, then it is deleted and the balancer exits with the same error.
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    final FSDataOutputStream fsout = fs.create(idPath, false);
    // mark balancer idPath to be deleted during filesystem closure
    fs.deleteOnExit(idPath);
    if (write2IdFile) {
      fsout.writeBytes(InetAddress.getLocalHost().getHostName());
      fsout.hflush();
    }
    return fsout;
  } catch (RemoteException e) {
    if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
      return null;
    } else {
      throw e;
    }
  }
}
Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8896) DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697154#comment-14697154 ] Hadoop QA commented on HDFS-8896: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 53s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 12s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 25s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 174m 49s | Tests failed in hadoop-hdfs. |
| | | | 242m 23s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750491/HDFS-8896.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11995/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11995/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11995/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11995/console |
This message was automatically generated. DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager --- Key: HDFS-8896 URL: https://issues.apache.org/jira/browse/HDFS-8896 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8896.01.patch, screenshot_1.PNG, screenshot_2.PNG The anonymous {{Thread}} object created in {{ShutdownHookManager}} is a GC root. screenshot_1 shows how a DN object can be traced to the GC root. It's not a problem in production. It's a problem in tests, especially when MiniDFSCluster starts/shuts down many DNs, which could cause {{OutOfMemoryError}}. screenshot_2 shows many DN objects that are not GCed when running the test of HDFS-8838. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
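A minimal sketch of the leak and a test-friendly mitigation, assuming nothing about ShutdownHookManager's internals: a hook that captures the node keeps it strongly reachable from the JVM's shutdown-hook registry (a GC root) for the whole process lifetime, so repeated start/stop cycles accumulate objects; deregistering the hook when the node shuts down early releases that root. All class names here are hypothetical.

```java
public class ShutdownHookSketch {
    static class DataNodeLike {
        final byte[] heavyState = new byte[1 << 20]; // stands in for DN internals
    }

    // An anonymous hook capturing the node keeps it reachable from the JVM's
    // shutdown-hook registry until the process exits, even after the node is
    // logically shut down.
    public static Thread registerHook(final DataNodeLike node) {
        Thread hook = new Thread() {
            @Override
            public void run() {
                // would flush/close the node; referencing it forces the capture
                if (node.heavyState == null) throw new IllegalStateException();
            }
        };
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    // Test-friendly mitigation: remove the hook on early shutdown so
    // MiniDFSCluster-style start/stop cycles don't pile up DN objects.
    public static boolean deregister(Thread hook) {
        return Runtime.getRuntime().removeShutdownHook(hook);
    }
}
```

`removeShutdownHook` returns true only if the hook was still registered, which makes the cleanup verifiable in a test.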
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697240#comment-14697240 ] Hadoop QA commented on HDFS-8838: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 56s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 20s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 5m 20s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 21m 55s | Tests failed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 204m 28s | Tests failed in hadoop-hdfs. 
| | | | 268m 52s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.net.TestNetUtils | | | hadoop.ha.TestZKFailoverController | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestCrcCorruption | | | hadoop.hdfs.TestWriteStripedFileWithFailure | | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750492/HDFS-8838-HDFS-7285-20150809-test.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1d37a88 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11996/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11996/console | This message was automatically generated. 
Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-8838-HDFS-7285-000.patch, HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8896) DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697316#comment-14697316 ] Walter Su commented on HDFS-8896: - The failed tests were already failing before the patch ([link|https://builds.apache.org/job/PreCommit-HDFS-Build/11989/testReport/]), so they are not related to this change. DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager --- Key: HDFS-8896 URL: https://issues.apache.org/jira/browse/HDFS-8896 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8896.01.patch, screenshot_1.PNG, screenshot_2.PNG The anonymous {{Thread}} object created in {{ShutdownHookManager}} is a GC root. screenshot_1 shows how the DN object can be traced to the GC root. It's not a problem in production. It's a problem in tests, especially when MiniDFSCluster starts and shuts down many DNs, which can cause {{OutOfMemoryError}}. screenshot_2 shows many DN objects that are not GCed when running the test for HDFS-8838. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697551#comment-14697551 ] Rakesh R commented on HDFS-8220: I've rebased the previous patch on the {{HDFS-7285-merge}} branch and attached it here. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
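The {{NullPointerException}} in the trace above is the {{BlockingQueue}} contract at work: {{offer(null)}} always throws NPE, so enqueueing a block that was never assigned (because the located datanodes didn't satisfy {{BlockGroupSize}}) surfaces as this error. A small sketch with hypothetical names — not the actual {{StripedDataStreamer}} code — demonstrating the contract and the kind of validation guard the fix needs:

```java
import java.util.concurrent.LinkedBlockingQueue;

// Sketch (hypothetical names): BlockingQueue.offer(null) always throws
// NullPointerException, so the streamer must validate the located block
// before enqueueing it rather than let the NPE escape into its run loop.
public class OfferNullDemo {
    static boolean offerSafely(LinkedBlockingQueue<String> queue, String block) {
        if (block == null) {
            // e.g. fewer datanodes located than the block group requires
            return false;
        }
        return queue.offer(block);
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>();
        boolean threw = false;
        try {
            q.offer(null); // contract: throws NullPointerException
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("offer(null) threw NPE: " + threw);
        System.out.println("guarded offer of null accepted: " + offerSafely(q, null));
    }
}
```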
[jira] [Commented] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts
[ https://issues.apache.org/jira/browse/HDFS-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697547#comment-14697547 ] Jason Lowe commented on HDFS-8898: -- This would solve a significant annoyance with computing quotas on a shared tree. However I think it has security implications. If one can get the quota totals for the entire tree then they can calculate what must be used by the parts they cannot access via quota_usage - usage_visible. If what is being stored in the restricted area is sensitive (e.g.: records related to financials) then knowing how many files or the size of the restricted data could leak sensitive information. Create API and command-line argument to get quota without need to get file and directory counts --- Key: HDFS-8898 URL: https://issues.apache.org/jira/browse/HDFS-8898 Project: Hadoop HDFS Issue Type: Bug Components: fs Reporter: Joep Rottinghuis On large directory structures it takes significant time to iterate through the file and directory counts recursively to get a complete ContentSummary. When you want to just check for the quota on a higher level directory it would be good to have an option to skip the file and directory counts. Moreover, currently one can only check the quota if you have access to all the directories underneath. For example, if I have a large home directory under /user/joep and I host some files for another user in a sub-directory, the moment they create an unreadable sub-directory under my home I can no longer check what my quota is. Understood that I cannot check the current file counts unless I can iterate through all the usage, but for administrative purposes it is nice to be able to get the current quota setting on a directory without the need to iterate through and run into permission issues on sub-directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
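Jason Lowe's inference can be stated in two lines of arithmetic; the numbers below are made up purely for illustration:

```java
// Illustration of the information leak: exposing tree-wide quota usage lets a
// caller infer the size of subtrees they cannot list. All values are invented.
public class QuotaLeakDemo {
    // usage of the parts the caller cannot access
    static long hiddenUsage(long quotaUsageTotal, long usageVisibleToCaller) {
        return quotaUsageTotal - usageVisibleToCaller;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long total = 100 * gb;   // hypothetical total reported by a quota API
        long visible = 80 * gb;  // hypothetical usage the caller can see
        System.out.println("inferred restricted usage (GB): "
            + hiddenUsage(total, visible) / gb);
    }
}
```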
[jira] [Commented] (HDFS-8565) Typo in dfshealth.html - Decomissioning
[ https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697646#comment-14697646 ] Hudson commented on HDFS-8565: -- FAILURE: Integrated in Hadoop-trunk-Commit #8307 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8307/]) HDFS-8565. Typo in dfshealth.html - Decomissioning. (nijel via xyao) (xyao: rev 1569228ec9090823186f062257fdf1beb5ee1781) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html Typo in dfshealth.html - Decomissioning - Key: HDFS-8565 URL: https://issues.apache.org/jira/browse/HDFS-8565 Project: Hadoop HDFS Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Trivial Attachments: HDFS-8565.patch <div class="page-header"><h1><small>Decomissioning</small></h1></div> change to <div class="page-header"><h1><small>Decommissioning</small></h1></div> in dfshealth.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697564#comment-14697564 ] Hadoop QA commented on HDFS-6244: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 24s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | javac | 1m 37s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750579/HDFS-6244.v5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 84bf712 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11999/console | This message was automatically generated. Make Trash Interval configurable for each of the namespaces --- Key: HDFS-6244 URL: https://issues.apache.org/jira/browse/HDFS-6244 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch Somehow we need to avoid the cluster filling up. One solution is to have a different trash policy per namespace. However, if we can simply make the property configurable per namespace, then the same config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697693#comment-14697693 ] Arpit Agarwal commented on HDFS-7649: - Thanks for catching and taking care of this, Nicholas. Multihoming docs should emphasize using hostnames in configurations --- Key: HDFS-7649 URL: https://issues.apache.org/jira/browse/HDFS-7649 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Arpit Agarwal Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7649.patch The docs should emphasize that master and slave configurations should use hostnames wherever possible. Link to current docs: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6244: -- Status: Open (was: Patch Available) Make Trash Interval configurable for each of the namespaces --- Key: HDFS-6244 URL: https://issues.apache.org/jira/browse/HDFS-6244 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch Somehow we need to avoid the cluster filling up. One solution is to have a different trash policy per namespace. However, if we can simply make the property configurable per namespace, then the same config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6244: -- Status: Patch Available (was: Open) Make Trash Interval configurable for each of the namespaces --- Key: HDFS-6244 URL: https://issues.apache.org/jira/browse/HDFS-6244 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch Somehow we need to avoid the cluster filling up. One solution is to have a different trash policy per namespace. However, if we can simply make the property configurable per namespace, then the same config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6244: -- Attachment: HDFS-6244.v5.patch Make Trash Interval configurable for each of the namespaces --- Key: HDFS-6244 URL: https://issues.apache.org/jira/browse/HDFS-6244 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch Somehow we need to avoid the cluster filling up. One solution is to have a different trash policy per namespace. However, if we can simply make the property configurable per namespace, then the same config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6244: -- Attachment: (was: HDFS-6244.v5.patch) Make Trash Interval configurable for each of the namespaces --- Key: HDFS-6244 URL: https://issues.apache.org/jira/browse/HDFS-6244 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, HDFS-6244.v3.patch, HDFS-6244.v4.patch Somehow we need to avoid the cluster filling up. One solution is to have a different trash policy per namespace. However, if we can simply make the property configurable per namespace, then the same config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8565) Typo in dfshealth.html - Decomissioning
[ https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8565: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks [~nijel] for the contribution. The patch has been committed to trunk and branch-2. Typo in dfshealth.html - Decomissioning - Key: HDFS-8565 URL: https://issues.apache.org/jira/browse/HDFS-8565 Project: Hadoop HDFS Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Trivial Attachments: HDFS-8565.patch <div class="page-header"><h1><small>Decomissioning</small></h1></div> change to <div class="page-header"><h1><small>Decommissioning</small></h1></div> in dfshealth.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697669#comment-14697669 ] Tsz Wo Nicholas Sze commented on HDFS-7649: --- Merged this to branch-2. Multihoming docs should emphasize using hostnames in configurations --- Key: HDFS-7649 URL: https://issues.apache.org/jira/browse/HDFS-7649 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Arpit Agarwal Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7649.patch The docs should emphasize that master and slave configurations should use hostnames wherever possible. Link to current docs: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations
[ https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697668#comment-14697668 ] Tsz Wo Nicholas Sze commented on HDFS-7649: --- It seems that this was only committed to trunk but not yet merged to branch-2. Multihoming docs should emphasize using hostnames in configurations --- Key: HDFS-7649 URL: https://issues.apache.org/jira/browse/HDFS-7649 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Arpit Agarwal Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7649.patch The docs should emphasize that master and slave configurations should use hostnames wherever possible. Link to current docs: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8565) Typo in dfshealth.html - Decomissioning
[ https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8565: - Fix Version/s: 2.8.0 Typo in dfshealth.html - Decomissioning - Key: HDFS-8565 URL: https://issues.apache.org/jira/browse/HDFS-8565 Project: Hadoop HDFS Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Trivial Fix For: 2.8.0 Attachments: HDFS-8565.patch <div class="page-header"><h1><small>Decomissioning</small></h1></div> change to <div class="page-header"><h1><small>Decommissioning</small></h1></div> in dfshealth.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697529#comment-14697529 ] Hadoop QA commented on HDFS-6244: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750575/HDFS-6244.v5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 84bf712 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11998/console | This message was automatically generated. Make Trash Interval configurable for each of the namespaces --- Key: HDFS-6244 URL: https://issues.apache.org/jira/browse/HDFS-6244 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch Somehow we need to avoid the cluster filling up. One solution is to have a different trash policy per namespace. However, if we can simply make the property configurable per namespace, then the same config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8220: --- Attachment: HDFS-8220-HDFS-7285-merge-10.patch Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: 
java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697681#comment-14697681 ] Ravi Prakash commented on HDFS-8344: Hi Haohui! There are arguments on both sides (time-based vs. count-based). E.g., I may take down the cluster and bring it back up after enough time to expire the timeout, in which case we wouldn't have retried enough times. Please let me know if you feel strongly though, and I can add one more configuration for the timeout (in addition to the number of retries). It feels like we are over-designing now. This is a rare enough event (the client dies, and before the lease expires, so do the nodes it wrote to). NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch I found another(?) instance in which the lease is not recovered. This is easily reproducible on a pseudo-distributed single-node cluster # Before you start, it helps to set the following. This is not necessary, but it simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (It could be less than 1 block, but it hflushed, so some of the data has landed on the datanodes.) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 of the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter) # Shoot the datanode. 
(Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8824) Do not use small blocks for balancing the cluster
[ https://issues.apache.org/jira/browse/HDFS-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697679#comment-14697679 ] Hudson commented on HDFS-8824: -- FAILURE: Integrated in Hadoop-trunk-Commit #8308 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8308/]) HDFS-8824. Do not use small blocks for balancing the cluster. (szetszwo: rev 2bc0a4f299fbd8035e29f62ce9cd22e209a62805) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java Do not use small blocks for balancing the cluster - Key: HDFS-8824 URL: https://issues.apache.org/jira/browse/HDFS-8824 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8824_20150727b.patch, h8824_20150811b.patch Balancer gets datanode block lists from NN and then move the blocks in order to balance the cluster. It should not use the blocks with small size since moving the small blocks generates a lot of overhead and the small blocks do not help balancing the cluster much. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
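The idea behind this commit can be sketched as a simple size filter. Everything below is hypothetical — names and the 1 MB threshold are invented for illustration; the actual change lives in the {{Dispatcher}}/{{Balancer}} classes and its threshold is a configuration key:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the HDFS-8824 idea (hypothetical names, not the real Dispatcher):
// when choosing blocks to move, skip blocks below a size floor, since moving
// many small blocks adds per-block overhead without shifting much data.
public class SmallBlockFilterDemo {
    static List<Long> blocksWorthMoving(List<Long> blockSizes, long minSizeBytes) {
        return blockSizes.stream()
            .filter(size -> size >= minSizeBytes)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(512L, 4_096L, 134_217_728L); // bytes
        // hypothetical 1 MB floor; the real value is configurable
        System.out.println(blocksWorthMoving(sizes, 1_048_576L));
    }
}
```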
[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6244: -- Attachment: HDFS-6244.v5.patch Make Trash Interval configurable for each of the namespaces --- Key: HDFS-6244 URL: https://issues.apache.org/jira/browse/HDFS-6244 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch Somehow we need to avoid the cluster filling up. One solution is to have a different trash policy per namespace. However, if we can simply make the property configurable per namespace, then the same config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8565) Typo in dfshealth.html - Decomissioning
[ https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697592#comment-14697592 ] Xiaoyu Yao commented on HDFS-8565: -- +1. Patch LGTM. I will commit it shortly. Typo in dfshealth.html - Decomissioning - Key: HDFS-8565 URL: https://issues.apache.org/jira/browse/HDFS-8565 Project: Hadoop HDFS Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Trivial Attachments: HDFS-8565.patch <div class="page-header"><h1><small>Decomissioning</small></h1></div> change to <div class="page-header"><h1><small>Decommissioning</small></h1></div> in dfshealth.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8899) Erasure Coding: use threadpool for EC recovery tasks
Rakesh R created HDFS-8899: -- Summary: Erasure Coding: use threadpool for EC recovery tasks Key: HDFS-8899 URL: https://issues.apache.org/jira/browse/HDFS-8899 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea is to use threadpool for processing erasure coding recovery tasks at the datanode. {code} new Daemon(new ReconstructAndTransferBlock(recoveryInfo)).start(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
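The proposed change can be sketched with plain {{java.util.concurrent}} — the task class below is hypothetical, not the actual datanode code: instead of spawning one daemon thread per recovery task, submit tasks to a bounded pool so concurrent EC recoveries reuse threads and their count is capped.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (hypothetical names): a fixed-size pool replaces per-task daemon
// threads, capping how many recoveries run at once.
public class EcRecoveryPoolDemo {
    static final AtomicInteger RECOVERED = new AtomicInteger();

    static class ReconstructTask implements Runnable {
        public void run() { RECOVERED.incrementAndGet(); } // stand-in for block reconstruction
    }

    public static void main(String[] args) throws InterruptedException {
        RECOVERED.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(4); // cap on concurrent recoveries
        for (int i = 0; i < 10; i++) {
            pool.submit(new ReconstructTask()); // was: new Daemon(task).start()
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("tasks completed: " + RECOVERED.get());
    }
}
```

A pool also gives a single place to tune queue depth and thread count per datanode, which one-off daemon threads do not.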
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697324#comment-14697324 ] Rakesh R commented on HDFS-8220: Any more comments on the attached patch? Hi [~zhz], I hope {{HDFS-7285-merge}} is the active branch; should I create another patch now? Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files
[ https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697397#comment-14697397 ] Hadoop QA commented on HDFS-6955: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 22s | The applied patch generated 3 new checkstyle issues (total was 154, now 155). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 55s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 146m 6s | Tests failed in hadoop-hdfs. 
| | | | 191m 58s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.blockmanagement.TestNodeCount | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.ha.TestFailureOfSharedDir | | | org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750505/HDFS-6955-01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 84bf712 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/whitespace.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11997/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11997/console | This message was automatically generated. DN should reserve disk space for a full block when creating tmp files - Key: HDFS-6955 URL: https://issues.apache.org/jira/browse/HDFS-6955 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: kanaka kumar avvaru Attachments: HDFS-6955-01.patch HDFS-6898 is introducing disk space reservation for RBW files to avoid running out of disk space midway through block creation. This Jira is to introduce similar reservation for tmp files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697330#comment-14697330 ] Walter Su commented on HDFS-8838: - failed tests not related. +1 for the last patch. (20150809.patch) Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-8838-HDFS-7285-000.patch, HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697377#comment-14697377 ] Zhe Zhang commented on HDFS-8220: - [~rakeshr] Yes it'd be great if you can create a patch for {{HDFS-7285-merge}}. I don't think there will be much conflict since this change is on client. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts
Joep Rottinghuis created HDFS-8898: -- Summary: Create API and command-line argument to get quota without need to get file and directory counts Key: HDFS-8898 URL: https://issues.apache.org/jira/browse/HDFS-8898 Project: Hadoop HDFS Issue Type: Bug Components: fs Reporter: Joep Rottinghuis On large directory structures it takes significant time to iterate through the file and directory counts recursively to get a complete ContentSummary. When you want to just check for the quota on a higher level directory it would be good to have an option to skip the file and directory counts. Moreover, currently one can only check the quota if you have access to all the directories underneath. For example, if I have a large home directory under /user/joep and I host some files for another user in a sub-directory, the moment they create an unreadable sub-directory under my home I can no longer check what my quota is. Understood that I cannot check the current file counts unless I can iterate through all the usage, but for administrative purposes it is nice to be able to get the current quota setting on a directory without the need to iterate through and run into permission issues on sub-directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8093) BP does not exist or is not under Constructionnull
[ https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697395#comment-14697395 ] Tsz Wo Nicholas Sze commented on HDFS-8093: --- The file /system/balancer.id seems to be deleted. Could you grep /system/balancer.id from the NN log? Also, are there other log messages between 2015-08-14 00:30:03,843 and 2015-08-14 00:30:04,000? BP does not exist or is not under Constructionnull -- Key: HDFS-8093 URL: https://issues.apache.org/jira/browse/HDFS-8093 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.6.0 Environment: Centos 6.5 Reporter: LINTE The HDFS balancer ran for several hours balancing blocks between datanodes, then failed with the following error. The getStoredBlock function returned a null BlockInfo. java.io.IOException: Bad response ERROR for block BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 from datanode 192.168.0.18:1004 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:897) 15/04/08 05:52:51 WARN hdfs.DFSClient: Error Recovery for block BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 in pipeline 192.168.0.63:1004, 192.168.0.1:1004, 192.168.0.18:1004: bad datanode 192.168.0.18:1004 15/04/08 05:52:51 WARN hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not exist or is not under Constructionnull at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:717) at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:931) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) at org.apache.hadoop.ipc.Client.call(Client.java:1468) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy11.updateBlockForPipeline(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:877) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy12.updateBlockForPipeline(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1266) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548) 15/04/08 05:52:51 ERROR hdfs.DFSClient: Failed to close inode 19801755 org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not exist or is not under Constructionnull at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980) at
[jira] [Commented] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature
[ https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697787#comment-14697787 ] Jing Zhao commented on HDFS-8801: - Actually converting BlockInfoUnderConstruction can bring us some benefits. Currently when processing block reports, if a finalized replica is reported, we may replace the corresponding UC blockInfo object with a newly created complete blockInfo object inside the INodeFile. This replacement mixes the states of the block storage management and the NameSystem management, and forces the block report processing to take the Namesystem write lock. Converting BlockInfoUC to a feature avoids the BlockInfo object replacement. It helps separate the storage level from the file system level, and allows us to make further block report processing improvements (e.g., separating the locks for the namesystem and the blockmanager). Convert BlockInfoUnderConstruction as a feature --- Key: HDFS-8801 URL: https://issues.apache.org/jira/browse/HDFS-8801 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Per discussion under HDFS-8499, with the erasure coding feature, there will be 4 types of {{BlockInfo}} forming a multi-inheritance: {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution was building feature classes like {{FileUnderConstructionFeature}}. This JIRA aims to implement the same idea on {{BlockInfo}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
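The feature-class pattern discussed above can be sketched in miniature. This is an illustrative reduction, not the attached patch: the class and field names are hypothetical, but it shows how completing a block simply drops a nullable feature instead of replacing the BlockInfo object that the INodeFile references.

```java
// Hypothetical stand-in for an under-construction feature:
// expected replica locations live here only while the block is UC.
class UnderConstructionFeature {
    final long[] replicaIds;
    UnderConstructionFeature(long[] replicaIds) { this.replicaIds = replicaIds; }
}

// Hypothetical stand-in for BlockInfo: one class for both states,
// instead of a BlockInfoUnderConstruction subclass.
class Block {
    private UnderConstructionFeature uc;   // null once the block is complete

    boolean isComplete() { return uc == null; }

    void convertToUnderConstruction(long[] replicaIds) {
        uc = new UnderConstructionFeature(replicaIds);
    }

    // Completing the block drops the feature; the Block object itself
    // (and any reference to it from the file) is unchanged.
    void completeBlock() { uc = null; }
}
```

Because no object replacement happens, block report processing would not need to reach into the file to swap references, which is the locking benefit described in the comment.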
[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8891: Issue Type: Bug (was: Improvement) HDFS concat should keep srcs order -- Key: HDFS-8891 URL: https://issues.apache.org/jira/browse/HDFS-8891 Project: Hadoop HDFS Issue Type: Bug Reporter: Yong Zhang Assignee: Yong Zhang Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch FSDirConcatOp.verifySrcFiles may change src files order, but it should keep their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697814#comment-14697814 ] Jing Zhao commented on HDFS-8891: - +1. I will commit the patch shortly. HDFS concat should keep srcs order -- Key: HDFS-8891 URL: https://issues.apache.org/jira/browse/HDFS-8891 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yong Zhang Assignee: Yong Zhang Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch FSDirConcatOp.verifySrcFiles may change src files order, but it should keep their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697842#comment-14697842 ] Hudson commented on HDFS-8891: -- FAILURE: Integrated in Hadoop-trunk-Commit #8309 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8309/]) HDFS-8891. HDFS concat should keep srcs order. Contributed by Yong Zhang. (jing9: rev dc7a061668a3f4d86fe1b07a40d46774b5386938) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestHDFSConcat.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HDFS concat should keep srcs order -- Key: HDFS-8891 URL: https://issues.apache.org/jira/browse/HDFS-8891 Project: Hadoop HDFS Issue Type: Bug Reporter: Yong Zhang Assignee: Yong Zhang Fix For: 2.8.0 Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch FSDirConcatOp.verifySrcFiles may change src files order, but it should keep their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
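The ordering bug pattern this issue describes can be shown with a self-contained sketch (this is not the actual FSDirConcatOp code; the class and method names are hypothetical): deduplicating the srcs through a LinkedHashSet validates each entry while preserving the caller's order, whereas an unordered set would lose it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class SrcOrder {
    // Validate and deduplicate srcs while preserving the input order.
    // Collecting into a plain HashSet would reorder the files, which is
    // the kind of behavior this issue fixes for concat.
    public static List<String> verifySrcs(String[] srcs) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String s : srcs) {
            if (s == null || s.isEmpty()) {
                throw new IllegalArgumentException("empty src path");
            }
            seen.add(s);  // a duplicate keeps its first-seen position
        }
        return new ArrayList<>(seen);
    }
}
```

Order matters here because concat appends the src blocks to the target in the order given, so reordering silently changes the resulting file contents.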
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697844#comment-14697844 ] Hadoop QA commented on HDFS-8833: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750606/HDFS-8833-HDFS-7285-merge.00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1d37a88 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12002/console | This message was automatically generated. Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8833-HDFS-7285-merge.00.patch We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697981#comment-14697981 ] Haohui Mai commented on HDFS-8344: -- bq. Even if it's simpler, there's a chance that recovery is never attempted, and that is not acceptable IMHO. Can you explain how the NN never tries to recover the lease? All leases are periodically checked in {{LeaseManager#checkLease()}}, where the recovery happens. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. 
The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature
[ https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8801: Attachment: HDFS-8801.000.patch Initial patch to demo the idea. Convert BlockInfoUnderConstruction as a feature --- Key: HDFS-8801 URL: https://issues.apache.org/jira/browse/HDFS-8801 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang Attachments: HDFS-8801.000.patch Per discussion under HDFS-8499, with the erasure coding feature, there will be 4 types of {{BlockInfo}} forming a multi-inheritance: {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution was building feature classes like {{FileUnderConstructionFeature}}. This JIRA aims to implement the same idea on {{BlockInfo}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697725#comment-14697725 ] Ming Ma commented on HDFS-7446: --- For the 2.6.1 effort, the backport is straightforward. But the API has changed compared to 2.6.0. This incompatibility only impacts folks who have been using inotify functionality introduced in 2.6.0. HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7446.001.patch, HDFS-7446.002.patch, HDFS-7446.003.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature
[ https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8801: Affects Version/s: (was: 2.7.1) Status: Patch Available (was: Open) Convert BlockInfoUnderConstruction as a feature --- Key: HDFS-8801 URL: https://issues.apache.org/jira/browse/HDFS-8801 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-8801.000.patch Per discussion under HDFS-8499, with the erasure coding feature, there will be 4 types of {{BlockInfo}} forming a multi-inheritance: {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution was building feature classes like {{FileUnderConstructionFeature}}. This JIRA aims to implement the same idea on {{BlockInfo}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697817#comment-14697817 ] Ravi Prakash commented on HDFS-8344: bq. If you take down the cluster and bring it back up. All writing pipeline will fail and should fail. That is correct. This JIRA is for the case that data loss has already occurred. i.e. client died + the DNs to which it wrote already died. We are trying to recover the lease in this JIRA. My argument was that after client+DNs have died, if I only have a timeout, I could take down the cluster. When I bring the cluster back up after the timeout value, the lease will be recovered without trying all the DNs. bq. This is internal implementation details and I'm very reluctant to make it configurable Perhaps I should have said internal hard-coded configuration? Similar to {{recoveryAttemptsBeforeMarkingBlockMissing}} of version 8 of the patch. bq. Having only one concept for detecting failures (i.e., time out) is simpler than two (i.e., time out and number of retries). Even if it's simpler, there's a chance that recovery is never attempted, and that is not acceptable IMHO. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch I found another\(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start it helps if you set. 
This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode) (even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order
[ https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8891: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks Yong for the contribution! HDFS concat should keep srcs order -- Key: HDFS-8891 URL: https://issues.apache.org/jira/browse/HDFS-8891 Project: Hadoop HDFS Issue Type: Bug Reporter: Yong Zhang Assignee: Yong Zhang Fix For: 2.8.0 Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch FSDirConcatOp.verifySrcFiles may change src files order, but it should keep their order as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8853) Erasure Coding: Provide ECSchema validation when creating ECZone
[ https://issues.apache.org/jira/browse/HDFS-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697873#comment-14697873 ] Zhe Zhang commented on HDFS-8853: - Thanks [~andreina] for the patch. Do you mind rebasing it? I was also thinking about the issue when creating the HDFS-8833 patch. In the long term, it might be better for the client to pass a {{String}} to NN instead of the actual policy/schema. Erasure Coding: Provide ECSchema validation when creating ECZone Key: HDFS-8853 URL: https://issues.apache.org/jira/browse/HDFS-8853 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: J.Andreina Attachments: HDFS-8853-HDFS-7285-01.patch Presently the {{DFS#createErasureCodingZone(path, ecSchema, cellSize)}} doesn't have any validation that the given {{ecSchema}} is available in the {{ErasureCodingSchemaManager#activeSchemas}} list. If it doesn't exist, it will create the ECZone with a {{null}} schema. IMHO we could improve this by doing the necessary basic sanity checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
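A minimal sketch of the kind of sanity check proposed here, with a plain map standing in for {{ErasureCodingSchemaManager#activeSchemas}}. The class, method, and schema names are illustrative assumptions, not the attached patch:

```java
import java.util.Map;

public class SchemaCheck {
    // Reject a schema that is not in the active set instead of silently
    // creating a zone with a null schema. A real implementation would
    // consult the schema manager rather than a caller-supplied map.
    public static String validate(Map<String, String> activeSchemas, String name) {
        if (name == null) {
            throw new IllegalArgumentException("ecSchema must not be null");
        }
        String schema = activeSchemas.get(name);
        if (schema == null) {
            throw new IllegalArgumentException("unknown EC schema: " + name);
        }
        return schema;
    }
}
```

Failing fast at zone creation time surfaces a misconfigured schema to the caller immediately, instead of leaving a zone that breaks later reads and writes.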
[jira] [Commented] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature
[ https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697904#comment-14697904 ] Zhe Zhang commented on HDFS-8801: - Thanks for initiating the work Jing! The overall structure in the patch looks good to me. Should we take the chance to change {{replicas}} from a List to an array? This can offset some of the memory overhead from the feature pointer, and also help us reconcile trunk with the striped UC code later. Convert BlockInfoUnderConstruction as a feature --- Key: HDFS-8801 URL: https://issues.apache.org/jira/browse/HDFS-8801 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-8801.000.patch Per discussion under HDFS-8499, with the erasure coding feature, there will be 4 types of {{BlockInfo}} forming a multi-inheritance: {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution was building feature classes like {{FileUnderConstructionFeature}}. This JIRA aims to implement the same idea on {{BlockInfo}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697943#comment-14697943 ] Walter Su commented on HDFS-8838: - I saw HDFS-8220 just got committed; would you mind rebasing this to resolve the conflicts? Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-8838-HDFS-7285-000.patch, HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697956#comment-14697956 ] Zhe Zhang commented on HDFS-7285: - Many thanks to Vinay for the great effort! I just cherry-picked the 2 new commits (HDFS-8854 and HDFS-8220) to the branch. Also created a Jenkins [job | https://builds.apache.org/job/Hadoop-HDFS-7285-nightly/]. I'll also compare {{HDFS-7285-REBASE}} with the consolidated patch. After that and verifying Jenkins results, I'll push it as {{HDFS-7285}} so we can better proceed with pending subtasks. I'll also keep the current {{HDFS-7285}} branch as a backup, in case we want to reconcile differences in individual commits. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: Consolidated-20150707.patch, Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFS-7285-merge-consolidated-01.patch, HDFS-7285-merge-consolidated-trunk-01.patch, HDFS-7285-merge-consolidated.trunk.03.patch, HDFS-7285-merge-consolidated.trunk.04.patch, HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed-Solomon coding, we can tolerate the loss of 4 blocks with a storage overhead of only 40%. This makes EC quite an attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID.
It used to be one of the contrib packages in HDFS but has been removed since Hadoop 2.0 for maintenance reasons. The drawbacks are: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are not intended to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of external dependencies, making it self-contained and independently maintainable. This design builds the EC feature on top of the storage type support and is designed to be compatible with existing HDFS features like caching, snapshots, encryption, and high availability. This design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
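The overhead figures quoted in the issue description follow from simple arithmetic: a (10,4) Reed-Solomon group stores 4 parity blocks per 10 data blocks, so the extra storage is 4/10 = 40%, while 3x replication stores 2 extra copies per block, i.e. 200% extra; the (10,4) group still survives the loss of any 4 of its 14 blocks. A small sketch of that calculation:

```java
/**
 * Arithmetic behind the figures above: storage overhead is the ratio of
 * redundant blocks to data blocks. RS(10,4) adds 4 parity blocks per 10
 * data blocks (40% extra); 3x replication adds 2 full copies (200% extra).
 */
public class EcOverhead {
  static double overheadPercent(int dataBlocks, int redundantBlocks) {
    return 100.0 * redundantBlocks / dataBlocks;
  }

  public static void main(String[] args) {
    System.out.println(overheadPercent(10, 4));  // RS(10,4): 40.0
    System.out.println(overheadPercent(1, 2));   // 3-replica (1 data + 2 copies): 200.0
  }
}
```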
[jira] [Assigned] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature
[ https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-8801: --- Assignee: Jing Zhao Convert BlockInfoUnderConstruction as a feature --- Key: HDFS-8801 URL: https://issues.apache.org/jira/browse/HDFS-8801 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-8801.000.patch Per discussion under HDFS-8499, with the erasure coding feature there will be 4 types of {{BlockInfo}} forming a multi-inheritance pattern: {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, {{UC+striped}}. We had the same challenge with {{INodeFile}}, and the solution was building feature classes like {{FileUnderConstructionFeature}}. This JIRA aims to implement the same idea on {{BlockInfo}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8833: Status: Patch Available (was: Open) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing the EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations, including renaming and nested configuration. Those limitations are valid for encryption for security reasons, but it doesn't make sense to carry them over to EC. This JIRA aims to store the EC schema and cell size at the {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to the file header) as a follow-on. We should also disable changing the EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
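The per-file storage proposed in HDFS-8833 keeps the EC schema name and cell size as an xattr on the {{INodeFile}} instead of deriving them from an enclosing zone, and refuses policy changes on non-empty files in the first phase. A minimal sketch of that idea, where the xattr key, encoding, and class names are all illustrative assumptions rather than the actual HDFS format:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of per-file EC policy storage: the schema name and cell size are
 * kept as an xattr on the file itself, eliminating the zone lookup. The
 * xattr key and "schema,cellSize" encoding are illustrative assumptions.
 */
public class EcXAttrSketch {
  static final String EC_XATTR = "hdfs.erasurecoding.policy";  // hypothetical key

  /** Simplified stand-in for an INodeFile with an xattr map. */
  static class INodeFile {
    final Map<String, String> xattrs = new HashMap<>();
    boolean isEmpty = true;
  }

  static void setEcPolicy(INodeFile f, String schema, int cellSize) {
    // First phase: disallow changing an existing policy on a non-empty file.
    if (!f.isEmpty && f.xattrs.containsKey(EC_XATTR)) {
      throw new IllegalStateException("cannot change EC policy on a non-empty file");
    }
    f.xattrs.put(EC_XATTR, schema + "," + cellSize);
  }

  static String getEcPolicy(INodeFile f) {
    return f.xattrs.get(EC_XATTR);
  }

  public static void main(String[] args) {
    INodeFile f = new INodeFile();
    setEcPolicy(f, "RS-6-3", 64 * 1024);
    System.out.println(getEcPolicy(f));  // RS-6-3,65536
  }
}
```

Because the policy travels with the file, renames and nested directories no longer hit the zone-boundary restrictions the description mentions.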
[jira] [Updated] (HDFS-8824) Do not use small blocks for balancing the cluster
[ https://issues.apache.org/jira/browse/HDFS-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8824: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks Jitendra for reviewing the patch. I have committed this. Do not use small blocks for balancing the cluster - Key: HDFS-8824 URL: https://issues.apache.org/jira/browse/HDFS-8824 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Fix For: 2.8.0 Attachments: h8824_20150727b.patch, h8824_20150811b.patch The Balancer gets datanode block lists from the NN and then moves the blocks in order to balance the cluster. It should not use small blocks, since moving them generates a lot of overhead while contributing little to balancing the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
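The filtering HDFS-8824 describes boils down to skipping candidate blocks below a size threshold when the Balancer scans a datanode's block list: each move has fixed per-block cost, so small blocks shift little data per unit of overhead. A minimal sketch under assumed names; the threshold constant and its value are illustrative, not the committed patch's configuration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the small-block filter: when picking blocks to move, drop those
 * under a size threshold, since each move pays per-block overhead but small
 * blocks contribute little to rebalancing. Threshold is an assumed value.
 */
public class BalancerBlockFilter {
  static final long MIN_BLOCK_SIZE = 10L * 1024 * 1024;  // hypothetical 10 MB cutoff

  static List<Long> selectMovableBlocks(List<Long> blockSizes) {
    List<Long> movable = new ArrayList<>();
    for (long size : blockSizes) {
      if (size >= MIN_BLOCK_SIZE) {
        movable.add(size);
      }
    }
    return movable;
  }

  public static void main(String[] args) {
    // 1 MB, 20 MB, 128 MB candidate blocks: only the last two are worth moving.
    List<Long> sizes = List.of(1L << 20, 20L << 20, 128L << 20);
    System.out.println(selectMovableBlocks(sizes).size());
  }
}
```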