[jira] [Updated] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated HDFS-8828: --- Attachment: HDFS-8828.002.patch Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch Some users reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a copy list containing only the files/dirs that changed between two snapshots (or a snapshot and a normal dir). This speeds up the process in two ways: 1. less copy-list building time; 2. fewer file copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch puts only created and modified files into the copy list, based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
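The filtering idea in HDFS-8828 above can be sketched as follows. This is a minimal, self-contained illustration using hypothetical types ({{DiffType}}, {{DiffEntry}}), not the actual Hadoop SnapshotDiffReport API: only CREATE and MODIFY entries go into the copy list, while DELETE and RENAME are assumed to be synchronized separately (per HDFS-7535).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for snapshot diff entries (not the Hadoop API).
public class DiffCopyList {
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    static final class DiffEntry {
        final DiffType type;
        final String path;
        DiffEntry(DiffType type, String path) { this.type = type; this.path = path; }
    }

    // Keep only created/modified paths; deletes and renames are synchronized
    // separately and need no copy, so the copy list stays minimal.
    static List<String> buildCopyList(List<DiffEntry> diff) {
        List<String> copyList = new ArrayList<>();
        for (DiffEntry e : diff) {
            if (e.type == DiffType.CREATE || e.type == DiffType.MODIFY) {
                copyList.add(e.path);
            }
        }
        return copyList;
    }

    public static void main(String[] args) {
        List<DiffEntry> diff = List.of(
            new DiffEntry(DiffType.CREATE, "/src/a.txt"),
            new DiffEntry(DiffType.DELETE, "/src/old.txt"),
            new DiffEntry(DiffType.MODIFY, "/src/b.txt"));
        System.out.println(buildCopyList(diff)); // prints [/src/a.txt, /src/b.txt]
    }
}
```

The payoff is that the copy list scales with the size of the diff, not with the total number of files under the source dir.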
[jira] [Resolved] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved HDFS-8847. - Resolution: Fixed change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu reopened HDFS-8847: - change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650156#comment-14650156 ] Ming Ma commented on HDFS-8480: --- Makes sense. It is easier to check in an edit log with an older version. Thanks [~zhz] and [~cmccabe]. Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650152#comment-14650152 ] zhihai xu commented on HDFS-8847: - The patch from HADOOP-12268 (https://issues.apache.org/jira/secure/attachment/12748104/HADOOP-12268.001.patch) includes a change in the hdfs project, in TestHDFSContractAppend.java. I committed the change in TestHDFSContractAppend.java to trunk and branch-2. change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved HDFS-8847. - Resolution: Fixed Hadoop Flags: Reviewed change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated HDFS-8847: Fix Version/s: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8846) Create edit log files with old layout version for upgrade testing
[ https://issues.apache.org/jira/browse/HDFS-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650166#comment-14650166 ] Ming Ma commented on HDFS-8846: --- 2.5 should be fine. Although people still use 2.3 or 2.4, this test tries to verify that old edits can be replayed during upgrade, not necessarily compatibility of edit log formats. If you unpack the existing fsimage tgz files under {{hadoop-hdfs-project/hadoop-hdfs/src/test/resources}}, they have the proper namenode dir contents, including the VERSION file, etc. Just wondering if you are going to create something similar, except that the fsimage is empty. Create edit log files with old layout version for upgrade testing - Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8480, we should create some edit log files with an old layout version, to test whether they can be correctly handled in upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650070#comment-14650070 ] Yi Liu edited comment on HDFS-6682 at 8/1/15 12:32 AM: --- Thanks Allen, Andrew and Akira for the discussion. The original intention, solving this issue, is good; thank you for working on it. About the discussion itself, Andrew's suggestion is good, and another option is to record the latest time of {{UnderReplicatedBlocks#chooseUnderReplicatedBlocks}}. We already have metrics for {{underReplicatedBlocksCount/pendingReplicationBlocksCount/scheduledReplicationBlocksCount}}, so we can tell whether, and for how long, the under-replicated list has been handled since the last run, if we really want to see that. My point is that it is not worth recording the whole under-replicated list for this metric. On the other hand, we have {{UnderReplicatedBlocks}} and {{PendingReplicationBlocks}}, right? The replication monitor thread periodically picks up some under-replicated blocks; unless the NN stalls (e.g., full GC), computing replication work will always get some CPU time slice. Of course it could be slow, since the NN may have many things to handle (e.g., many requests), but if the NN is slow we have many other ways to know it. About Akira's comment that the metric is also about the entire HDFS cluster: we are talking about DataNodes here. If we want to gauge cluster health from the replication blocks' view, I think the more correct thing is to record the number of timed-out pending replication blocks ({{PendingReplicationBlocks}}), which grows when the network is very busy or target DNs are corrupted; {{UnderReplicatedBlocks}} can't stand for that. So if we want some metrics about replicated blocks in the NN, let's find a lightweight way as suggested, thanks. was (Author: hitliuyi): Thanks Allen, Andrew and Akira for the discussion. Our original intention is to solve issue which is good, thank you for working on it. 
About the discussion itself, Andrew's suggestion is good, and another option is to record latest time of {{UnderReplicatedBlocks#chooseUnderReplicatedBlocks}}, and we already have metrics about the {{underReplicatedBlocksCount/pendingReplicationBlocksCount/scheduledReplicationBlocksCount}}, so we can know whether/how long the under replica list is handled since last time if we really want to see. My point is not worth to record whole under replicated list for this metric. On way other hand, we have {{UnderReplicatedBlocks}} and {{PendingReplicationBlocks}}, right? Replication monitor thread will periodically pick up some under replicated blocks, unless the NN stops (e.g, full gc), compute replication work will always happen in some CPU time slice, of course it could be slow since there maybe many things need to be handled in NN (e.g. many requests). But actually if NN is slow, we have many ways to know it. About Akira's comment about the metric is also about the entire HDFS cluster, we talk DataNode here, I think more correctly thing it's to record the timeout number of pending replication blocks ({{PendingReplicationBlocks}}) if network is very busy or target DNs corrupted if we want to get the Cluster health from replication blocks' review, {{UnderReplicatedBlocks}} can't stand for that. So if we want to have some metrics about the replicated blocks in NN, let's find some lightweight way as suggested, thanks. Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, the data in the HDFS is lost and a client needs to put the same file again. 
# A Client puts a file to HDFS
# A DataNode crashes before replicating a block of the file to other DataNodes
I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way the client can know which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650080#comment-14650080 ] Hadoop QA commented on HDFS-8828: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 29s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 29s | The applied patch generated 19 new checkstyle issues (total was 120, now 139). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 52s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 45s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 7m 18s | Tests passed in hadoop-distcp. 
| | | | 48m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747636/HDFS-8828.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d311a38 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11881/artifact/patchprocess/diffcheckstylehadoop-distcp.txt | | hadoop-distcp test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11881/artifact/patchprocess/testrun_hadoop-distcp.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11881/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11881/console | This message was automatically generated. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch Some users reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a copy list containing only the files/dirs that changed between two snapshots (or a snapshot and a normal dir). This speeds up the process in two ways: 1. less copy-list building time; 2. fewer file copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch puts only created and modified files into the copy list, based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8155) Support OAuth2 in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8155: -- Attachment: HDFS-8155-1.patch First patch for review. We've been testing a version of this code for a few months and it's working well. Two types of OAuth code grants (client credentials and refresh/access tokens provided by the conf) are supported by default, and other code grants are user-implementable. I had planned on using Apache Oltu for this, but that project doesn't seem very active and its main benefit - special-case support for OAuth2 providers like GitHub/Twitter/FB, etc. - is of marginal value for WebHDFS and could easily be implemented by the user if necessary. I didn't end up using the Authenticator client class because it's too closely tied to the spnego implementation, but after this goes in it will be a good idea to make that class more generic and use it for the OAuth code as well. Support OAuth2 in WebHDFS - Key: HDFS-8155 URL: https://issues.apache.org/jira/browse/HDFS-8155 Project: Hadoop HDFS Issue Type: New Feature Components: webhdfs Reporter: Jakob Homan Assignee: Jakob Homan Attachments: HDFS-8155-1.patch WebHDFS should be able to accept OAuth2 credentials. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650045#comment-14650045 ] Hudson commented on HDFS-6860: -- FAILURE: Integrated in Hadoop-trunk-Commit #8253 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8253/]) HDFS-6860. BlockStateChange logs are too noisy. Contributed by Chang Li and Xiaoyu Yao. (xyao: rev d311a38a6b32bbb210bd8748cfb65463e9c0740e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Fix For: 2.8.0 Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650070#comment-14650070 ] Yi Liu commented on HDFS-6682: -- Thanks Allen, Andrew and Akira for the discussion. The original intention, solving this issue, is good; thank you for working on it. About the discussion itself, Andrew's suggestion is good, and another option is to record the latest time of {{UnderReplicatedBlocks#chooseUnderReplicatedBlocks}}. We already have metrics for {{underReplicatedBlocksCount/pendingReplicationBlocksCount/scheduledReplicationBlocksCount}}, so we can tell whether, and for how long, the under-replicated list has been handled since the last run, if we really want to see that. My point is that it is not worth recording the whole under-replicated list for this metric. On the other hand, we have {{UnderReplicatedBlocks}} and {{PendingReplicationBlocks}}, right? The replication monitor thread periodically picks up some under-replicated blocks; unless the NN stalls (e.g., full GC), computing replication work will always get some CPU time slice. Of course it could be slow, since the NN may have many things to handle (e.g., many requests), but if the NN is slow we have many other ways to know it. About Akira's comment that the metric is also about the entire HDFS cluster: we are talking about DataNodes here. If we want to gauge cluster health from the replication blocks' view, I think the more correct thing is to record the number of timed-out pending replication blocks ({{PendingReplicationBlocks}}), which grows when the network is very busy or target DNs are corrupted; {{UnderReplicatedBlocks}} can't stand for that. So if we want some metrics about replicated blocks in the NN, let's find a lightweight way as suggested, thanks. 
Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, the data in HDFS is lost and a client needs to put the same file again.
# A Client puts a file to HDFS
# A DataNode crashes before replicating a block of the file to other DataNodes
I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way the client can know which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650073#comment-14650073 ] Haohui Mai commented on HDFS-8823: -- This jira focuses on the replication factor. I suggest opening another jira if you want to discuss moving the storage policy. Although it looks like no consensus has been reached yet, I encourage you to submit a patch to demonstrate your idea. Comments at such a high level can be quite vague and speculative. Code talks. Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8823.000.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility on replicating blocks. Currently the replication factors of all blocks in a file have to be the same, equal to the highest replication factor across all snapshots. The changes will allow blocks in a file to have different replication factors, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650094#comment-14650094 ] Hadoop QA commented on HDFS-8220: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732230/HDFS-8220-HDFS-7285.008.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / ba90c02 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11883/console | This message was automatically generated. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more:
{code}
2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception
java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
	at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
	at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
java.io.IOException: DataStreamer Exception:
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
	at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
Caused by: java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
	at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
	... 1 more
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
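The NullPointerException in the trace above matches the BlockingQueue contract: LinkedBlockingQueue, like all standard BlockingQueue implementations, rejects null elements, so offering a null block (as can happen when the returned locations don't satisfy the BlockGroupSize) throws immediately. A tiny standalone demonstration:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class OfferNullDemo {
    public static void main(String[] args) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        boolean threwNpe = false;
        try {
            queue.offer(null); // BlockingQueue implementations forbid null elements
        } catch (NullPointerException expected) {
            threwNpe = true;
        }
        System.out.println(threwNpe); // prints true
    }
}
```

This suggests the streamer needs to validate the located block (or the datanode count against BlockGroupSize) before enqueueing, rather than relying on the queue to accept a null.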
[jira] [Assigned] (HDFS-8155) Support OAuth2 in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan reassigned HDFS-8155: - Assignee: Jakob Homan (was: Kai Zheng) Support OAuth2 in WebHDFS - Key: HDFS-8155 URL: https://issues.apache.org/jira/browse/HDFS-8155 Project: Hadoop HDFS Issue Type: New Feature Components: webhdfs Reporter: Jakob Homan Assignee: Jakob Homan WebHDFS should be able to accept OAuth2 credentials. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8840: --- Status: Patch Available (was: Open) Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1, 2.5.2, 2.5.1, 2.6.0 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In the method checkLogsAvailableForRead() of class hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java, the log level is not correct: after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e), while the code uses LOG.fatal(msg, e). The log level is inconsistent. The source code of this method is:
{code}
private boolean checkLogsAvailableForRead(FSImage image, long imageTxId,
    long curTxIdOnOtherNode) {
  ...
  } catch (IOException e) {
    ...
    if (LOG.isDebugEnabled()) {
      LOG.fatal(msg, e);
    } else {
      LOG.fatal(msg);
    }
    return false;
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
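The inconsistency reported above can be illustrated with a stub logger (purely hypothetical, not the commons-logging API): under the reporter's reading, the branch guarded by isDebugEnabled() should call debug(msg, e), so the guard and the call agree.

```java
// Stub logger for illustration only; it records the level of the last call.
public class LogLevelSketch {
    static final class StubLog {
        final boolean debugEnabled;
        String lastLevel;
        StubLog(boolean debugEnabled) { this.debugEnabled = debugEnabled; }
        boolean isDebugEnabled() { return debugEnabled; }
        void debug(String msg, Throwable t) { lastLevel = "DEBUG"; }
        void fatal(String msg) { lastLevel = "FATAL"; }
    }

    // The pattern HDFS-8840 suggests: include the stack trace at debug level
    // when debug is enabled, otherwise log only the message at fatal level.
    static void logFailure(StubLog log, String msg, Exception e) {
        if (log.isDebugEnabled()) {
            log.debug(msg, e); // the original code called fatal(msg, e) here
        } else {
            log.fatal(msg);
        }
    }

    public static void main(String[] args) {
        StubLog log = new StubLog(true);
        logFailure(log, "Unable to read log segments", new RuntimeException("x"));
        System.out.println(log.lastLevel); // prints DEBUG
    }
}
```

Whether the fix should downgrade the level or simply always log at one level is a judgment call for the patch; the sketch only shows the guard-and-call agreement the report asks for.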
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649222#comment-14649222 ] Hudson commented on HDFS-8821: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #262 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/262/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649229#comment-14649229 ] Hudson commented on HDFS-8821: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2219 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2219/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8792) Improve BlockManager#postponedMisreplicatedBlocks and BlockManager#excessReplicateMap
[ https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649183#comment-14649183 ] Yi Liu commented on HDFS-8792: -- The test failure is not related. Improve BlockManager#postponedMisreplicatedBlocks and BlockManager#excessReplicateMap - Key: HDFS-8792 URL: https://issues.apache.org/jira/browse/HDFS-8792 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8792.001.patch, HDFS-8792.002.patch {{LightWeightHashSet}} requires less memory than the Java HashSet. Furthermore, for {{excessReplicateMap}}, we can use a {{HashMap}} instead of a {{TreeMap}}, since there is no need to sort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
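The TreeMap-vs-HashMap point above is standard JDK behavior: TreeMap pays O(log n) per operation to keep keys sorted, while HashMap offers expected O(1) operations with no ordering. When a map is only used for membership and lookup, as described for {{excessReplicateMap}}, the ordering work is wasted. A quick sketch (datanode names are made up):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapChoice {
    public static void main(String[] args) {
        Map<String, Integer> sorted = new TreeMap<>();
        Map<String, Integer> hashed = new HashMap<>();
        for (String dn : new String[] {"dn3", "dn1", "dn2"}) {
            sorted.put(dn, 1); // keeps keys ordered: extra O(log n) work per put
            hashed.put(dn, 1); // no ordering maintained: expected O(1) per put
        }
        System.out.println(sorted.keySet()); // prints [dn1, dn2, dn3]
        // Lookup, the only operation needed here, works the same on both:
        System.out.println(hashed.containsKey("dn2")); // prints true
    }
}
```

If the sorted iteration order is never observed, the TreeMap's per-operation cost buys nothing, which is the rationale stated in the issue.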
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649274#comment-14649274 ] Hudson commented on HDFS-8821: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2200 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2200/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8844) TestHDFSCLI does not cleanup the test directory
Akira AJISAKA created HDFS-8844: --- Summary: TestHDFSCLI does not cleanup the test directory Key: HDFS-8844 URL: https://issues.apache.org/jira/browse/HDFS-8844 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Akira AJISAKA Priority: Minor If TestHDFSCLI is executed twice without {{mvn clean}}, the second try fails. Here are the failing test cases: {noformat} 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(231)) - Failing tests: 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(232)) - -- 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 226: get: getting non existent(absolute path) 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 227: get: getting non existent file(relative path) 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 228: get: Test for hdfs:// path - getting non existent 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 229: get: Test for Namenode's path - getting non existent 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 250: copyToLocal: non existent relative path 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 251: copyToLocal: non existent absolute path 2015-07-31 21:35:17,655 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 252: copyToLocal: Test for hdfs:// path - non existent file/directory 2015-07-31 21:35:17,655 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 253: copyToLocal: Test for Namenode's path - non existent file/directory {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8840: --- Attachment: HDFS-8840-00.patch Uploaded the patch, please review. Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0, 2.5.1, 2.5.2, 2.7.1 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In method checkLogsAvailableForRead() of class: hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java The log level is not correct, after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, while now we use LOG.fatal(msg, e);. Log level is inconsistent. the source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8784: --- Attachment: HDFS-8784-00.patch Attached the patch, please review. BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8802) dfs.checksum.type is not described in hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649205#comment-14649205 ] Gururaj Shetty commented on HDFS-8802: -- The test failure {{org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSClientPeerWriteTimeout}} is handled in HDFS-8812, so it can be ignored. [~ozawa], kindly review the attached patch and let me know if any changes are needed. dfs.checksum.type is not described in hdfs-default.xml -- Key: HDFS-8802 URL: https://issues.apache.org/jira/browse/HDFS-8802 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.7.1 Reporter: Tsuyoshi Ozawa Assignee: Gururaj Shetty Attachments: HDFS-8802.patch, HDFS-8802_01.patch, HDFS-8802_02.patch It's a good time to check the other configurations in hdfs-default.xml here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649230#comment-14649230 ] Hudson commented on HDFS-7192: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2219 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2219/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
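The gating logic described in HDFS-7192 can be sketched as follows (a minimal sketch, not the actual DataXceiver code; the method and parameter names are hypothetical): the {{allowLazyPersist}} flag is only a hint, and the DataNode is free to drop it when the writer is not local.

```java
// Sketch: honor the lazyPersist hint only for local writers.
// The hint is advisory, so remote clients that set it simply get a
// normal disk write instead of a memory write.
public class LazyPersistGate {
    // requested: the allowLazyPersist hint from DataTransferProtocol#writeBlock
    // writerIsLocal: whether the writing client is on the same host as the DN
    static boolean effectiveLazyPersist(boolean requested, boolean writerIsLocal) {
        return requested && writerIsLocal; // hint from a remote writer is ignored
    }

    public static void main(String[] args) {
        System.out.println(effectiveLazyPersist(true, true));   // local writer keeps the hint
        System.out.println(effectiveLazyPersist(true, false));  // remote writer: hint dropped
    }
}
```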
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649223#comment-14649223 ] Hudson commented on HDFS-7192: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #262 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/262/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8784: --- Status: Patch Available (was: Open) BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649245#comment-14649245 ] songwanging commented on HDFS-8840: --- Great, the patch looks good to me. It should be accepted. Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0, 2.5.1, 2.5.2, 2.7.1 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In method checkLogsAvailableForRead() of class: hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java The log level is not correct, after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, while now we use LOG.fatal(msg, e);. Log level is inconsistent. the source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649275#comment-14649275 ] Hudson commented on HDFS-7192: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2200 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2200/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649380#comment-14649380 ] Hudson commented on HDFS-7192: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #270 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/270/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649362#comment-14649362 ] Hadoop QA commented on HDFS-8840: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 10s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 14s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 31s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 11s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 138m 14s | Tests failed in hadoop-hdfs. 
| | | | 184m 33s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748164/HDFS-8840-00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 93d50b7 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11877/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11877/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11877/console | This message was automatically generated. Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0, 2.5.1, 2.5.2, 2.7.1 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In method checkLogsAvailableForRead() of class: hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java The log level is not correct, after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, while now we use LOG.fatal(msg, e);. Log level is inconsistent. the source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649379#comment-14649379 ] Hudson commented on HDFS-8821: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #270 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/270/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8840: - Resolution: Not A Problem Status: Resolved (was: Patch Available) Thanks [~jagadesh.kiran] for working on this. The original logic looks correct to me. The goal is to log a fatal error when catching IOException. If debug is enabled, the fatal log will include additional exception information. I will resolve this as not a problem. Please reopen if you disagree. Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0, 2.5.1, 2.5.2, 2.7.1 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In method checkLogsAvailableForRead() of class: hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java The log level is not correct, after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, while now we use LOG.fatal(msg, e);. Log level is inconsistent. the source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
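The resolution above can be illustrated with a small sketch (the Log interface below is a hypothetical stand-in for commons-logging, not the real API surface): the message is logged at FATAL either way; {{LOG.isDebugEnabled()}} only decides whether the exception's stack trace is attached, which is why the original code is intentional rather than inconsistent.

```java
// Sketch of the pattern discussed in HDFS-8840: always log at FATAL,
// and use the debug setting only to control stack-trace verbosity.
public class FatalLogSketch {
    interface Log {
        boolean isDebugEnabled();
        void fatal(String msg);
        void fatal(String msg, Throwable t);
    }

    static String lastCall; // records which overload was invoked

    static Log LOG = new Log() {
        public boolean isDebugEnabled() { return false; }
        public void fatal(String msg) { lastCall = "fatal(msg)"; }
        public void fatal(String msg, Throwable t) { lastCall = "fatal(msg, e)"; }
    };

    static void report(String msg, Exception e) {
        if (LOG.isDebugEnabled()) {
            LOG.fatal(msg, e);   // FATAL either way; debug adds the stack trace
        } else {
            LOG.fatal(msg);
        }
    }

    public static void main(String[] args) {
        report("Unable to read transaction ids from the logs",
               new java.io.IOException("gap in edit logs"));
        System.out.println(lastCall); // prints "fatal(msg)" since debug is off
    }
}
```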
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649618#comment-14649618 ] Andrew Wang commented on HDFS-8833: --- Supporting reencode-on-rename is difficult, and IMO more difficult than what the Mover does for HSM, which is why it's not scoped for phase 1 and we're trying to avoid sticking strictly to current StoragePolicy semantics. However, if we later add support for reencode-on-rename, we can compatibly add an inherit mode by setting the behavior-on-create on the parent directory. e.g. if you set the dir to inherit-on-create, files would set their policy to inherit. Else if set to parent-on-create, they would explicitly set it to the parent's policy. I also think the APIs are not that dissimilar; as I said above, the proposal for EC is essentially SP without an inherit mode. Alternatively, you can think of it as files always having an explicit SP set rather than inherit. We could even completely integrate EC into SP if we add behavior-on-create to the SP framework. We could allow setting SP on dirs with either behavior (pretty easy change), and only allow creating dirs with a parent-on-create policy for now. Thoughts? Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. 
Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
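The inherit-on-create versus parent-on-create distinction in the discussion above can be sketched as follows (a toy model, not HDFS code; all names are hypothetical): an inheriting file stores nothing and resolves its policy by walking up to the nearest ancestor with one, so later changes to the directory's policy affect it, whereas a parent-on-create file copies the policy at create time and is pinned.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two create-time modes:
//  - INHERIT-on-create: file stores no policy; resolution walks up the tree.
//  - PARENT-on-create: file copies the parent's policy when created.
public class EcPolicySketch {
    static Map<String, String> dirPolicy = new HashMap<>();   // dir -> policy
    static Map<String, String> filePolicy = new HashMap<>();  // file -> explicit policy

    static String resolve(String path) {
        String p = path;
        while (true) {
            if (filePolicy.containsKey(p)) return filePolicy.get(p); // pinned
            if (dirPolicy.containsKey(p)) return dirPolicy.get(p);   // inherited
            int slash = p.lastIndexOf('/');
            if (slash <= 0) return "none";
            p = p.substring(0, slash);
        }
    }

    public static void main(String[] args) {
        dirPolicy.put("/ec", "RS-6-3");
        String inheriting = "/ec/a";            // INHERIT mode: nothing stored
        filePolicy.put("/ec/b", "RS-6-3");      // PARENT mode: copied at create
        dirPolicy.put("/ec", "RS-10-4");        // dir policy changed later
        System.out.println(resolve(inheriting)); // follows the dir: RS-10-4
        System.out.println(resolve("/ec/b"));    // pinned at create: RS-6-3
    }
}
```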
[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649514#comment-14649514 ] Zhe Zhang commented on HDFS-8784: - Thanks Jagadesh for working on this! Looks like a clean refactor. Could you also update the Javadoc of the method? BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649486#comment-14649486 ] Xiaoyu Yao commented on HDFS-8747: -- Thanks [~andrew.wang] for reviewing. bq. Have you thought about simply allowing rename between EZs with the same settings? This would be a much smaller and easier change with similar properties. Your proposal I think is still better in terms of ease-of-use and also ensuring security invariants around key rolling (if/when we implement that). Yes, we've discussed this simpler workaround, but it has many limitations, such as the security invariants you mentioned above. We don't want to force different EZs to share the same zone key just to support rename, as they may have different policies. An encryption zone, as a security concept, should be managed consistently as a single entity. Based on that, supporting additional roots per encryption zone is a natural enhancement and a better solution. bq. If we keep the APIs superuser-only, how does a normal user add their trash folder to an EZ? Same for scratch folders, e.g. if the Hive user is not a superuser. I think we should keep this API as superuser-only. It can still be useful even so: the trash folder/scratch folder can be pre-created and added to the encryption zone by the superuser as needed. This removes the limitation for the Hive scratch folder, which currently has to be configured under the single root of the encryption zone. We can discuss more on this in HDFS-8831. 
Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, HDFS-8747-07292015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow create encryption zone on top of a single HDFS directory. Files under the root directory of the encryption zone will be encrypted/decrypted transparently upon HDFS client write or read operations. Generally, it does not support rename(without data copying) across encryption zones or between encryption zone and non-encryption zone because different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA is to propose better support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash) with HDFS encryption zones. “Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support. Temporary files from MR jobs are usually stored in staging area outside encryption zone such as “/tmp” directory and then rename to targeted directories as specified once the data is ready to be further processed. Below is a summary of supported/unsupported cases from latest Hadoop: * Rename within the encryption zone is supported * Rename the entire encryption zone by moving the root directory of the zone is allowed. * Rename sub-directory/file from encryption zone to non-encryption zone is not allowed. * Rename sub-directory/file from encryption zone A to encryption zone B is not allowed. * Rename from non-encryption zone to encryption zone is not allowed. “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that helps prevent accidental deletion of files and directories. 
If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with original path being preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. Due to the limited rename support, delete sub-directory/file within encryption zone with trash feature is not allowed. Client has to use -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but without a complete solution to the problem. We propose to solve the problem by generalizing the mapping between encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapped directories such as scratch space or soft delete trash locations to be added/removed dynamically after creation. This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only
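The rename rules and the proposed 1:N generalization above can be sketched as a zone-membership check (a toy model with hypothetical names, not the NameNode's actual encryption-zone bookkeeping): a rename is permitted only when source and destination resolve to the same zone, so registering a trash directory as an additional root of an existing zone is exactly what makes trash renames legal.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: each zone may own several root directories (1:N).
// renameAllowed() permits a rename only within one zone (or entirely
// outside all zones), matching the supported/unsupported cases above.
public class EzRenameSketch {
    static Map<String, String> zoneOfRoot = new HashMap<>(); // root dir -> zone id

    static String zoneOf(String path) {
        String bestRoot = "";
        for (String root : zoneOfRoot.keySet()) {
            // longest-prefix match on path components
            if ((path + "/").startsWith(root + "/") && root.length() > bestRoot.length()) {
                bestRoot = root;
            }
        }
        return bestRoot.isEmpty() ? null : zoneOfRoot.get(bestRoot);
    }

    static boolean renameAllowed(String src, String dst) {
        return java.util.Objects.equals(zoneOf(src), zoneOf(dst));
    }

    public static void main(String[] args) {
        zoneOfRoot.put("/warehouse", "ezA");
        System.out.println(renameAllowed("/warehouse/t1/x", "/warehouse/t2/x")); // within zone: allowed
        System.out.println(renameAllowed("/warehouse/t1/x", "/tmp/x"));          // leaves zone: denied
        // 1:N generalization: add the user's trash dir as a second root of ezA
        zoneOfRoot.put("/user/alice/.Trash", "ezA");
        System.out.println(renameAllowed("/warehouse/t1/x",
                "/user/alice/.Trash/Current/warehouse/t1/x"));                   // now allowed
    }
}
```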
[jira] [Updated] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6860: - Attachment: HDFS-6860.00.patch The original patch no longer applies after the switch to slf4j in HDFS-7112. I rebased it and fixed some spots the original patch missed. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649490#comment-14649490 ] Akira AJISAKA commented on HDFS-7916: - If HDFS-7704 is backported to a branch, this issue should be backported to the same branch as well. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Vinayakumar B Assignee: Rushabh S Shah Priority: Critical Fix For: 2.7.1 Attachments: HDFS-7916-01.patch, HDFS-7916-1.patch If any bad block is found, the BPServiceActor for the Standby NameNode will retry reporting it indefinitely. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8833: Summary: Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones (was: Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649566#comment-14649566 ] Tsz Wo Nicholas Sze commented on HDFS-8833: --- {code} ... Under the scope of this JIRA, the file's EC policy won't be changed. If it was created under EC zone A it will carry EC policy A with it when being moved. Could you explain a bit more why If yes, we could eliminate EC zones. Otherwise, we should keep EC zone.? {code} This is semantic different from StoragePolicy. We should use the same semantic as StoragePolicy. Let's keep EC zone for the moment. {code} As a follow-on we could enable an inherit mode similar as StoragePolicy. {code} No, we cannot change semantic over time. Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8653) Code cleanup for DatanodeManager, DatanodeDescriptor and DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649807#comment-14649807 ] Zhe Zhang commented on HDFS-8653: - [~szetszwo] The majority of this patch is just code cleanup, such as removing unnecessary type info when creating generic {{ArrayList}}s. The only logic change is the addition of a few {{null}} checks in {{DatanodeStorageInfo}}, and it came from HDFS-8323. Code cleanup for DatanodeManager, DatanodeDescriptor and DatanodeStorageInfo Key: HDFS-8653 URL: https://issues.apache.org/jira/browse/HDFS-8653 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.8.0 Attachments: HDFS-8653.00.patch While updating the {{blockmanagement}} module to distribute erasure coding recovery work to Datanode, the HDFS-7285 branch also did some code cleanup that should be merged into trunk independently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
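The cleanup described in HDFS-8653 above is the standard Java 7 diamond-operator simplification plus defensive null checks; a generic illustration (not code from the patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Example of the kind of cleanup described: Java 7's diamond operator lets the
// compiler infer type arguments, so the redundant ones can be dropped.
public class DiamondCleanup {
    // Before: List<Map<String, Integer>> l = new ArrayList<Map<String, Integer>>();
    // After:
    static List<Map<String, Integer>> makeList() {
        return new ArrayList<>();
    }

    // A null check of the kind mentioned: guard before dereferencing.
    static int sizeOrZero(List<?> l) {
        return l == null ? 0 : l.size();
    }
}
```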
[jira] [Updated] (HDFS-8202) Improve end to end striping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8202: --- Summary: Improve end to end striping file test to add erasure recovering test (was: Improve end to end stirpping file test to add erasure recovering test) Improve end to end striping file test to add erasure recovering test Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Fix For: HDFS-7285 Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, HDFS-8202-HDFS-7285.006.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This to follow on HDFS-8201 to add erasure recovering test in the end to end stripping file test: * After writing certain blocks to the test file, delete some block file; * Read the file content and compare, see if any recovering issue, or verify the erasure recovering works or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
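The recovery step this test exercises can be illustrated with the simplest possible erasure code. HDFS-7285 stripes with Reed-Solomon, but single-parity XOR shows the same idea the test steps describe — delete one block, rebuild it from the survivors:

```java
// Illustration only: single-parity XOR, the simplest erasure code. It
// demonstrates the test's recovery idea (drop one data cell, rebuild it from
// the remaining cells plus parity); the branch itself uses Reed-Solomon.
public class XorRecovery {
    /** Parity cell = XOR of all data cells. */
    static byte[] parity(byte[][] data) {
        byte[] p = new byte[data[0].length];
        for (byte[] cell : data)
            for (int i = 0; i < p.length; i++) p[i] ^= cell[i];
        return p;
    }

    /** Recover one lost data cell from the surviving cells plus parity. */
    static byte[] recover(byte[][] data, int lost, byte[] parity) {
        byte[] r = parity.clone();
        for (int c = 0; c < data.length; c++) {
            if (c == lost) continue;
            for (int i = 0; i < r.length; i++) r[i] ^= data[c][i];
        }
        return r;
    }
}
```

An end-to-end test then compares the recovered cell byte-for-byte against what was originally written, which is exactly the "read the file content and compare" step above.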
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649686#comment-14649686 ] Xiaoyu Yao commented on HDFS-6860: -- Thanks [~arpitagarwal] for the review. Do you mean keep the processReport related log at *INFO* level? BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-8845: --- Attachment: HDFS-8845.patch DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-8845: --- Status: Patch Available (was: Open) DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649747#comment-14649747 ] Arpit Agarwal commented on HDFS-6860: - +1 pending Jenkins thanks for taking over this [~xyao]. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649670#comment-14649670 ] Arpit Agarwal commented on HDFS-6860: - Thanks [~xyao], we should probably leave the Processing first storage report for and the processReport messages at DEBUG. Those are logged once per DN per block report and useful in practice. . BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8202. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Target Version/s: HDFS-7285 +1 on the latest patch. I just committed to the branch. Thanks Xinwei for the contribution! Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Fix For: HDFS-7285 Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, HDFS-8202-HDFS-7285.006.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This to follow on HDFS-8201 to add erasure recovering test in the end to end stripping file test: * After writing certain blocks to the test file, delete some block file; * Read the file content and compare, see if any recovering issue, or verify the erasure recovering works or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-8845: --- Description: DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6860: - Attachment: HDFS-6860.01.patch Update the patch based on feedback. Delta from v00: Keep the block report processing related log at INFO level. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
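The level split this patch settles on — per-block state changes at a debug level, the once-per-DN-per-report summary at INFO — looks roughly like the sketch below. It uses {{java.util.logging}} purely for self-containment (HDFS itself uses a different logging facade), and all message strings are invented:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the trade-off: noisy per-block messages go behind a cheap debug
// level check, while the one-line-per-report summary stays at INFO.
public class BlockLogLevels {
    static final Logger LOG = Logger.getLogger("BlockStateChange");

    static String processReport(int blocks) {
        for (int b = 0; b < blocks; b++) {
            // Guarding avoids building the message string when debug is off.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("BLOCK* addStoredBlock: blk_" + b);
            }
        }
        // Logged once per DN per block report: cheap and useful at INFO.
        String summary = "processReport: processed " + blocks + " blocks";
        LOG.info(summary);
        return summary;
    }
}
```

On a busy cluster the per-block line fires millions of times, so demoting it (and guarding it) is where the NN performance win comes from.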
[jira] [Commented] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649748#comment-14649748 ] Lei (Eddy) Xu commented on HDFS-8845: - Hi, [~lichangleo], after HDFS-6482, finalizedDir has two-level of subdirs ({{finalized/subdir0/subdir23/blk_1234}}). Would this change lose the coverage of checkDir() on these subdirs? DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
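The trade-off under discussion — a shallow check versus walking the whole {{finalized/subdir0/subdir23}} tree — can be sketched as follows. This is illustrative only, not the patch; {{demo()}} builds the two-level layout in a temp directory:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Sketch: a full check visits every directory under the volume (heavy disk
// load); a shallow check probes only the root, trading coverage for load.
public class DiskCheckSketch {

    /** Recursive check: visits every directory in the tree. */
    static int fullCheck(File dir) {
        checkDir(dir);
        int visited = 1;
        File[] children = dir.listFiles(File::isDirectory);
        if (children != null)
            for (File c : children) visited += fullCheck(c);
        return visited;
    }

    /** Shallow check: a single directory, constant disk load. */
    static int shallowCheck(File dir) {
        checkDir(dir);
        return 1;
    }

    static void checkDir(File dir) {
        if (!dir.isDirectory() || !dir.canRead() || !dir.canWrite())
            throw new IllegalStateException("bad dir: " + dir);
    }

    /** Builds subdir0/subdir23 under a temp dir and runs both checks. */
    static String demo() {
        try {
            File root = Files.createTempDirectory("vol").toFile();
            new File(new File(root, "subdir0"), "subdir23").mkdirs();
            return fullCheck(root) + "," + shallowCheck(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Eddy's question is precisely the coverage column of this trade-off: the shallow variant never touches {{subdir0/subdir23}}, so a fault confined to a subdir goes undetected by this check.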
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649788#comment-14649788 ] Tsz Wo Nicholas Sze commented on HDFS-8838: --- Thanks Li for taking a look. 1. We don't retry connecting to a datanode for the same datanode. So, let's keep it for the moment. If necessary, we can change it later on. 2. The length is included in the path and printed out in the log. Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649805#comment-14649805 ] Andrew Wang commented on HDFS-8747: --- bq. Encryption zone as a security concept should be managed consistently with a single entity. Based on that, support adding additional roots to encryption zone is a natural enhancement and better solution. SGTM, definitely like the idea of the EZ as a management unit. bq. The trash folder/scratch folder can be per-created and added to encryption zone by super user as needed This is maybe viable for scratch, but not for trash. There can be many users on a cluster accessing a variety of EZs, such that it's unmanageable for the super-user to set up all the Trash folders beforehand. Another question, how would this work if a user's homedir is already an EZ? Do you plan to add support for nested encryption zones? Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, HDFS-8747-07292015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow create encryption zone on top of a single HDFS directory. Files under the root directory of the encryption zone will be encrypted/decrypted transparently upon HDFS client write or read operations. Generally, it does not support rename(without data copying) across encryption zones or between encryption zone and non-encryption zone because different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA is to propose better support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash) with HDFS encryption zones. 
“Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support. Temporary files from MR jobs are usually stored in staging area outside encryption zone such as “/tmp” directory and then rename to targeted directories as specified once the data is ready to be further processed. Below is a summary of supported/unsupported cases from latest Hadoop: * Rename within the encryption zone is supported * Rename the entire encryption zone by moving the root directory of the zone is allowed. * Rename sub-directory/file from encryption zone to non-encryption zone is not allowed. * Rename sub-directory/file from encryption zone A to encryption zone B is not allowed. * Rename from non-encryption zone to encryption zone is not allowed. “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with original path being preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. Due to the limited rename support, delete sub-directory/file within encryption zone with trash feature is not allowed. Client has to use -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but without a complete solution to the problem. We propose to solve the problem by generalizing the mapping between encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapped directories such as scratch space or soft delete trash locations to be added/removed dynamically after creation. 
This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only supported within the zone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
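The five rename rules listed in the description above amount to a same-zone check with an exception for moving a zone root. A hedged sketch with a simplified longest-prefix zone lookup (all names hypothetical, not the NameNode's actual zone resolution):

```java
import java.util.List;

// Sketch of the rename rules: allowed within one zone, allowed when moving an
// entire zone root, forbidden across zone boundaries in either direction.
public class EzRenameCheck {
    /** Returns the deepest zone root containing path, or null if unencrypted. */
    static String zoneOf(String path, List<String> zoneRoots) {
        String best = null;
        for (String z : zoneRoots)
            if ((path + "/").startsWith(z + "/")
                    && (best == null || z.length() > best.length()))
                best = z;
        return best;
    }

    static boolean renameAllowed(String src, String dst, List<String> zoneRoots) {
        String srcZone = zoneOf(src, zoneRoots);
        String dstZone = zoneOf(dst, zoneRoots);
        if (srcZone == null && dstZone == null) return true; // outside any zone
        if (src.equals(srcZone)) return true;                // moving the zone root
        return srcZone != null && srcZone.equals(dstZone);   // same zone only
    }
}
```

The 1:N proposal then reduces to making `zoneOf` return the same zone for the zone's main root and its attached scratch/trash roots, so renames between them pass the same-zone test unchanged.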
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649825#comment-14649825 ] Andrew Wang commented on HDFS-6682: --- Wondering if there's a lighterweight metric we could compute instead. [~aw] is this the entire queue being backed up, or a few super-old replicas that never get cleared? If it's the entire queue, maybe the rate of addition/removal from UnderReplicatedBlocks would be similarly useful, in addition to total size. Could provide sliding window metrics like NNTop. Doing this per-DN could also be interesting. Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, the data in the HDFS is lost and a client needs to put the same file again. # A Client puts a file to HDFS # A DataNode crashes before replicating a block of the file to other DataNodes I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way client can know what file to retain for the re-try. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
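The originally proposed metric — the timestamp of the oldest under-replicated block — is cheap to expose if blocks are kept in insertion order, since under-replicated blocks are discovered roughly oldest-first. A hypothetical sketch (not the NameNode's actual queue structure):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: insertion-ordered map from block ID to the time it became
// under-replicated; the oldest timestamp is the first entry, read in O(1).
public class UnderReplicatedTracker {
    private final Map<Long, Long> blockToTimestamp = new LinkedHashMap<>();

    public void add(long blockId, long nowMillis) {
        blockToTimestamp.putIfAbsent(blockId, nowMillis);
    }

    public void remove(long blockId) {
        blockToTimestamp.remove(blockId);
    }

    /** Timestamp of the oldest still-under-replicated block, or -1 if none. */
    public long oldestTimestamp() {
        for (long ts : blockToTimestamp.values()) return ts;
        return -1;
    }
}
```

Andrew's lighter-weight alternative would instead count `add`/`remove` calls per sliding window, trading the precise oldest-age answer for cheaper bookkeeping.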
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649697#comment-14649697 ] Arpit Agarwal commented on HDFS-6860: - Thanks I meant INFO. Apologize for not catching this in the earlier review. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated HDFS-8828: --- Status: Patch Available (was: Open) Submitted patch rev 001. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch Some users reported huge time cost to build file copy list in distcp. (30 hours for 1.6M files). We can leverage snapshot diff report to build file copy list including files/dirs which are changes only between two snapshots (or a snapshot and a normal dir). It speed up the process in two folds: 1. less copy list building time. 2. less file copy MR jobs. HDFS snapshot diff report provide information about file/directory creation, deletion, rename and modification between two snapshots or a snapshot and a normal directory. HDFS-7535 synchronize deletion and rename, then fallback to the default distcp. So it still relies on default distcp to building complete list of files under the source dir. This patch only puts creation and modification files into the copy list based on snapshot diff report. We can minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
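The core of the HDFS-8828 approach — put only creation and modification entries from the snapshot diff report onto the copy list, leaving deletion and rename to the HDFS-7535-style sync step — in a hypothetical sketch (invented types, not the distcp or snapshot-diff API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: filter a snapshot diff report down to the paths that actually need
// copying, instead of enumerating the whole source tree.
public class DiffCopyList {
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    static final class DiffEntry {
        final DiffType type;
        final String path;
        DiffEntry(DiffType type, String path) { this.type = type; this.path = path; }
    }

    static List<String> buildCopyList(List<DiffEntry> report) {
        List<String> copy = new ArrayList<>();
        for (DiffEntry e : report)
            if (e.type == DiffType.CREATE || e.type == DiffType.MODIFY)
                copy.add(e.path);
        return copy;
    }
}
```

Because the diff report is proportional to the number of changed paths rather than the full namespace, the copy-list build no longer scales with the 1.6M-file tree that took 30 hours to enumerate.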
[jira] [Commented] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649808#comment-14649808 ] Chang Li commented on HDFS-8845: Hi [~eddyxu], thanks for the comments. It's an intentional trade-off: we give up a little coverage for performance. DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7587: -- Attachment: HDFS-7587-branch-2.6.patch For the 2.6.1 release effort, the backport isn't straightforward due to difference between 2.6 and 2.7. It has the following differences compared to the original patch. * Include part of HDFS-7509 so that prepareFileForWrite has the expected function signature. * Use Quota.Counts instead of QuotaCounts which is introduced in HDFS-7584. * Skip the check for storage type specific quota introduced in HDFS-7584. * Add the necessary definitions for INodesPath#length and FSDirectory#shouldSkipQuotaChecks. Edit log corruption can happen if append fails with a quota violation - Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Kihwal Lee Assignee: Jing Zhao Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7587-branch-2.6.patch, HDFS-7587.001.patch, HDFS-7587.002.patch, HDFS-7587.003.patch, HDFS-7587.patch We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
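The underlying bug described above is an ordering one: the inode was converted and a lease added before the quota check could fail, leaving in-memory state that was never edit-logged. A generic validate-then-mutate sketch of the fix idea (all names hypothetical, not the actual patch):

```java
// Sketch: perform every check that can fail *before* any mutation, so a quota
// violation leaves no partial state and nothing needs to be undone.
public class AppendQuotaSketch {
    static final class FileState {
        boolean underConstruction;
        boolean hasLease;
    }

    static void prepareFileForAppend(FileState f, long remainingQuota, long needed) {
        if (needed > remainingQuota) {
            // Fail fast: nothing has been mutated yet, so namespace state and
            // the edit log remain consistent.
            throw new IllegalStateException("quota exceeded");
        }
        f.underConstruction = true; // mutate only after all checks pass
        f.hasLease = true;
        // ... OP_ADD would be edit-logged here ...
    }
}
```

With the original ordering, the quota exception fired between the mutation and the `OP_ADD` logging, which is exactly how a later `OP_CLOSE` ended up with no matching `OP_ADD` during replay.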
[jira] [Created] (HDFS-8845) DiskChecker should not traverse entire tree
Chang Li created HDFS-8845: -- Summary: DiskChecker should not traverse entire tree Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8838: -- Attachment: h8838_20150731.patch h8838_20150731.patch: adds more tests and prints out all lengths. Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch, h8838_20150731.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8846) Create edit log files with old layout version for upgrade testing
Zhe Zhang created HDFS-8846: --- Summary: Create edit log files with old layout version for upgrade testing Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8480, we should create some edit log files with old layout version, to test whether they can be correctly handled in upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649905#comment-14649905 ] Xiaoyu Yao commented on HDFS-8747: -- bq. This is maybe viable for scratch, but not for trash. There can be many users on a cluster accessing a variety of EZs, such that it's unmanageable for the super-user to set up all the Trash folders beforehand. Three solutions have been discussed in Design-Soft Delete section of the spec. My initial take is on Option 1: Per User Trash Namespace, which is mostly for compatibility and simplicity. If pre-create trash folder for many users is a concern, Option 2: Global Trash Namespace which is similar to the idea proposed in Hadoop-7310 can be used. It will not be compatible with current Trash behavior where users find their deleted files under /user/username/.Trash/Current/ These solutions can be implemented as pluggable trash policy for admin to choose with configurable keys when the default one may not be appropriate for their deployment. bq. Another question, how would this work if a user's homedir is already an EZ? Do you plan to add support for nested encryption zones? No we don't plan to support nested encryption zones. If we take Option 1, this will not be supported. But if we take Option 2, it will not be a problem as the trash namespace for encryption zone will be separated from user's homedir. Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, HDFS-8747-07292015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow create encryption zone on top of a single HDFS directory. 
Files under the root directory of the encryption zone will be encrypted/decrypted transparently upon HDFS client write or read operations. Generally, it does not support rename(without data copying) across encryption zones or between encryption zone and non-encryption zone because different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA is to propose better support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash) with HDFS encryption zones. “Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support. Temporary files from MR jobs are usually stored in staging area outside encryption zone such as “/tmp” directory and then rename to targeted directories as specified once the data is ready to be further processed. Below is a summary of supported/unsupported cases from latest Hadoop: * Rename within the encryption zone is supported * Rename the entire encryption zone by moving the root directory of the zone is allowed. * Rename sub-directory/file from encryption zone to non-encryption zone is not allowed. * Rename sub-directory/file from encryption zone A to encryption zone B is not allowed. * Rename from non-encryption zone to encryption zone is not allowed. “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with original path being preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. 
Due to the limited rename support, delete sub-directory/file within encryption zone with trash feature is not allowed. Client has to use -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but without a complete solution to the problem. We propose to solve the problem by generalizing the mapping between encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapped directories such as scratch space or soft delete trash locations to be added/removed dynamically after creation. This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only supported within the zone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8550) Erasure Coding: Fix FindBugs Multithreaded correctness Warning
[ https://issues.apache.org/jira/browse/HDFS-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649929#comment-14649929 ] Zhe Zhang commented on HDFS-8550: - [~rakeshr] I wonder if the issue is still valid with HDFS-8386? Erasure Coding: Fix FindBugs Multithreaded correctness Warning -- Key: HDFS-8550 URL: https://issues.apache.org/jira/browse/HDFS-8550 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Findbug warning:- Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.hdfs.DFSOutputStream Field org.apache.hadoop.hdfs.DFSOutputStream.streamer Synchronized 89% of the time Unsynchronized access at DFSOutputStream.java:[line 146] Unsynchronized access at DFSOutputStream.java:[line 859] Unsynchronized access at DFSOutputStream.java:[line 627] Unsynchronized access at DFSOutputStream.java:[line 630] Unsynchronized access at DFSOutputStream.java:[line 640] Unsynchronized access at DFSOutputStream.java:[line 342] Unsynchronized access at DFSOutputStream.java:[line 744] Unsynchronized access at DFSOutputStream.java:[line 903] Synchronized access at DFSOutputStream.java:[line 737] Synchronized access at DFSOutputStream.java:[line 913] Synchronized access at DFSOutputStream.java:[line 726] Synchronized access at DFSOutputStream.java:[line 756] Synchronized access at DFSOutputStream.java:[line 762] Synchronized access at DFSOutputStream.java:[line 757] Synchronized access at DFSOutputStream.java:[line 758] Synchronized access at DFSOutputStream.java:[line 762] Synchronized access at DFSOutputStream.java:[line 483] Synchronized access at DFSOutputStream.java:[line 486] Synchronized access at DFSOutputStream.java:[line 717] Synchronized access at DFSOutputStream.java:[line 719] Synchronized access at DFSOutputStream.java:[line 722] Synchronized access at 
DFSOutputStream.java:[line 408] Synchronized access at DFSOutputStream.java:[line 408] Synchronized access at DFSOutputStream.java:[line 423] Synchronized access at DFSOutputStream.java:[line 426] Synchronized access at DFSOutputStream.java:[line 411] Synchronized access at DFSOutputStream.java:[line 452] Synchronized access at DFSOutputStream.java:[line 452] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 670] Synchronized access at DFSOutputStream.java:[line 580] Synchronized access at DFSOutputStream.java:[line 574] Synchronized access at DFSOutputStream.java:[line 592] Synchronized access at DFSOutputStream.java:[line 583] Synchronized access at DFSOutputStream.java:[line 581] Synchronized access at DFSOutputStream.java:[line 621] Synchronized access at DFSOutputStream.java:[line 609] Synchronized access at DFSOutputStream.java:[line 621] Synchronized access at DFSOutputStream.java:[line 597] Synchronized access at DFSOutputStream.java:[line 612] Synchronized access at DFSOutputStream.java:[line 597] Synchronized access at DFSOutputStream.java:[line 588] Synchronized access at DFSOutputStream.java:[line 624] Synchronized access at DFSOutputStream.java:[line 612] Synchronized access at DFSOutputStream.java:[line 588] Synchronized access at DFSOutputStream.java:[line 632] Synchronized access at DFSOutputStream.java:[line 632] Synchronized access at DFSOutputStream.java:[line 616] Synchronized access at DFSOutputStream.java:[line 633] Synchronized access at DFSOutputStream.java:[line 657] Synchronized access at DFSOutputStream.java:[line 658] Synchronized access at DFSOutputStream.java:[line 695] Synchronized access at DFSOutputStream.java:[line 698] Synchronized access at DFSOutputStream.java:[line 784] Synchronized access at DFSOutputStream.java:[line 795] Synchronized access at 
DFSOutputStream.java:[line 801] Synchronized access at DFSOutputStream.java:[line 155] Synchronized access at DFSOutputStream.java:[line 158] Synchronized access at DFSOutputStream.java:[line 433] Synchronized access at DFSOutputStream.java:[line 886] Synchronized access at DFSOutputStream.java:[line 463] Synchronized access at DFSOutputStream.java:[line 469] Synchronized access at DFSOutputStream.java:[line 463] Synchronized access at DFSOutputStream.java:[line 470] Synchronized access at DFSOutputStream.java:[line 465] Synchronized access at DFSOutputStream.java:[line 749] Synchronized access at DFSStripedOutputStream.java:[line 260] Synchronized access at DFSStripedOutputStream.java:[line 325] Synchronized access at DFSStripedOutputStream.java:[line 325]
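The IS2_INCONSISTENT_SYNC warning above flags a field that is guarded by a lock most of the time but accessed without it elsewhere. A minimal sketch of the pattern and the usual fix (a hypothetical class for illustration, not the actual DFSOutputStream code): guard every read and write of the shared field with the same monitor.

```java
// Hypothetical illustration of the IS2_INCONSISTENT_SYNC fix, not the
// actual DFSOutputStream code: every access to the shared field goes
// through the same lock, so FindBugs sees 100% synchronized access.
public class SyncExample {
    private final Object lock = new Object();
    private int streamerState = 0; // stands in for the 'streamer' field

    public void update(int v) {
        synchronized (lock) {   // write holds the monitor
            streamerState = v;
        }
    }

    public int read() {
        synchronized (lock) {   // read holds the same monitor
            return streamerState;
        }
    }
}
```

Alternatives, depending on the field's usage, are making it volatile (for simple reads/writes) or final (if it never changes after construction).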
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649952#comment-14649952 ] Haohui Mai commented on HDFS-8344: -- After looking through the code, a better approach might be to set a timeout during lease recovery instead of retrying n times. Multiple clients might try to recover the leases and quickly use up all of the retries, causing the file to be closed too quickly. With a timeout, the whole lease recovery process is bounded by time (in addition to the SOFT_LIMIT and HARD_LIMIT we have today), and the lease recovery process is guaranteed to always terminate. Thoughts? NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch I found another(?) instance in which the lease is not recovered. This is easily reproducible on a pseudo-distributed single-node cluster. # Before you start, it helps if you set the following. This is not necessary, but it reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (It could be less than 1 block, but it has hflushed, so some of the data has landed on the datanodes.) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 of the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter) # Shoot the datanode.
(Since I ran on a pseudo-distributed cluster, there was only 1.) I believe the lease should be recovered and the block should be marked missing. However, this is not happening: the lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode, even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
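Haohui Mai's suggestion above, recovery bounded by a wall-clock deadline rather than a retry count, can be sketched as follows. All names here are invented for illustration; this is not the actual NameNode code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of time-bounded lease recovery: the loop stops when
// the deadline passes rather than after a fixed number of retries, so the
// process is bounded by time and always terminates.
public class LeaseRecoverySketch {
    static boolean recoverWithDeadline(long timeoutMs, BooleanSupplier attempt) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (attempt.getAsBoolean()) {
                return true;                  // recovered within the window
            }
            try {
                Thread.sleep(10);             // back off before retrying
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;                          // time budget exhausted
    }
}
```

Because the bound is temporal, concurrent clients attempting recovery cannot "use up" each other's retry budget; they merely share the same window.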
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649499#comment-14649499 ] Haohui Mai commented on HDFS-6407: -- The v10 patch allows sorting based on the status and the name of the datanode. [~benoyantony], [~nroberts], does the patch look good to you? new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Haohui Mai Priority: Critical Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.4.patch, HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png, sorting 2.png, sorting table.png The old UI supported clicking on a column header to sort on that column. The new UI seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanode information, directory listings, and snapshots. When there are many items in the tables, it is useful to be able to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649510#comment-14649510 ] Hadoop QA commented on HDFS-8784: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 5s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 5s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 26s | The applied patch generated 1 new checkstyle issues (total was 310, now 310). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 163m 23s | Tests passed in hadoop-hdfs. 
| | | | 209m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748168/HDFS-8784-00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 93d50b7 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11878/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11878/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11878/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11878/console | This message was automatically generated. BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649876#comment-14649876 ] Allen Wittenauer commented on HDFS-6682: We have no insight into how long a given replication might have been hanging around, so there is no way to really answer that question. We know the queue gets backed up during cascading DN failure events (thanks to the very slow NM memory checker, fast-acting bad jobs, and the Linux OOM killer!), so I was always under the impression that the whole queue is just super busy rather than old entries never being cleared. A rate might be useful, at least to tell us if the queue is stuck and/or to project how long it will remain behind. Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, data in HDFS is lost and a client needs to put the same file again. # A client puts a file to HDFS # A DataNode crashes before replicating a block of the file to other DataNodes I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way a client can know which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
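The proposed metric can be sketched minimally as follows. All names here are invented for illustration and are not taken from the attached patches: record the time each block enters the under-replicated queue; the metric is the timestamp of the oldest entry still present.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an "oldest under-replicated block" metric:
// entries are kept in insertion order, so the first value is the oldest.
public class OldestBlockMetric {
    private final Map<Long, Long> enqueueTime = new LinkedHashMap<>();

    public synchronized void markUnderReplicated(long blockId, long nowMs) {
        enqueueTime.putIfAbsent(blockId, nowMs); // keep the original timestamp
    }

    public synchronized void markReplicated(long blockId) {
        enqueueTime.remove(blockId);
    }

    // Timestamp of the oldest under-replicated block, or -1 if none.
    public synchronized long oldestTimestamp() {
        for (long t : enqueueTime.values()) {
            return t; // LinkedHashMap preserves insertion order
        }
        return -1;
    }
}
```

A derived rate, as Allen suggests, could then be computed from how fast the head timestamp advances over time.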
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649948#comment-14649948 ] Zhe Zhang commented on HDFS-8220: - Quickly glanced through the current code; it doesn't seem we are handling the identified case. Shall we resume the work? Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 
Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-4882: -- Labels: 2.6.1-candidate (was: ) Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario: 1. cluster with 4 DNs 2. the size of the file to be written is a little more than one block 3. write the first block to 3 DNs, DN1-DN2-DN3 4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out 5. DN2 and DN3 go down 6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE 7. the client continues to write the last block, and tries to close the file after writing all the data 8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE 9. shut down the client 10. the file's lease exceeds the hard limit 11. the LeaseManager realizes that and begins lease recovery by calling fsnamesystem.internalReleaseLease() 12. but the last block's state is COMPLETE, and this triggers the lease manager's infinite loop, which prints massive logs like this: {noformat} 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.
Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM APREDUCE_-1252656407_1, pendingcreates: 1] {noformat} (the 3rd line log is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3443) Fix NPE when namenode transition to active during startup by adding checkNNStartup() in NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-3443: -- Labels: 2.6.1-candidate (was: ) Fix NPE when namenode transition to active during startup by adding checkNNStartup() in NameNodeRpcServer - Key: HDFS-3443 URL: https://issues.apache.org/jira/browse/HDFS-3443 Project: Hadoop HDFS Issue Type: Bug Components: auto-failover, ha Reporter: suja s Assignee: Vinayakumar B Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-3443-003.patch, HDFS-3443-004.patch, HDFS-3443-005.patch, HDFS-3443-006.patch, HDFS-3443-007.patch, HDFS-3443_1.patch, HDFS-3443_1.patch Start the NN and let the NN standby services be started. Before the editLogTailer is initialised, start ZKFC and allow the active services startup to proceed further. Here editLogTailer.catchupDuringFailover() will throw an NPE. void startActiveServices() throws IOException { LOG.info("Starting services required for active state"); writeLock(); try { FSEditLog editLog = dir.fsImage.getEditLog(); if (!editLog.isOpenForWrite()) { // During startup, we're already open for write during initialization.
editLog.initJournalsForWrite(); // May need to recover editLog.recoverUnclosedStreams(); LOG.info(Catching up to latest edits from old active before + taking over writer role in edits logs.); editLogTailer.catchupDuringFailover(); {noformat} 2012-05-18 16:51:27,585 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from XX.XX.XX.55:58003: output error 2012-05-18 16:51:27,586 WARN org.apache.hadoop.ipc.Server: IPC Server handler 8 on 8020, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from XX.XX.XX.55:58004: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:602) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686) 2012-05-18 16:51:27,586 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2092) at org.apache.hadoop.ipc.Server.access$2000(Server.java:107) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:930) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:994) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1738) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface
[ https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649834#comment-14649834 ] Zhe Zhang commented on HDFS-8835: - Thanks for sharing the feedback [~szetszwo]. HDFS-8487 (as well as HDFS-8653, HDFS-8605) just try to divide-and-conquer the (inevitable) inconvenience for the community to understand and accept the EC change. I feel this way is easier than absorbing the huge change all at once. As shown below (copied from HDFS-8728 [discussion | https://issues.apache.org/jira/browse/HDFS-8728?focusedCommentId=14619043page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14619043]) the overall EC change becomes much smaller and less intrusive after pushing these changes to trunk first (I will do the rebase after HDFS-8499 revert). {code} Current HDFS-7285: 2532 insertions(+), 1156 deletions(-) in blockmanagement 1826 insertions(+), 444 deletions(-) in namenode Rebased: 1251 insertions(+), 201 deletions(-) in blockmanagement 1324 insertions(+), 168 deletions(-) in namenode {code} That said, I understand that git rebasing is a relatively new workflow and people have different preferences in absorbing changes. So more feedbacks are very welcome. Convert BlockInfoUnderConstruction as an interface -- Key: HDFS-8835 URL: https://issues.apache.org/jira/browse/HDFS-8835 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8499, this JIRA aims to convert {{BlockInfoUnderConstruction}} as an interface and {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 branch will add {{BlockInfoStripedUnderConstruction}} as another implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface
[ https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649840#comment-14649840 ] Tsz Wo Nicholas Sze commented on HDFS-8835: --- HDFS-8487 (as well as HDFS-8653, HDFS-8605) just try to divide-and-conquer the (inevitable) inconvenience for the community to understand and accept the EC change. I feel this way is easier than absorbing the huge change all at once. ... Please don't do it anymore. We probably should revert all these patches. We should not sneak branch code into trunk. The entire branch should be reviewed together. Convert BlockInfoUnderConstruction as an interface -- Key: HDFS-8835 URL: https://issues.apache.org/jira/browse/HDFS-8835 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8499, this JIRA aims to convert {{BlockInfoUnderConstruction}} as an interface and {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 branch will add {{BlockInfoStripedUnderConstruction}} as another implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8838: -- Attachment: (was: h8838_20150731.patch) Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface
[ https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649906#comment-14649906 ] Andrew Wang commented on HDFS-8835: --- There is a lot of past precedent for doing refactors in trunk. One of the first EC-related changes was HDFS-7743 which renamed BlockInfo to BlockInfoContiguous in trunk. [~szetszwo] you +1'd this change. The other BlockInfo refactors have happened over many weeks and been reviewed by a variety of different committers (Yi, Vinay, Jing, myself) so there has been no intent to sneak changes into trunk. Considering the number of positive reviews, I would say doing refactors in trunk has been met with general approval. bq. Patches got committed to trunk neither means that everyone already has understood the code Everyone understanding the code is not a prerequisite for getting code committed. Part of community over code is trusting the judgement of the other committers on the project. Here multiple committers have positively reviewed these refactors. bq. Quite a few people told me that the recent change of HDFS-8487 does make the code harder to understand. It makes the familiar code unfamiliar. Considering that many of us have positively reviewed these refactors, maybe harder to understand is a matter of opinion. Zhe posted about plans to further simplify the code through use of composition. Would this help with reviewing the sum of the changes? Maybe we should also continue to discuss the design of the hierarchy over on HDFS-8499. Convert BlockInfoUnderConstruction as an interface -- Key: HDFS-8835 URL: https://issues.apache.org/jira/browse/HDFS-8835 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8499, this JIRA aims to convert {{BlockInfoUnderConstruction}} as an interface and {{BlockInfoContiguousUnderConstruction}} as its implementation. 
The HDFS-7285 branch will add {{BlockInfoStripedUnderConstruction}} as another implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649960#comment-14649960 ] Hadoop QA commented on HDFS-6860: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 58s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 16s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 29s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 43s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 6s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 157m 31s | Tests failed in hadoop-hdfs. 
| | | | 204m 56s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748221/HDFS-6860.00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d0e0ba8 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11879/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11879/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11879/console | This message was automatically generated. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648802#comment-14648802 ] Li Bo commented on HDFS-8838: - Hi Nicholas, I think you can commit your patch first and I will update mine after that. Some points: 1. {{DFSStripedOutputStream#getNumBlockWriteRetry}} returns 0, which allows connecting to a datanode only once. I think we should allow the connection to be retried several times. One way is to store the located block returned by {{locateFollowingBlock()}}; the following retries can then use the stored one, with no need to call {{locateFollowingBlock()}} again. 2. In {{TestDFSStripedOutputStreamWithFailure}}, you store the test lengths in {{LENGTHS}}. But when I read the code, I have to calculate the length myself to see what kind of test it is. So how about adding some comments, or showing the file length directly in the parameter, such as {{testDatanodeFailure(4 * cellSize + 123)}}? Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
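Li Bo's first point, caching the located block so that retries repeat only the connection step and not the NameNode lookup, can be sketched generically. Everything below is invented for illustration and is not the DFSStripedOutputStream code.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of retrying a connection against a cached location:
// locate() is called once, and only the connect step is retried.
public class RetryWithCachedBlock {
    static <T> T connectWithRetries(int maxRetries, Supplier<T> locate, Predicate<T> connect) {
        T located = locate.get();             // single "locateFollowingBlock" round trip
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (connect.test(located)) {      // retry only the connection
                return located;
            }
        }
        return null;                           // all retries exhausted
    }
}
```

The design point is simply that the expensive, side-effecting lookup happens once, while the cheap, idempotent connection attempt is what gets repeated.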
[jira] [Created] (HDFS-8840) Inconsistent log level practice
songwanging created HDFS-8840: - Summary: Inconsistent log level practice Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1, 2.5.2, 2.5.1, 2.6.0 Reporter: songwanging Priority: Minor In method checkLogsAvailableForRead() of class hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java the log level is inconsistent: after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, but the code uses LOG.fatal(msg, e);. The source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
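A hedged sketch of the consistent pattern the report asks for, using java.util.logging stand-ins (Level.FINE for commons-logging debug(), Level.SEVERE for fatal()); the class and method names are illustrative, not the actual BootstrapStandby code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger("BootstrapStandby");

    // The level emitted inside the guard matches the guard itself:
    // debug guard -> debug-level record (with stack trace), otherwise severe without it.
    static Level levelUsed(String msg, Exception e, boolean debugEnabled) {
        if (debugEnabled) {
            LOG.log(Level.FINE, msg, e);   // consistent with the isDebugEnabled() check
            return Level.FINE;
        } else {
            LOG.log(Level.SEVERE, msg);    // fatal/severe path, no stack trace
            return Level.SEVERE;
        }
    }

    public static void main(String[] args) {
        Exception e = new java.io.IOException("gap in edit logs");
        System.out.println(levelUsed("no logs available for read", e, true));   // FINE
        System.out.println(levelUsed("no logs available for read", e, false));  // SEVERE
    }
}
```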
[jira] [Created] (HDFS-8842) Catch throwable
songwanging created HDFS-8842: - Summary: Catch throwable Key: HDFS-8842 URL: https://issues.apache.org/jira/browse/HDFS-8842 Project: Hadoop HDFS Issue Type: Bug Reporter: songwanging Priority: Critical We came across a few instances where the code catches Throwable but fails to rethrow anything. Throwable is the parent type of Exception and Error, so catching Throwable means catching both Exceptions and Errors. An Exception is something you can recover from (like IOException); an Error is something more serious that you usually can't recover from easily (like NoClassDefFoundError), so it doesn't make much sense to catch an Error. We should convert Throwable to Exception. For example: In method tryGetPid(Process p) of class hadoop-2.7.1-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\ha\ShellCommandFencer.java code: private static String tryGetPid(Process p) { try { ... } catch (Throwable t) { LOG.trace("Unable to determine pid for " + p, t); return null; } } In method uncaughtException(Thread t, Throwable e) of class hadoop-2.7.1-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-common\src\main\java\org\apache\hadoop\yarn\YarnUncaughtExceptionHandler.java code: public void uncaughtException(Thread t, Throwable e) { ... try { LOG.fatal("Thread " + t + " threw an Error. Shutting down now...", e); } catch (Throwable err) { // We don't want to not exit because of an issue with logging } ... try { System.err.println("Halting due to Out Of Memory Error..."); } catch (Throwable err) { // Again we don't want to exit because of logging issues. } ... } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
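A minimal, self-contained sketch of the proposed narrowing (the names are illustrative; runSafely() stands in for patterns like tryGetPid()): catching Exception keeps the recover-and-return-null behavior while letting Errors propagate.

```java
import java.util.concurrent.Callable;

public class CatchNarrowing {
    /** Runs a task; recoverable Exceptions yield null, but Errors still propagate. */
    static String runSafely(Callable<String> task) {
        try {
            return task.call();
        } catch (Exception e) {   // deliberately Exception, not Throwable: Errors surface
            return null;          // fallback, mirroring tryGetPid()'s return null
        }
    }

    public static void main(String[] args) {
        System.out.println(runSafely(() -> "pid-1234"));   // normal path
        System.out.println(runSafely(() -> { throw new java.io.IOException("no pid"); }));
        try {
            runSafely(() -> { throw new AssertionError("an Error, not an Exception"); });
        } catch (Error err) {
            System.out.println("Error propagated as intended");
        }
    }
}
```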
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648840#comment-14648840 ] Akira AJISAKA commented on HDFS-6682: - bq. We have many ways to know about namenode health or in heavy load. This metric is to show the health not only of the NameNode but also of the entire HDFS cluster. Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, the data in HDFS is lost and a client needs to put the same file again. # A client puts a file to HDFS # A DataNode crashes before replicating a block of the file to other DataNodes I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way a client knows which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648858#comment-14648858 ] Zhe Zhang commented on HDFS-8833: - Thanks for the discussions, guys! [~walter.k.su] Good catch that we are still storing the EC policy at the directory level. However, a directory is no longer a zone, based on the expected properties of a _zone_, as Nicholas [summarized | https://issues.apache.org/jira/browse/HDFS-8833?focusedCommentId=14648073page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14648073]. I'll update the JIRA summary soon. I like the hybrid solution Andrew proposed. It looks like a good long-term solution. [~vinayrpet] Let me know if it addresses the memory overhead concern you commented on. bq. What is the semantic of moving a file under EC zone A to EC zone B? Would the file be changed from EC scheme A to EC schema B? If yes, we could eliminate EC zones. Otherwise, we should keep EC zone. Thanks for the example, Nicholas. Under the scope of this JIRA, the file's EC policy won't be changed. If it was created under EC zone A it will carry EC policy A with it when being moved. Could you explain a bit more why "If yes, we could eliminate EC zones. Otherwise, we should keep EC zone"? As a follow-on we could enable an inherit mode similar to StoragePolicy. Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. 
As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8841) Catch throwable return null
songwanging created HDFS-8841: - Summary: Catch throwable return null Key: HDFS-8841 URL: https://issues.apache.org/jira/browse/HDFS-8841 Project: Hadoop HDFS Issue Type: Bug Reporter: songwanging Priority: Minor In method map() of class \hadoop-2.7.1-src\hadoop-tools\hadoop-extras\src\main\java\org\apache\hadoop\tools\DistCpV1.java. This method has this code: public void map(LongWritable key, FilePair value, OutputCollector<WritableComparable<?>, Text> out, Reporter reporter) throws IOException { ... } catch (Throwable ex) { // ignore, we are just cleaning up LOG.debug("Ignoring cleanup exception", ex); } } } ... } Throwable is the parent type of Exception and Error, so catching Throwable means catching both Exceptions and Errors. An Exception is something you can recover from (like IOException); an Error is something more serious that you usually can't recover from easily (like NoClassDefFoundError), so it doesn't make much sense to catch an Error. We should catch Exception instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648825#comment-14648825 ] Jagadesh Kiran N commented on HDFS-8784: Hi [~kanaka], as discussed I am assigning this to myself. BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649131#comment-14649131 ] Hudson commented on HDFS-7192: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1003 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1003/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649130#comment-14649130 ] Hudson commented on HDFS-8821: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1003 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1003/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
zhihai xu created HDFS-8847: --- Summary: change TestHDFSContractAppend to not override testRenameFileBeingAppended method. Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8829) DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning
[ https://issues.apache.org/jira/browse/HDFS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650141#comment-14650141 ] He Tianyi commented on HDFS-8829: - Hi kanaka kumar avvaru, I've applied the improvement to my cluster, and should be able to produce a patch in the next few days. Can I work on this? DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning --- Key: HDFS-8829 URL: https://issues.apache.org/jira/browse/HDFS-8829 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.3.0, 2.6.0 Reporter: He Tianyi Assignee: kanaka kumar avvaru {code:java} private void initDataXceiver(Configuration conf) throws IOException { // find free port or use privileged port provided TcpPeerServer tcpPeerServer; if (secureResources != null) { tcpPeerServer = new TcpPeerServer(secureResources); } else { tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout, DataNode.getStreamingAddr(conf)); } tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE); {code} The last line sets SO_RCVBUF explicitly, thus disabling TCP auto-tuning on some systems. Shall we make this behavior configurable? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
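One way the proposal could look — a hedged sketch, not the actual patch; the config key name is invented for illustration. The idea is to treat a non-positive configured size as "leave the socket alone" so the kernel's receive-buffer auto-tuning stays in effect:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class ReceiveBufferConfig {
    // Hypothetical config key, modeled on the JIRA discussion; not an actual HDFS key.
    static final String RCVBUF_KEY = "dfs.datanode.socket.recv.buffer.size";

    /** Sets SO_RCVBUF only when a positive size is configured; 0 or less keeps OS auto-tuning. */
    static void maybeSetReceiveBuffer(ServerSocket ss, int configuredBytes) throws IOException {
        if (configuredBytes > 0) {
            ss.setReceiveBufferSize(configuredBytes);
        }
        // else: never call setReceiveBufferSize(), preserving TCP auto-tuning
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {
            maybeSetReceiveBuffer(ss, 0);          // auto-tuning preserved
            maybeSetReceiveBuffer(ss, 128 * 1024); // explicit size requested (OS may clamp it)
            System.out.println(ss.getReceiveBufferSize() > 0);
        }
    }
}
```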
[jira] [Updated] (HDFS-8245) Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log.
[ https://issues.apache.org/jira/browse/HDFS-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-8245: --- Labels: 2.6.1-candidate BB2015-05-TBR (was: BB2015-05-TBR) Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log. --- Key: HDFS-8245 URL: https://issues.apache.org/jira/browse/HDFS-8245 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Labels: 2.6.1-candidate, BB2015-05-TBR Fix For: 2.7.1 Attachments: HDFS-8245-1.patch, HDFS-8245.patch The following series of events happened on Standby namenode : 2015-04-09 07:47:21,735 \[Edit log tailer] INFO ha.EditLogTailer: Triggering log roll on remote NameNode Active Namenode (ANN) 2015-04-09 07:58:01,858 \[Edit log tailer] INFO ha.EditLogTailer: Triggering log roll on remote NameNode ANN The following series of events happened on Active Namenode:, 2015-04-09 07:47:21,747 \[IPC Server handler 99 on 8020] INFO namenode.FSNamesystem: Roll Edit Log from Standby NN (SNN) 2015-04-09 07:58:01,868 \[IPC Server handler 18 on 8020] INFO namenode.FSNamesystem: Roll Edit Log from SNN The following series of events happened on datanode ( {color:red} datanodeA {color}): 2015-04-09 07:52:15,817 \[DataXceiver for client DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1 at /:51078 \[Receiving block BP-595383232--1360869396230:blk_1570321882_1102029183867]] INFO datanode.DataNode: Receiving BP-595383232--1360869396230:blk_1570321882_1102029183867 src: /client:51078 dest: /{color:red}datanodeA:1004{color} 2015-04-09 07:52:15,969 \[PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE] INFO DataNode.clienttrace: src: /client:51078, dest: /{color:red}datanodeA:1004{color}, bytes: 20, op: HDFS_WRITE, cliID: DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1, offset: 0, srvID: 356a8a98-826f-446d-8f4c-ce288c1f0a75, blockid: 
BP-595383232--1360869396230:blk_1570321882_1102029183867, duration: 148948385 2015-04-09 07:52:15,969 \[PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE] INFO datanode.DataNode: PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE terminating 2015-04-09 07:52:25,970 \[DataXceiver for client /{color:red}datanodeB {color}:52827 \[Copying block BP-595383232--1360869396230:blk_1570321882_1102029183867]] INFO datanode.DataNode: Copied BP-595383232--1360869396230:blk_1570321882_1102029183867 to {color:red}datanodeB{color}:52827 2015-04-09 07:52:28,187 \[DataNode: heartbeating to ANN:8020] INFO impl.FsDatasetAsyncDiskService: Scheduling blk_1570321882_1102029183867 file path/blk_1570321882 for deletion 2015-04-09 07:52:28,188 \[Async disk worker #1482 for volume ] INFO impl.FsDatasetAsyncDiskService: Deleted BP-595383232--1360869396230 blk_1570321882_1102029183867 file path/blk_1570321882 Then we failover for upgrade and then the standby became active. When we did ls command on this file, we got the following exception: 15/04/09 22:07:39 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader. 
java.io.IOException: Got error for OP_READ_BLOCK, self=/client:32947, remote={color:red}datanodeA:1004{color}, for file filename, for pool BP-595383232--1360869396230 block 1570321882_1102029183867 at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:445) at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:410) at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:815) at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693) at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:351) at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112) at org.apache.hadoop.fs.shell.CopyCommands$Merge.processArguments(CopyCommands.java:97) at
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-7980: --- Labels: 2.6.1-candidate (was: ) Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch In the current implementation the datanode will call the reportReceivedDeletedBlocks() method, which is an incremental block report, before calling the bpNamenode.blockReport() method. So in a large (several thousand datanodes) and busy cluster it will slow down the startup of the namenode (by more than one hour). {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. reportReceivedDeletedBlocks(); lastDeletedReport = startTime; ... // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
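The timer gate at the top of blockReport() can be sketched in isolation (a simplified stand-in, not the real method, which also handles the split threshold and the IBR flush quoted above):

```java
public class BlockReportGate {
    /** Mirrors the guard in blockReport(): skip the full report until the interval elapses. */
    static boolean timerExpired(long nowMs, long lastReportMs, long intervalMs) {
        return nowMs - lastReportMs > intervalMs;
    }

    public static void main(String[] args) {
        long sixHours = 6L * 60 * 60 * 1000;   // default dfs.blockreport.intervalMsec
        System.out.println(timerExpired(10_000L, 0L, sixHours));      // too soon: false
        System.out.println(timerExpired(sixHours + 1, 0L, sixHours)); // interval elapsed: true
    }
}
```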
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649970#comment-14649970 ] Xiaoyu Yao commented on HDFS-6860: -- Jenkins results: * No unit test added because this is a log-level-only change. * The test failure is unrelated and tracked by a known JIRA: HDFS-8772. Thanks [~lichangleo] for the initial patch and [~arpitagarwal], [~andrew.wang] for the review. I will commit the patch shortly. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8404) Pending block replication can get stuck using older genstamp
[ https://issues.apache.org/jira/browse/HDFS-8404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-8404: --- Labels: 2.6.1-candidate (was: ) Pending block replication can get stuck using older genstamp Key: HDFS-8404 URL: https://issues.apache.org/jira/browse/HDFS-8404 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0, 2.7.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-8404-v0.patch, HDFS-8404-v1.patch If an under-replicated block gets into the pending-replication list, but later the genstamp of that block ends up being newer than the one originally submitted for replication, the block will fail replication until the NN is restarted. It will be safer if processPendingReplications() gets up-to-date blockinfo before resubmitting replication work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650013#comment-14650013 ] Zhe Zhang commented on HDFS-8823: - Thanks Haohui for the pointers; they are very helpful. I commented on {{storagePolicy}} just because if we plan to store it in BM too, the combined mem overhead ({{rep factor}} + {{storagePolicy}}) probably won't be (as easily) absorbed by alignment. And I don't think we'll end up having {{rep factor}} in BM but not {{storagePolicy}} (pls correct me if I'm wrong). Looks like BM needs both pieces of info to make correct placement decision. Given that the majority of blocks will have default {{rep factor}} and {{storagePolicy}}, maybe we can use some deduplication. For example, create a {{CustomizedBlockPolicies}} feature class and only add it to a {{BlockInfo}} when policies are customized. Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8823.000.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility on replicating blocks. Currently the replication factors of all blocks have to be the same. The replication factors of these blocks are equal to the highest replication factor across all snapshots. The changes will allow blocks in a file to have different replication factor, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
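The deduplication idea in the comment could look roughly like this — a hypothetical flyweight with all names invented for illustration; it is not HDFS code. Blocks carrying the default (replication factor, storage policy) pair all point at one shared instance, so only customized blocks pay for an extra object:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interned policy pair (illustrative stand-in for a BlockInfo feature class).
final class BlockPolicies {
    private static final Map<Integer, BlockPolicies> CACHE = new ConcurrentHashMap<>();
    static final BlockPolicies DEFAULT = of((short) 3, (byte) 0);

    final short replication;
    final byte storagePolicyId;

    private BlockPolicies(short replication, byte storagePolicyId) {
        this.replication = replication;
        this.storagePolicyId = storagePolicyId;
    }

    /** Interns the pair so equal policies share one instance. */
    static BlockPolicies of(short replication, byte storagePolicyId) {
        int key = (replication << 8) | (storagePolicyId & 0xff);
        return CACHE.computeIfAbsent(key, k -> new BlockPolicies(replication, storagePolicyId));
    }
}

public class BlockPoliciesDemo {
    public static void main(String[] args) {
        // Two blocks with the default policies share the same feature object.
        System.out.println(BlockPolicies.of((short) 3, (byte) 0) == BlockPolicies.DEFAULT);
        // A customized block gets its own (also interned) instance.
        System.out.println(BlockPolicies.of((short) 10, (byte) 0) == BlockPolicies.DEFAULT);
    }
}
```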
[jira] [Commented] (HDFS-8827) Erasure Coding: When namenode processes over replicated striped block, NPE will be occur in ReplicationMonitor
[ https://issues.apache.org/jira/browse/HDFS-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649980#comment-14649980 ] Zhe Zhang commented on HDFS-8827: - Thanks for identifying the problem, Fukudome-san! I took a quick look and the root cause doesn't seem straightforward. Do you mind creating a unit test reproducing the issue so we can all debug on the same basis? Thanks much! Erasure Coding: When namenode processes over replicated striped block, NPE will be occur in ReplicationMonitor -- Key: HDFS-8827 URL: https://issues.apache.org/jira/browse/HDFS-8827 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Takuya Fukudome Assignee: Takuya Fukudome Attachments: processing-over-replica-npe.log In our test cluster, when the namenode processed over-replicated striped blocks, a null pointer exception (NPE) occurred. This happened under the following situation: 1) some datanodes shut down. 2) the namenode recovers block groups which lost internal blocks. 3) the stopped datanodes are restarted. 4) the namenode processes over-replicated striped blocks. 5) the NPE occurs. I think BlockPlacementPolicyDefault#chooseReplicaToDelete will return null in this situation, which causes this NPE problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650001#comment-14650001 ] Hadoop QA commented on HDFS-8845: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 21s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 2s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 159m 42s | Tests failed in hadoop-hdfs. 
| | | | 203m 51s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748230/HDFS-8845.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d0e0ba8 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11880/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11880/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11880/console | This message was automatically generated. DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8399) Erasure Coding: unit test the behaviour of BlockManager recovery work for the deleted blocks
[ https://issues.apache.org/jira/browse/HDFS-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649968#comment-14649968 ] Zhe Zhang commented on HDFS-8399: - Thanks for the work, Rakesh! The added test looks like a clean sanity check. Can we either add it to {{TestStripedINodeFile}} (preferably) or change it to a more intuitive name? Other than that LGTM. Erasure Coding: unit test the behaviour of BlockManager recovery work for the deleted blocks Key: HDFS-8399 URL: https://issues.apache.org/jira/browse/HDFS-8399 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Labels: Test Attachments: HDFS-8399-HDFS-7285-00.patch, HDFS-8399-HDFS-7285-01.patch The following exception occurred in the {{ReplicationMonitor}}. As per my initial analysis, the exception is coming from the blocks of a deleted file. {code} 2015-05-14 14:14:40,485 FATAL util.ExitUtil (ExitUtil.java:terminate(127)) - Terminate called org.apache.hadoop.util.ExitUtil$ExitException: java.lang.AssertionError: Absolute path required at org.apache.hadoop.hdfs.server.namenode.INode.getPathNames(INode.java:744) at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:723) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath(FSDirectory.java:1655) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getECSchemaForPath(FSNamesystem.java:8435) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeRecoveryWorkForBlocks(BlockManager.java:1572) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockRecoveryWork(BlockManager.java:1402) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3894) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3846) at java.lang.Thread.run(Thread.java:722) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126) at 
org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3865) at java.lang.Thread.run(Thread.java:722) Exception in thread org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@1255079 org.apache.hadoop.util.ExitUtil$ExitException: java.lang.AssertionError: Absolute path required at org.apache.hadoop.hdfs.server.namenode.INode.getPathNames(INode.java:744) at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:723) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath(FSDirectory.java:1655) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getECSchemaForPath(FSNamesystem.java:8435) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeRecoveryWorkForBlocks(BlockManager.java:1572) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockRecoveryWork(BlockManager.java:1402) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3894) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3846) at java.lang.Thread.run(Thread.java:722) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3865) at java.lang.Thread.run(Thread.java:722) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8486) DN startup may cause severe data loss
[ https://issues.apache.org/jira/browse/HDFS-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-8486: --- Labels: 2.6.1-candidate (was: ) DN startup may cause severe data loss - Key: HDFS-8486 URL: https://issues.apache.org/jira/browse/HDFS-8486 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 0.23.1, 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-8486.patch, HDFS-8486.patch A race condition between block pool initialization and the directory scanner may cause a mass deletion of blocks in multiple storages. If block pool initialization finds a block on disk that is already in the replica map, it deletes one of the blocks based on size, GS, etc. Unfortunately it _always_ deletes one of the blocks even if identical, thus the replica map _must_ be empty when the pool is initialized. The directory scanner starts at a random time within its periodic interval (default 6h). If the scanner starts very early it races to populate the replica map, causing the block pool init to erroneously delete blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
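The failure mode above can be sketched as follows. This is a hypothetical, much-simplified model of the dedup step, not the actual FsDatasetImpl code; the class and field names (ReplicaMapInit, deletions) are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model: block-pool init walks the block files on disk and
// inserts them into the replica map. If an entry for the block ID is
// already present, one copy is *always* scheduled for deletion, even
// when the two entries describe the same on-disk file.
public class ReplicaMapInit {
    final Map<Long, String> replicaMap = new HashMap<>();
    int deletions = 0;

    // Called for every block file found on disk during init.
    void addBlock(long blockId, String file) {
        String existing = replicaMap.putIfAbsent(blockId, file);
        if (existing != null) {
            // The flaw: unconditional deletion of a "duplicate". This is
            // only safe if the map was empty before init, which an
            // early-starting directory scanner can violate.
            deletions++;
        }
    }
}
```

Under this model, a directory scanner that pre-populates the map causes init to treat every block it re-discovers as a duplicate, hence the mass deletion.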
[jira] [Updated] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6860: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) commit to 2.8.0 BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Fix For: 2.8.0 Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
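The usual shape of this fix is to demote per-block messages to a guarded debug level so no message string is built under the default configuration. The sketch below uses java.util.logging for self-containment (HDFS itself logs through commons-logging/slf4j), and the class and method names are invented for illustration:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: per-block state-change messages are only formatted and
// emitted when debug (FINE) logging is enabled, so a busy NameNode
// pays essentially nothing for them at the default INFO level.
public class BlockStateLog {
    static final Logger LOG = Logger.getLogger("BlockStateChange");

    // Returns true if the message was actually logged.
    static boolean logStateChange(String op, long blockId) {
        if (!LOG.isLoggable(Level.FINE)) {
            return false; // skipped: no string concatenation at INFO
        }
        LOG.fine("BLOCK* " + op + ": blk_" + blockId);
        return true;
    }
}
```

The guard matters as much as the level change: without it, the argument string is still built on every state change even though the logger discards it.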
[jira] [Commented] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation
[ https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650017#comment-14650017 ] Tsz Wo Nicholas Sze commented on HDFS-8804: --- Some comments on the patch: - Should getParityBuffer() be synchronized? It seems that some code path from pread is not synchronized. - close() should check whether curStripeBuf == null since close() can be called multiple times. Some other suggestions can be implemented later: * It is better to have multiple small data/parity buffers with size == cellSize so that it is more efficient for reusing the buffers. * Should DirectBufferPool be singleton? So that the pool can be shared. Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation --- Key: HDFS-8804 URL: https://issues.apache.org/jira/browse/HDFS-8804 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8804.000.patch Currently we directly allocate direct ByteBuffer in DFSStripedInputstream for the stripe buffer and the buffers holding parity data. It's better to get ByteBuffer from DirectBufferPool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
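The two review comments (synchronize buffer access; make close() safe to call twice) can be sketched as below. This is a self-contained stand-in, not Hadoop's org.apache.hadoop.util.DirectBufferPool or the real DFSStripedInputStream; the names MiniDirectBufferPool and StripeReader are invented for illustration:

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal direct-buffer pool: reuse returned buffers instead of
// allocating a fresh direct ByteBuffer per stream.
class MiniDirectBufferPool {
    private final Queue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
    private final int bufferSize;

    MiniDirectBufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    ByteBuffer getBuffer() {
        ByteBuffer b = pool.poll();
        return b != null ? b : ByteBuffer.allocateDirect(bufferSize);
    }

    void returnBuffer(ByteBuffer b) {
        b.clear();
        pool.offer(b);
    }
}

class StripeReader {
    private final MiniDirectBufferPool pool;
    private ByteBuffer curStripeBuf;

    StripeReader(MiniDirectBufferPool pool) { this.pool = pool; }

    // synchronized, so a pread path racing with a stateful read
    // cannot allocate two stripe buffers
    synchronized ByteBuffer getCurStripeBuf() {
        if (curStripeBuf == null) {
            curStripeBuf = pool.getBuffer();
        }
        return curStripeBuf;
    }

    // close() may be called multiple times: the null check makes the
    // second call a no-op instead of returning the buffer twice
    synchronized void close() {
        if (curStripeBuf != null) {
            pool.returnBuffer(curStripeBuf);
            curStripeBuf = null;
        }
    }
}
```

Making the pool a shared singleton, as suggested, would let all striped streams in the process draw from the same set of direct buffers.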
[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8838: -- Attachment: h8838_20150731.patch Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch, h8838_20150731.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649875#comment-14649875 ] Zhe Zhang commented on HDFS-8480: - Thanks for the discussion Ming and Colin! I created HDFS-8846 to add old-version edit logs for testing. Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
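The hard-link approach can be sketched with java.nio.file.Files.createLink. This is an illustrative helper, not the actual NameNode upgrade code; the class and method names are invented:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EditLogPreserve {
    // Make an old edit log segment visible in the upgraded storage
    // directory. A hard link shares the underlying bytes, so this is
    // O(1) per segment regardless of log size, whereas copying is
    // proportional to the number of bytes (or ops) in the segment.
    // Both paths must be on the same filesystem for linking to work.
    static Path preserve(Path oldSegment, Path newDir) throws IOException {
        Path target = newDir.resolve(oldSegment.getFileName());
        return Files.createLink(target, oldSegment); // link, not copy
    }
}
```

Since a hard link is just a second directory entry for the same inode, the preserved segment also stays byte-identical to the original by construction.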