[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885240#comment-13885240 ] Hudson commented on HDFS-5844: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #465 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/465/]) HDFS-5844. Fix broken link in WebHDFS.apt.vm (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562357) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm Fix broken link in WebHDFS.apt.vm - Key: HDFS-5844 URL: https://issues.apache.org/jira/browse/HDFS-5844 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5844.patch There is one broken link in WebHDFS.apt.vm. {code} {{{RemoteException JSON Schema}}} {code} should be {code} {{RemoteException JSON Schema}} {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
[ https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5702 started by Vinay. FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands --- Key: HDFS-5702 URL: https://issues.apache.org/jira/browse/HDFS-5702 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Reporter: Vinay Assignee: Vinay Attachments: HDFS-5702.patch, HDFS-5702.patch FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
[ https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5702: Attachment: HDFS-5702.patch Added the 4 tests mentioned. Please review. I couldn't avoid the long line in the expected message, as it's necessary to compare the exact output. FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands --- Key: HDFS-5702 URL: https://issues.apache.org/jira/browse/HDFS-5702 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Reporter: Vinay Assignee: Vinay Attachments: HDFS-5702.patch, HDFS-5702.patch FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885317#comment-13885317 ] Tsz Wo (Nicholas), SZE commented on HDFS-5754: -- - In DataStorage, BPServiceActor and BlockPoolSliceStorage, it should not compare DATANODE_LAYOUT_VERSION with nsInfo.getLayoutVersion() anymore. - Map<Integer, TreeSet<LayoutFeature>> should be Map<Integer, Set<LayoutFeature>>. We should declare it with the interface Set (or should we use SortedSet?) instead of the particular implementation TreeSet. - In PBHelper, could we use null (i.e. unknown) instead of NodeType.NAME_NODE as the default? Or we could add a setStorageType(NodeType) method so that we could set it when it is null. - The type parameter below is not used. Should it be removed?
{code}
// Storage.java
protected Storage(NodeType type, StorageInfo storageInfo) {
  super(storageInfo);
-  this.storageType = type;
}
{code}
- I suggest moving the layout version related code out of NameNode and DataNode into new classes, say NameNodeLayoutVersion and DataNodeLayoutVersion. Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion Key: HDFS-5754 URL: https://issues.apache.org/jira/browse/HDFS-5754 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Brandon Li Attachments: FeatureInfo.patch, HDFS-5754.001.patch, HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, HDFS-5754.009.patch, HDFS-5754.010.patch Currently, LayoutVersion defines the on-disk data format and supported features of the entire cluster including NN and DNs. LayoutVersion is persisted in both NN and DNs. When a NN/DN starts up, it checks its supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a different LayoutVersion than NN cannot register with the NN. We propose to split LayoutVersion into two independent values that are local to the nodes: - NamenodeLayoutVersion - defines the on-disk data format in NN, including the format of FSImage, editlog and the directory structure. - DatanodeLayoutVersion - defines the on-disk data format in DN, including the format of block data file, metadata file, block pool layout, and the directory structure. The LayoutVersion check will be removed in DN registration. If NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
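For illustration, a minimal sketch of the interface-typed declaration suggested above (the LayoutFeature stub here is hypothetical; the real type comes from the patch under review):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for the patch's LayoutFeature type.
interface LayoutFeature extends Comparable<LayoutFeature> {}

class FeatureMapSketch {
  // Declared against the SortedSet interface; TreeSet appears only at creation time.
  private final Map<Integer, SortedSet<LayoutFeature>> features =
      new HashMap<Integer, SortedSet<LayoutFeature>>();

  void add(int layoutVersion, LayoutFeature f) {
    SortedSet<LayoutFeature> set = features.get(layoutVersion);
    if (set == null) {
      set = new TreeSet<LayoutFeature>();
      features.put(layoutVersion, set);
    }
    set.add(f);
  }
}
{code}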
[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885331#comment-13885331 ] Hudson commented on HDFS-5844: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1682 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1682/]) HDFS-5844. Fix broken link in WebHDFS.apt.vm (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562357) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm Fix broken link in WebHDFS.apt.vm - Key: HDFS-5844 URL: https://issues.apache.org/jira/browse/HDFS-5844 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5844.patch There is one broken link in WebHDFS.apt.vm. {code} {{{RemoteException JSON Schema}}} {code} should be {code} {{RemoteException JSON Schema}} {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885336#comment-13885336 ] Hudson commented on HDFS-5844: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1657 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1657/]) HDFS-5844. Fix broken link in WebHDFS.apt.vm (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562357) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm Fix broken link in WebHDFS.apt.vm - Key: HDFS-5844 URL: https://issues.apache.org/jira/browse/HDFS-5844 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5844.patch There is one broken link in WebHDFS.apt.vm. {code} {{{RemoteException JSON Schema}}} {code} should be {code} {{RemoteException JSON Schema}} {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5585: - Assignee: Kihwal Lee Status: Patch Available (was: Open) Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5585.patch Several new methods may need to be added to ClientDatanodeProtocol to support querying version, initiating upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5585: - Attachment: HDFS-5585.patch Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Attachments: HDFS-5585.patch Several new methods may need to be added to ClientDatanodeProtocol to support querying version, initiating upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885338#comment-13885338 ] Hadoop QA commented on HDFS-5585: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625858/HDFS-5585.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5976//console This message is automatically generated. Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5585.patch Several new methods may need to be added to ClientDatanodeProtocol to support querying version, initiating upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-5776: Attachment: HDFS-5776-v11.txt Attached v11: 1) modified isHedgedReadsEnabled() to consider the pool size as well 2) made setThreadsNumForHedgedReads private so the thread number cannot be changed dynamically from the client side, and removed synchronized as well. Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt This is a placeholder for the hdfs-related backport from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metric values of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
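For illustration, a minimal standalone sketch of the hedged-read pattern discussed in this issue, using java.util.concurrent (the method and class names here are assumptions, not the DFSClient API):
{code}
import java.util.concurrent.*;

class HedgedReadSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  byte[] read(Callable<byte[]> primary, Callable<byte[]> alternate,
              long thresholdMillis) throws Exception {
    CompletionService<byte[]> cs = new ExecutorCompletionService<byte[]>(pool);
    cs.submit(primary);
    // Wait up to the threshold for the first read before hedging.
    Future<byte[]> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
    if (first != null) {
      return first.get();
    }
    cs.submit(alternate);    // hedge: start a second read against another replica
    return cs.take().get();  // return whichever read finishes first
  }
}
{code}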
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885512#comment-13885512 ] Hadoop QA commented on HDFS-5776: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625869/HDFS-5776-v11.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5977//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5977//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5977//console This message is automatically generated. Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt This is a placeholder for the hdfs-related backport from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metric values of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5492) Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk
[ https://issues.apache.org/jira/browse/HDFS-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885557#comment-13885557 ] Arpit Agarwal commented on HDFS-5492: - Thanks for cleaning up the doc; it needs one fix. {code} + small portions (4 KB, configurable), writes each portion to its local {code} The default packet size is 64KB. We can just avoid mentioning the exact size. Thanks, Arpit. Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk -- Key: HDFS-5492 URL: https://issues.apache.org/jira/browse/HDFS-5492 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: documentation, newbie Attachments: HDFS-5492.patch, HDFS-5492.patch HDFS-2069 is not ported to the current document. The description of HDFS-2069 is as follows: {quote} The current HDFS architecture information about Trash is incorrectly documented as - The current default policy is to delete files from /trash that are more than 6 hours old. In the future, this policy will be configurable through a well defined interface. It should be something like - The current default trash interval is set to 0 (deletes files without storing them in trash). This value is a configurable parameter, fs.trash.interval, stored in core-site.xml. {quote} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885590#comment-13885590 ] Vinay commented on HDFS-5585: - Changes look good, Kihwal. One minor suggestion: you might want to add \n at the end of these lines for better-looking output, as shown in the sketch after this message: bq. +String shutdownDatanode = "-shutdownDatanode <datanode_host:ipc_port> [upgrade]" + bq. +String pingDatanode = "-pingDatanode <datanode_host:ipc_port>" + Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5585.patch Several new methods may need to be added to ClientDatanodeProtocol to support querying version, initiating upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
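For illustration, the suggested change would look something like this (the quotes and angle brackets are assumed; they appear stripped in the quoted lines above):
{code}
String shutdownDatanode = "-shutdownDatanode <datanode_host:ipc_port> [upgrade]\n";
String pingDatanode = "-pingDatanode <datanode_host:ipc_port>\n";
{code}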
[jira] [Commented] (HDFS-5586) Add quick-restart option for datanode
[ https://issues.apache.org/jira/browse/HDFS-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885593#comment-13885593 ] Vinay commented on HDFS-5586: - I think this is being covered in HDFS-5585. Can we mark it as a duplicate? Add quick-restart option for datanode - Key: HDFS-5586 URL: https://issues.apache.org/jira/browse/HDFS-5586 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee This feature, combined with the graceful shutdown feature, will enable data nodes to come back up and start serving quickly. This is likely a command line option for the data node, which triggers it to look for saved state information in its local storage. If the information is present and reasonably up-to-date, the data node would skip some of the startup steps. Ideally it should be able to do a quick registration without requiring removal of all blocks from the data node descriptor on the name node and reconstructing it with the initial full block report. This implies that all RBW blocks are recorded during shutdown and on start-up they are not turned into RWR. Other than the quick registration, the name node should treat the restart as if a few heartbeats were lost from the node. There should be no unexpected replica state changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885626#comment-13885626 ] Jing Zhao commented on HDFS-5842: - The failed test has been reported in HDFS-5718 and should be unrelated. Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is the small snippet used:
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was created with UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
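For illustration, a minimal sketch of the proxy-user setup the reporter describes (the proxied user name and the configuration are assumptions):
{code}
final Configuration hadoopConf = new Configuration();
UserGroupInformation ugi = UserGroupInformation.createProxyUser(
    "someUser", UserGroupInformation.getLoginUser());
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    // The proxy-user ugi, not the login user, performs this call.
    return FileSystem.get(hadoopConf);
  }
});
{code}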
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5776: Attachment: HDFS-5776-v12.txt Addressed the findbugs warning. [~jingzhao] Does this patch address your concerns? (Thanks for the review) Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt This is a placeholder for the hdfs-related backport from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metric values of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
Nikola Vujic created HDFS-5846: -- Summary: Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashes). • The script returns a wrong number of values (other than expected). Critical handling is in the DN registration code. The DN registration code is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. It means that if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something wrong happened almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
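For illustration, a minimal sketch of the stricter handling this report implies: fail the registration instead of silently assigning the default rack (the exception type chosen here is an assumption, not the eventual fix):
{code}
List<String> rName = dnsToSwitchMapping.resolve(names);
if (rName == null || rName.isEmpty()) {
  // Refuse the registration rather than guess a topology location.
  throw new IllegalStateException("Topology resolution failed for host " + names
      + "; refusing to fall back to " + NetworkTopology.DEFAULT_RACK);
}
String networkLocation = rName.get(0);
{code}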
[jira] [Created] (HDFS-5847) Consolidate INodeReference into a separate section
Haohui Mai created HDFS-5847: Summary: Consolidate INodeReference into a separate section Key: HDFS-5847 URL: https://issues.apache.org/jira/browse/HDFS-5847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Jing Zhao Currently each INodeDirectorySection.Entry contains a variable number of INodeReference entries. The INodeReference entries are inlined, therefore it is difficult to quickly navigate through an INodeDirectorySection.Entry. Skipping through an INodeDirectorySection.Entry without parsing is essential to parse these entries in parallel. This jira proposes to consolidate INodeReferences into a section and give each of them an ID. The INodeDirectorySection.Entry can store the list of the IDs as a repeated field. That way we can leverage the existing code in protobuf to quickly skip through an INodeDirectorySection.Entry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885661#comment-13885661 ] Jitendra Nath Pandey commented on HDFS-5842: checkTGTAndReloginFromKeytab is removed; it will cause issues once the TGT expires. Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is the small snippet used:
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was created with UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885668#comment-13885668 ] Jing Zhao commented on HDFS-5842: - Thanks for the review, Jitendra. checkTGTAndReloginFromKeytab is always called in URLConnectionFactory#openConnection, which is called by getDT/renewDT/cancelDT. Thus I think we do not need to call checkTGTAndReloginFromKeytab multiple times here. Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is the small snippet used:
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was created with UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885681#comment-13885681 ] Kihwal Lee commented on HDFS-5585: -- Sorry, my bad. I thought I had fixed all the missing newlines while testing. I will revise the patch soon. Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5585.patch Several new methods may need to be added to ClientDatanodeProtocol to support querying version, initiating upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5771) Track progress when loading fsimage
[ https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5771: - Attachment: HDFS-5771.002.patch Thanks Chris for the review. The v2 patch makes sure that {{beginStep()}} and {{endStep()}} are called exactly once for each step. It also records the storage path in the step. Track progress when loading fsimage --- Key: HDFS-5771 URL: https://issues.apache.org/jira/browse/HDFS-5771 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch, HDFS-5771.002.patch The old code that loads the fsimage tracks progress during loading. This jira proposes to implement the same functionality in the new code, which serializes the fsimage using protobuf. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
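For illustration, a minimal sketch of one way to guarantee the begin/end pairing described above (Phase, Step and StepType follow the existing startup-progress API; the loading body is an assumption):
{code}
Step currentStep = new Step(StepType.INODES);
prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
try {
  loadINodeSection(in);  // hypothetical loading body
} finally {
  // endStep() runs exactly once, even if loading throws.
  prog.endStep(Phase.LOADING_FSIMAGE, currentStep);
}
{code}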
[jira] [Updated] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5585: - Attachment: HDFS-5585.patch Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5585.patch, HDFS-5585.patch Several new methods may need to be added to ClientDatanodeProtocol to support querying version, initiating upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885685#comment-13885685 ] Jitendra Nath Pandey commented on HDFS-5842: bq. URLConnectionFactory#openConnection, which is called by getDT/renewDT/cancelDT. Thus I think we do not need to call checkTGTAndReloginFromKeytab multiple times here. Okay, sounds good. +1 for the patch. Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is the small snippet used:
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was created with UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885693#comment-13885693 ] Hadoop QA commented on HDFS-5585: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625935/HDFS-5585.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5980//console This message is automatically generated. Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5585.patch, HDFS-5585.patch Several new methods may need to be added to ClientDatanodeProtocol to support querying version, initiating upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5771) Track progress when loading fsimage
[ https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5771: - Attachment: HDFS-5771.003.patch The v3 patch places the {{currentStep}} variable correctly. Track progress when loading fsimage --- Key: HDFS-5771 URL: https://issues.apache.org/jira/browse/HDFS-5771 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch, HDFS-5771.002.patch, HDFS-5771.003.patch The old code that loads the fsimage tracks progress during loading. This jira proposes to implement the same functionality in the new code, which serializes the fsimage using protobuf. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HDFS-5796: - Target Version/s: 2.4.0 (was: ) The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee Priority: Critical After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between the user's browser and the namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5356) MiniDFSCluster should close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HDFS-5356: - Target Version/s: 2.4.0 (was: ) MiniDFSCluster should close all open FileSystems when shutdown() --- Key: HDFS-5356 URL: https://issues.apache.org/jira/browse/HDFS-5356 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.2.0 Reporter: haosdent Priority: Critical Attachments: HDFS-5356.patch After adding some metrics functions to DFSClient, I found that some unit tests related to metrics failed. Because MiniDFSCluster never closes open FileSystems, DFSClients remain alive after MiniDFSCluster.shutdown(). The metrics of DFSClients in DefaultMetricsSystem still exist, and this makes other unit tests fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
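For illustration, a minimal sketch of the kind of cleanup the report asks for; note this is a sketch of the idea, not the committed fix, and FileSystem.closeAll() closes every cached FileSystem instance in the JVM:
{code}
// At the end of MiniDFSCluster#shutdown():
try {
  FileSystem.closeAll();  // close cached FileSystems so their DFSClients go away
} catch (IOException e) {
  LOG.warn("Failed to close FileSystems during shutdown", e);
}
{code}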
[jira] [Updated] (HDFS-5500) Critical datanode threads may terminate silently on uncaught exceptions
[ https://issues.apache.org/jira/browse/HDFS-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HDFS-5500: - Target Version/s: 2.4.0 (was: ) Critical datanode threads may terminate silently on uncaught exceptions --- Key: HDFS-5500 URL: https://issues.apache.org/jira/browse/HDFS-5500 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Critical We've seen the refreshUsed (DU) thread disappear on uncaught exceptions. This can go unnoticed for a long time. If OOM occurs, more things can go wrong. On one occasion, the Timer, multiple refreshUsed threads and the DataXceiverServer thread had terminated. DataXceiverServer catches OutOfMemoryError and sleeps for 30 seconds, but I am not sure it is really helpful. In one case, the thread did this multiple times and then terminated. I suspect another OOM was thrown while in a catch block. As a result, the server socket was not closed and clients hung on connect. If it had at least closed the socket, the client side would have been impacted less. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
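For illustration, a minimal sketch of guarding such a thread with an uncaught-exception handler so that terminations are at least logged (the logger and the thread body are assumptions):
{code}
Thread refreshUsed = new Thread(new Runnable() {
  @Override
  public void run() {
    // periodic DU refresh loop would go here
  }
}, "refreshUsed");
refreshUsed.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
  @Override
  public void uncaughtException(Thread t, Throwable e) {
    // Without this handler, the thread dies silently on an uncaught exception.
    LOG.error("Thread " + t.getName() + " terminated unexpectedly", e);
  }
});
refreshUsed.setDaemon(true);
refreshUsed.start();
{code}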
[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HDFS-5138: - Target Version/s: 2.4.0 (was: ) Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5293) Symlink resolution requires unnecessary RPCs
[ https://issues.apache.org/jira/browse/HDFS-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HDFS-5293: - Target Version/s: 3.0.0, 2.4.0 (was: 3.0.0) Symlink resolution requires unnecessary RPCs Key: HDFS-5293 URL: https://issues.apache.org/jira/browse/HDFS-5293 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Priority: Critical When the NN encounters a symlink, it throws an {{UnresolvedLinkException}}. This exception contains only the path that is a symlink. The client issues another RPC to obtain the link target, followed by another RPC with the link target + remainder of the original path. {{UnresolvedLinkException}} should be returning both the link and the target to avoid a costly and unnecessary intermediate RPC to obtain the link target. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
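For illustration, a hypothetical shape for the enriched exception the description calls for (the field and accessor names here are assumptions, not the actual Hadoop class):
{code}
import java.io.IOException;

public class UnresolvedLinkException extends IOException {
  private final String link;    // the path that is a symlink
  private final String target;  // its resolved target, carried with the exception

  public UnresolvedLinkException(String link, String target) {
    super("Unresolved link: " + link + " -> " + target);
    this.link = link;
    this.target = target;
  }

  public String getLink() { return link; }
  // Carrying the target avoids the extra RPC whose only purpose is to fetch it.
  public String getTarget() { return target; }
}
{code}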
[jira] [Commented] (HDFS-782) dynamic replication
[ https://issues.apache.org/jira/browse/HDFS-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885774#comment-13885774 ] Jordan Mendelson commented on HDFS-782: --- Could this not be implemented in response to a client reading a remote block? The client will already be copying the block across the network in order to operate on it. A replication storm shouldn't happen unnecessarily in this case, since it isn't proactively copying. Since the client is reading the remote block, we can be reasonably sure that the block could use an extra replica. This could also speed up the case of replicating a recently written block, since we can reuse the data that has just been copied (even if it is a sub-optimal location for the block, it would at least increase data availability until it can be replicated properly). Deletion of over-replicated blocks could happen when free space becomes low. The downside seems to be the potential for extra disk writes. If every remote read of a complete block leads to storage of that block on the machine doing the read, we could end up writing a lot of data. Though it seems like this could be somewhat mitigated with some sort of upper-replica limit. dynamic replication --- Key: HDFS-782 URL: https://issues.apache.org/jira/browse/HDFS-782 Project: Hadoop HDFS Issue Type: New Feature Reporter: Ning Zhang In a large and busy cluster, a block can be requested by many clients at the same time. HDFS-767 tries to solve the failing case when the # of retries exceeds the maximum # of retries. However, that patch doesn't solve the performance issue, since all failing clients have to wait a certain period before retrying, and the # of retries could be high. One solution to the performance issue is to increase the # of replicas for a hot block dynamically when it is requested many times in a short period. The name node needs to be aware of such a situation and only clean up the extra replicas when they have not been accessed recently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
Tsz Wo (Nicholas), SZE created HDFS-5848: Summary: Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress Key: HDFS-5848 URL: https://issues.apache.org/jira/browse/HDFS-5848 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE When rolling upgrade is in progress, the NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We need to add a new DatanodeCommand here. The datanode change will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
[ https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5848: - Attachment: h5848_20130130.patch h5848_20130130.patch: adds RollingUpgradeCommand. Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress - Key: HDFS-5848 URL: https://issues.apache.org/jira/browse/HDFS-5848 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5848_20130130.patch When rolling upgrade is in progress, the NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We need to add a new DatanodeCommand here. The datanode change will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled
[ https://issues.apache.org/jira/browse/HDFS-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885793#comment-13885793 ] Hadoop QA commented on HDFS-5843: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625529/hdfs-5843.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5978//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5978//console This message is automatically generated. DFSClient.getFileChecksum() throws IOException if checksum is disabled -- Key: HDFS-5843 URL: https://issues.apache.org/jira/browse/HDFS-5843 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Laurent Goujon Attachments: hdfs-5843.patch If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} for example), calling {{FileSystem.getFileChecksum()}} throws the following IOException:
{noformat}
java.io.IOException: Fail to get block MD5 for BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001
  at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965)
  at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771)
  at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186)
  at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194)
  [...]
{noformat}
From the logs, the datanode is doing some wrong arithmetic because of the crcPerBlock:
{noformat}
2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - 127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation src: /127.0.0.1:52407 dest: /127.0.0.1:52398
java.lang.ArithmeticException: / by zero
  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658)
  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169)
  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
  at java.lang.Thread.run(Thread.java:695)
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
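For illustration, a minimal sketch of the kind of guard the stack trace suggests is missing in DataXceiver#blockChecksum (the variable names are assumptions, not the committed patch):
{code}
final int csize = checksum.getChecksumSize();
// With checksums disabled, csize is 0; guard the division instead of crashing.
final long crcPerBlock = csize <= 0 ? 0
    : (metadataIn.getLength() - BlockMetadataHeader.getHeaderSize()) / csize;
{code}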
[jira] [Created] (HDFS-5849) Removing ACL from an inode fails if it has only a default ACL.
Chris Nauroth created HDFS-5849: --- Summary: Removing ACL from an inode fails if it has only a default ACL. Key: HDFS-5849 URL: https://issues.apache.org/jira/browse/HDFS-5849 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth When removing an ACL, the logic must restore the group permission previously stored in an ACL entry back into the group permission bits. The logic for this in {{AclTransformation#removeINodeAcl}} assumes that the group entry must be found in the former ACL. This is not the case when removing the ACL from an inode that only had a default ACL and not an access ACL. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
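For illustration, a minimal sketch of the fallback the description implies (the helper findAclEntry is hypothetical; AclEntry, AclEntryScope, AclEntryType and FsAction are from org.apache.hadoop.fs.permission):
{code}
// If the former ACL had no access group entry (a default-only ACL), fall back
// to the inode's existing group permission bits instead of failing.
AclEntry groupEntry = findAclEntry(existingAcl, AclEntryScope.ACCESS, AclEntryType.GROUP);
FsAction restoredGroupPerm = (groupEntry != null)
    ? groupEntry.getPermission()
    : inode.getFsPermission().getGroupAction();
{code}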
[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5746: --- Attachment: HDFS-5746.005.patch * rename 'anchor' and 'unanchor' to 'addAnchor' and 'removeAnchor' * add a stress test for DomainSocketWatcher, which also includes some remove operations * make some messages TRACE that were formerly INFO; we don't want info logs when handling every event * fix a bug in the native code where we were reallocating the fd_set_data structure but writing the new length to the old structure * put addNotificationSocket in a new function to avoid cluttering the main loop, and remember to increment the reference count on the notificationSocket so that we don't get logs about mismatched reference counts when shutting down the watcher add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885808#comment-13885808 ] Colin Patrick McCabe commented on HDFS-5399: It seems that after HDFS-5291, the client tries to fail over to the other namenode after getting a SafeMode exception. This seems wrong, since it means that the client will keep retrying forever (and hang) until the namespace comes out of safemode. Formerly, we did not retry safe mode exceptions, whether or not we were in HA mode. This is the correct behavior, right? Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In a non-HA setup, for certain API calls (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts the MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In an HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException on the server side. The client side's RPC uses the FailoverOnNetworkExceptionRetry policy, which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by an administrator through the CLI), and clients may not want to retry on this type of SafeMode. # Clients may want to retry on other API calls in a non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setups. A possible straightforward solution is to always wrap the SafeModeException in a RetriableException to indicate that clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885813#comment-13885813 ] Hadoop QA commented on HDFS-5776: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625915/HDFS-5776-v12.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestAuditLogs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5979//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5979//console This message is automatically generated. Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt This is a placeholder for the hdfs-related backport from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metric values of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional
Kihwal Lee created HDFS-5850: Summary: DNS Issues during TrashEmptier initialization can silently leave it non-functional Key: HDFS-5850 URL: https://issues.apache.org/jira/browse/HDFS-5850 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical [~knoguchi] once noticed that the trash directories of a restarted cluster were not cleaned up. It turned out that this was caused by a transient DNS problem during initialization. The TrashEmptier thread in the namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name (SPN) for the namenode to make an RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since the KDC does not recognize this SPN, the TrashEmptier does not work from that point on. I verified that the SPN containing the IP address was what the TrashEmptier thread asked the KDC for a service ticket for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885837#comment-13885837 ] Jing Zhao commented on HDFS-5399: - The client will not fail over. It will retry the same NN (and this NN throws RetriableException only when it's in the active state). But I think we may want to add a maximum number of retries there. bq. Formerly, we did not retry safe mode exceptions, whether or not we were in HA mode. The issue with an HA setup is that the SBN may stay in safemode for a long time, and when it transitions to the active state, it needs at least 30s to come out of safemode. This makes the actual failover time long, since the old behavior is that the client will retry only once. This can then cause an HBase region server to time out and kill itself. Thus we need to let the client wait and retry for a longer time. But in the meanwhile, I think we should revisit this safemode extension and see if we can avoid the NN going into unnecessary safemode and shorten the safemode period. Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In a non-HA setup, for certain API calls (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts the MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In an HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException on the server side. The client side's RPC uses the FailoverOnNetworkExceptionRetry policy, which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by an administrator through the CLI), and clients may not want to retry on this type of SafeMode. # Clients may want to retry on other API calls in a non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setups. A possible straightforward solution is to always wrap the SafeModeException in a RetriableException to indicate that clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
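For illustration, a minimal sketch of capping retries with Hadoop's retry utilities, along the lines suggested above (the policy choice and the limits are assumptions, not the eventual patch):
{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Retry a safemode'd NN up to 10 times, 5 seconds apart, then give up
// instead of waiting indefinitely.
RetryPolicy safeModePolicy =
    RetryPolicies.retryUpToMaximumCountWithFixedSleep(10, 5, TimeUnit.SECONDS);
{code}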
[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885838#comment-13885838 ] Colin Patrick McCabe commented on HDFS-5841: +1 pending jenkins Update HDFS caching documentation with new changes -- Key: HDFS-5841 URL: https://issues.apache.org/jira/browse/HDFS-5841 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Labels: caching Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch The caching documentation is a little out of date, since it's missing description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional
[ https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5850: - Description: [~knoguchi] recently noticed that the trash directories of a restarted cluster are not cleaned up. It turned out that it was caused by a transient DNS problem during initialization. TrashEmptier thread in namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name for the namenode for making a RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since KDC does not recognize this SPN, TrashEmptier does not work from that point on. I verified that the SPN with the IP address was what the TrashEmptier thread asked KDC for a service ticket for. was: [~knoguchi] once noticed that the trash directories of a restarted cluster are not cleaned up. It turned out that it was caused by a transient DNS problem during initialization. TrashEmptier thread in namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name for the namenode for making a RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since KDC does not recognize this SPN, TrashEmptier does not work from that point on. I verified that the SPN with the IP address was what the TrashEmptier thread asked KDC for a service ticket for. DNS Issues during TrashEmptier initialization can silently leave it non-functional -- Key: HDFS-5850 URL: https://issues.apache.org/jira/browse/HDFS-5850 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical [~knoguchi] recently noticed that the trash directories of a restarted cluster are not cleaned up. It turned out that it was caused by a transient DNS problem during initialization. TrashEmptier thread in namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name for the namenode for making a RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since KDC does not recognize this SPN, TrashEmptier does not work from that point on. I verified that the SPN with the IP address was what the TrashEmptier thread asked KDC for a service ticket for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional
[ https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5850: - Description: [~knoguchi] recently noticed that the trash directories of a restarted cluster were not cleaned up. It turned out that it was caused by a transient DNS problem during initialization. TrashEmptier thread in namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name for the namenode for making a RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since KDC does not recognize this SPN, TrashEmptier does not work from that point on. I verified that the SPN with the IP address was what the TrashEmptier thread asked KDC for a service ticket for. was: [~knoguchi] recently noticed that the trash directories of a restarted cluster are not cleaned up. It turned out that it was caused by a transient DNS problem during initialization. TrashEmptier thread in namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name for the namenode for making a RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since KDC does not recognize this SPN, TrashEmptier does not work from that point on. I verified that the SPN with the IP address was what the TrashEmptier thread asked KDC for a service ticket for. DNS Issues during TrashEmptier initialization can silently leave it non-functional -- Key: HDFS-5850 URL: https://issues.apache.org/jira/browse/HDFS-5850 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical [~knoguchi] recently noticed that the trash directories of a restarted cluster were not cleaned up. It turned out that it was caused by a transient DNS problem during initialization. TrashEmptier thread in namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name for the namenode for making a RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since KDC does not recognize this SPN, TrashEmptier does not work from that point on. I verified that the SPN with the IP address was what the TrashEmptier thread asked KDC for a service ticket for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
[ https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885875#comment-13885875 ] Suresh Srinivas commented on HDFS-5848: --- Why is this a command and not a state that is always sent to the DataNode? Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress - Key: HDFS-5848 URL: https://issues.apache.org/jira/browse/HDFS-5848 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5848_20130130.patch When rolling upgrade is in progress, the NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We need to add a new DatanodeCommand here. The datanode change will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885893#comment-13885893 ] Hudson commented on HDFS-5842: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5061 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5061/]) HDFS-5842. Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562603) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is a small snippet used
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5842: Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Jitendra! I've committed this to trunk and branch-2. Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.4.0 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is a small snippet used
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885898#comment-13885898 ] Aaron T. Myers commented on HDFS-5399: -- bq. The issue with HA setup is that the SBN may stay in safemode for a long time and when it transitions to the active state, it needs at least 30s to come out of the safemode. I don't follow this. Why is the SBN staying in safemode for a long time in an HA setup? Being in safemode and being in either the active or standby states should be orthogonal. Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In non-HA setup, for certain API call (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException in the server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator through CLI), and the clients may not want to retry on this type of SafeMode. # Client may want to retry on other API calls in non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setup. A possible straightforward solution is to always wrap the SafeModeException in the RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5776: Attachment: HDFS-5776-v12.txt Failure seems unrelated. Let me try again to be sure. Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt This is a placeholder for the HDFS-related parts backported from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful especially for optimizing read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metric values of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
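To make the two knobs above concrete, here is a hedged sketch of how a client such as HBase might enable the feature through Configuration. The property names come from the description; the numeric values, and the assumption that a zero threshold or pool size leaves hedging off, are illustrative only.
{code}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Fire a speculative second read if the first datanode has not answered
// within 50 ms (the value is an arbitrary example).
conf.setLong("dfs.dfsclient.quorum.read.threshold.millis", 50);
// Size of the thread pool that issues the hedged reads; assumed to gate
// whether the feature is active at all.
conf.setInt("dfs.dfsclient.quorum.read.threadpool.size", 16);
{code}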
[jira] [Updated] (HDFS-5771) Track progress when loading fsimage
[ https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5771: Component/s: namenode Hadoop Flags: Reviewed +1 for the v3 patch. Thanks for incorporating those changes, Haohui. Track progress when loading fsimage --- Key: HDFS-5771 URL: https://issues.apache.org/jira/browse/HDFS-5771 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch, HDFS-5771.002.patch, HDFS-5771.003.patch The old code that loads the fsimage tracks the progress during loading. This jira proposes to implement the same functionality in the new code which serializes the fsimage using protobuf. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885916#comment-13885916 ] Arpit Gupta commented on HDFS-5399: --- We had run into this issue while testing HA. You can see in HDFS-5291 that the standby NN after transitioning to active went into safemode. We saw issues where Resource Manager and Region Servers would crash/complain because of this. We ran into this frequently before HDFS-5291 was fixed. Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In non-HA setup, for certain API call (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException in the server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator through CLI), and the clients may not want to retry on this type of SafeMode. # Client may want to retry on other API calls in non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setup. A possible straightforward solution is to always wrap the SafeModeException in the RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885930#comment-13885930 ] Aaron T. Myers commented on HDFS-5399: -- On that JIRA I asked the following question: bq. Is my understanding of this issue correct that the only thing we're trying to fix here is the fact the clients are not retrying attempting to talk to the active NN when it receives a safemode exception? i.e. it's not the case that the standby NN is somehow incorrectly going into safemode after a failover? I concluded (perhaps incorrectly) based on Jing's response that I was correct in my understanding of the issue, but it seems that I was not. If so, the fact that the former standby NN is going into safemode upon transition to active is the real bug here, not that clients don't retry when the NN is in safemode, and that's what we should be fixing, not the client RPC retry behavior. Jing/Arpit - do either of you have any insight as to why you observed the NN going into safemode upon transition to active? If we can figure that out, then we should fix that, and perhaps revert or modify the new behavior introduced in HDFS-5291. Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In non-HA setup, for certain API call (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException in the server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator through CLI), and the clients may not want to retry on this type of SafeMode. # Client may want to retry on other API calls in non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setup. A possible straightforward solution is to always wrap the SafeModeException in the RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5771) Track progress when loading fsimage
[ https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5771. - Resolution: Fixed Fix Version/s: HDFS-5698 (FSImage in protobuf) I committed the patch to the HDFS-5698 feature branch. Thanks again, Haohui. Track progress when loading fsimage --- Key: HDFS-5771 URL: https://issues.apache.org/jira/browse/HDFS-5771 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS-5698 (FSImage in protobuf) Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch, HDFS-5771.002.patch, HDFS-5771.003.patch The old code that loads the fsimage tracks the progress during loading. This jira proposes to implement the same functionality in the new code which serializes the fsimage using protobuf. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885944#comment-13885944 ] Brandon Li commented on HDFS-5754: -- {quote} In DataStorage, BPServiceActor and BlockPoolSliceStorage, it should not compare DATANODE_LAYOUT_VERSION with nsInfo.getLayoutVersion() anymore.{quote} removed. {quote} Map<Integer, TreeSet<LayoutFeature>> should be Map<Integer, Set<LayoutFeature>>. We should declare with the interface Set (or should we use SortedSet?) instead of the particular implementation TreeSet.{quote} yes. {quote} In PBHelper, could we use null (i.e. unknown) instead of NodeType.NAME_NODE as default? Or we could add a setStorageType(NodeType) method so that we could set it when it is null. {quote} If we use null as the default and add a new method setStorageType() to set storageType in a few places after receiving StorageInfo from the wire, the code is not as clean as just sending StorageType in the RPC payload. But I will upload a patch with the default null first to show the change. {quote}The type parameter below is not used. Should it be removed?{quote} yes. {quote} I suggest to move the layout version related code out from NameNode and DataNode to new classes, say NameNodeLayoutVersion and DataNodeLayoutVersion. {quote} Agree. It's better to hide the maps in these two classes than to expose them everywhere. Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion Key: HDFS-5754 URL: https://issues.apache.org/jira/browse/HDFS-5754 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Brandon Li Attachments: FeatureInfo.patch, HDFS-5754.001.patch, HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, HDFS-5754.009.patch, HDFS-5754.010.patch Currently, LayoutVersion defines the on-disk data format and supported features of the entire cluster including NN and DNs. LayoutVersion is persisted in both NN and DNs. When a NN/DN starts up, it checks its supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a different LayoutVersion than NN cannot register with the NN. We propose to split LayoutVersion into two independent values that are local to the nodes: - NamenodeLayoutVersion - defines the on-disk data format in NN, including the format of FSImage, editlog and the directory structure. - DatanodeLayoutVersion - defines the on-disk data format in DN, including the format of block data file, metadata file, block pool layout, and the directory structure. The LayoutVersion check will be removed in DN registration. If NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
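For readers following the declare-against-the-interface point above, a small hedged illustration; the field name and the comparator are hypothetical, not taken from the actual patch.
{code}
// Declare the interface types callers actually need...
private static final Map<Integer, SortedSet<LayoutFeature>> FEATURE_MAP =
    new HashMap<Integer, SortedSet<LayoutFeature>>();

// ...and keep TreeSet as an implementation detail at the creation site.
FEATURE_MAP.put(layoutVersion,
    new TreeSet<LayoutFeature>(featureComparator));  // featureComparator: assumed
{code}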
[jira] [Updated] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5781: -- Fix Version/s: (was: 2.3.0) 2.4.0 JIRA fix versions are weird right now; I think this is only in branch-2 and not also branch-2.3. I think this is minor enough that it's okay to leave it out, but please merge it to branch-2.3 and update the fix version if you feel otherwise. Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value --- Key: HDFS-5781 URL: https://issues.apache.org/jira/browse/HDFS-5781 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 2.4.0 Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, HDFS-5781.002.patch, HDFS-5781.002.patch HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a given byte value. While improving the efficiency, it may cause issues. E.g., when several new editlog ops are added to trunk around the same time (for several different new features), it is hard to backport the editlog ops with larger byte values to branch-2 before those with smaller values, since there will be gaps in the byte values of the enum. This jira plans to still use an array to record the mapping between editlog ops and their byte values, and allow gaps between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
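A minimal sketch of the array-based mapping with gaps that the description proposes, assuming a one-byte opcode space; the enum name, getOpCode() accessor, and lookup method are illustrative assumptions rather than the actual patch.
{code}
// Direct-indexed table: a null slot is just an unassigned byte value, so
// ops can be backported out of order without renumbering the enum.
private static final FSEditLogOpCodes[] VALUES = new FSEditLogOpCodes[256];
static {
  for (FSEditLogOpCodes op : FSEditLogOpCodes.values()) {
    VALUES[op.getOpCode() & 0xff] = op;   // getOpCode(): assumed accessor
  }
}

static FSEditLogOpCodes fromByte(byte value) {
  return VALUES[value & 0xff];            // null => unknown/unassigned opcode
}
{code}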
[jira] [Commented] (HDFS-5688) Wire-encryption in QJM
[ https://issues.apache.org/jira/browse/HDFS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885962#comment-13885962 ] Suresh Srinivas commented on HDFS-5688: --- [~jucaf], please provide the information required for verifying whether this is indeed a bug. I will close this jira after a week or so if the required information is not posted to the jira. Wire-encryption in QJM -- Key: HDFS-5688 URL: https://issues.apache.org/jira/browse/HDFS-5688 Project: Hadoop HDFS Issue Type: Bug Components: ha, journal-node, security Affects Versions: 2.2.0 Reporter: Juan Carlos Fernandez Priority: Blocker Labels: security When HA is implemented with QJM and using kerberos, it's not possible to enable wire encryption of data. If the property hadoop.rpc.protection is set to something other than authentication, it doesn't work properly, getting the error: ERROR security.UserGroupInformation: PriviledgedActionException as:principal@REALM (auth:KERBEROS) cause:javax.security.sasl.SaslException: No common protection layer between client and server With NFS as shared storage everything works like a charm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5614) NameNode: implement handling of ACLs in combination with snapshots.
[ https://issues.apache.org/jira/browse/HDFS-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5614: Attachment: HDFS-5614.1.patch I'm uploading the patch. Here is a summary of the changes. # {{DFSClient}}: Testing revealed that we weren't unwrapping {{NSQuotaExceededException}} in the ACL modification APIs. This exception can be thrown when changing an ACL on a file that is a child of a directory that was previously snapshotted, because the change requires consuming more namespace quota. # {{AclFeature}}: I made instances of this class immutable. This fixed a lot of bugs related to copying the instance around inside a snapshot and then mutating the original through the ACL modification APIs. # {{AclStorage}}: Calls to inode methods for getting and setting the ACL now pass snapshot ID. # {{FSDirectory}}: Added special case handling for .snapshot path and changed {{getAclStatus}} to get the correct snapshot ID. # {{FSImageFormat}}/{{FSImageSerialization}}: The current ACL is now written in snapshot diff lists and restored into the {{SnapshotCopy}} on load. # {{INode}} and subclasses and related interfaces: Previously, we had the methods for getting and setting the {{AclFeature}} in {{INodeWithAdditionalFields}}. I've now made the necessary changes throughout the inode class hierarchy to define these methods in the {{INode}} base class and return the correct results in subclasses. # {{INodeDirectory}}: I added a special case in the copy constructor to preserve the ACL even if we aren't copying the other inode features. # {{TestNameNodeAcl}}: New test suite covering the various interactions between ACLs and snapshots. # I cleaned up multiple places in the code in {{FSDirectory}}, {{FSPermissionChecker}} and {{AclStorage}} that previously had been downcasting to {{INodeWithAdditionalFields}}. I've also verified that other ACL tests in the branch are still passing with this patch. NameNode: implement handling of ACLs in combination with snapshots. --- Key: HDFS-5614 URL: https://issues.apache.org/jira/browse/HDFS-5614 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5614.1.patch Within a snapshot, all ACLs are frozen at the moment that the snapshot was created. ACL changes in the parent of the snapshot are not applied to the snapshot. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
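A hedged sketch of the immutability change in item 2 above: freezing the ACL entry list at construction so a snapshot and the live inode can safely share one instance. The constructor shape and field names are assumptions, not the actual patch.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.fs.permission.AclEntry;

class AclFeature {
  private final List<AclEntry> entries;

  AclFeature(List<AclEntry> entries) {
    // Defensive copy + unmodifiable view: a later ACL modification must build
    // a new AclFeature rather than mutate one a snapshot still references.
    this.entries = Collections.unmodifiableList(
        new ArrayList<AclEntry>(entries));
  }

  List<AclEntry> getEntries() {
    return entries;
  }
}
{code}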
[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5810: --- Attachment: HDFS-5810.006.patch * ShortCircuitCache#fetchOrCreate: retry here if we get a stale replica. * ShortCircuitCache#obliterate: must set refCount to 0 here. * fix up some logs, add more trace logs * fix findbugs issues * add more descriptive failure message to some asserts * TestBlockTokenWithDFS: fix test control flow. fix longstanding DFSClient leak. * move getConfiguration and getUGI out of the RemotePeerFactory interface. Unify mmap cache and short-circuit file descriptor cache Key: HDFS-5810 URL: https://issues.apache.org/jira/browse/HDFS-5810 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.3.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, HDFS-5810.006.patch We should unify the client mmap cache and the client file descriptor cache. Since mmaps are granted corresponding to file descriptors in the cache (currently FileInputStreamCache), they have to be tracked together to do smarter things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional
[ https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885974#comment-13885974 ] Daryn Sharp commented on HDFS-5850: --- I'm not sure this issue affects 2.x. In 0.23, the client pre-constructs the kerberos service principal and caches it in the ConnectionId. All subsequent connections use a cached Connection which in turn reuses the cached principal in the ConnectionId. Thus, if the principal is misconstructed it will never recover. RPCv9 in 2.x should recover. The client no longer preconstructs and caches the principal. It verifies the principal advertised by the server. If a transient DNS resolve failure occurs, the _HOST substitution in the service principal key will indeed yield a principal with an IP. The client will reject the advertised principal because it doesn't match (ip vs hostname). However, subsequent connections will attempt to reverify the advertised principal which involves a new DNS resolve. The client should recover when DNS recovers. DNS Issues during TrashEmptier initialization can silently leave it non-functional -- Key: HDFS-5850 URL: https://issues.apache.org/jira/browse/HDFS-5850 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical [~knoguchi] recently noticed that the trash directories of a restarted cluster were not cleaned up. It turned out that it was caused by a transient DNS problem during initialization. TrashEmptier thread in namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name for the namenode for making a RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since KDC does not recognize this SPN, TrashEmptier does not work from that point on. I verified that the SPN with the IP address was what the TrashEmptier thread asked KDC for a service ticket for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
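A hedged sketch of the failure mode Daryn describes: _HOST substitution resolving to an IP during a transient DNS failure. SecurityUtil.getServerPrincipal and the dfs.namenode.kerberos.principal key are real Hadoop names; the surrounding flow and the example address are illustrative only.
{code}
import java.net.InetAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;

Configuration conf = new Configuration();
InetAddress nnAddr = InetAddress.getByName("10.0.0.5");  // example address
// getCanonicalHostName() falls back to the IP string when reverse DNS fails.
String host = nnAddr.getCanonicalHostName();
String spn = SecurityUtil.getServerPrincipal(
    conf.get("dfs.namenode.kerberos.principal"),  // e.g. "nn/_HOST@EXAMPLE.COM"
    host);
// During a DNS hiccup: spn == "nn/10.0.0.5@EXAMPLE.COM",
// a principal the KDC does not know about.
{code}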
[jira] [Updated] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional
[ https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-5850: -- Affects Version/s: (was: 2.4.0) 0.23.0 DNS Issues during TrashEmptier initialization can silently leave it non-functional -- Key: HDFS-5850 URL: https://issues.apache.org/jira/browse/HDFS-5850 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Kihwal Lee Priority: Critical [~knoguchi] recently noticed that the trash directories of a restarted cluster were not cleaned up. It turned out that it was caused by a transient DNS problem during initialization. TrashEmptier thread in namenode is actually a FileSystem client running in a loop, which makes RPC calls to itself in order to list, rename and delete trash files. In a secure setup, the client needs to create the right service principal name for the namenode for making a RPC connection. If there is a DNS issue at that moment, the SPN ends up with the IP address, not the fqdn. Since KDC does not recognize this SPN, TrashEmptier does not work from that point on. I verified that the SPN with the IP address was what the TrashEmptier thread asked KDC for a service ticket for. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885986#comment-13885986 ] Andrew Wang commented on HDFS-5842: --- Should this be included in branch-2.3 as well? Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.4.0 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is a small snippet used
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
[ https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885990#comment-13885990 ] Tsz Wo (Nicholas), SZE commented on HDFS-5848: -- The heartbeat response is called DatanodeCommand in the code. The NN will keep sending a RollingUpgradeCommand with every heartbeat during a rolling upgrade. Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress - Key: HDFS-5848 URL: https://issues.apache.org/jira/browse/HDFS-5848 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5848_20130130.patch When rolling upgrade is in progress, the NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We need to add a new DatanodeCommand here. The datanode change will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
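A hedged sketch of what the NN's heartbeat handling might look like with such a command. Only DatanodeCommand and RollingUpgradeCommand are named in the discussion above; the helper, constructor argument, and surrounding flow are assumptions.
{code}
// Inside the NN's heartbeat handler (sketch, not the actual patch):
List<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>();
// ... existing commands (replication, block recovery, etc.) ...
if (isRollingUpgradeInProgress()) {            // assumed helper
  // Sent on every heartbeat so DNs keep hardlinking deleted blocks.
  cmds.add(new RollingUpgradeCommand(rollingUpgradeStartTime));  // assumed ctor
}
return cmds.toArray(new DatanodeCommand[cmds.size()]);
{code}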
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885998#comment-13885998 ] Hadoop QA commented on HDFS-5810: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626008/HDFS-5810.006.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5985//console This message is automatically generated. Unify mmap cache and short-circuit file descriptor cache Key: HDFS-5810 URL: https://issues.apache.org/jira/browse/HDFS-5810 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.3.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, HDFS-5810.006.patch We should unify the client mmap cache and the client file descriptor cache. Since mmaps are granted corresponding to file descriptors in the cache (currently FileInputStreamCache), they have to be tracked together to do smarter things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886011#comment-13886011 ] Andrew Wang commented on HDFS-5845: --- I'll also note that I bumped the timeout on that seemingly unrelated test since it flaked twice for me at 30s. SecondaryNameNode dies when checkpointing with cache pools -- Key: HDFS-5845 URL: https://issues.apache.org/jira/browse/HDFS-5845 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Blocker Labels: caching Attachments: hdfs-5845-1.patch The SecondaryNameNode clears and reloads its FSNamesystem when doing checkpointing. However, FSNamesystem#clear does not clear CacheManager state during this reload. This leads to an error like the following: {noformat} org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists. {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
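A hedged sketch of the shape of fix the description implies: making FSNamesystem#clear reset CacheManager state along with everything else. FSNamesystem#clear and CacheManager come from the description; the clear() method on CacheManager is an assumption.
{code}
// FSNamesystem#clear (sketch): without the last line, a 2NN checkpoint
// reload replays cache pool creation onto stale state and fails with
// "Cache pool pool1 already exists".
void clear() {
  // ... existing resets of the directory, leases, etc. ...
  cacheManager.clear();  // assumed method: drop all cache pools and directives
}
{code}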
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886012#comment-13886012 ] Hadoop QA commented on HDFS-5746: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625965/HDFS-5746.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1546 javac compiler warnings (more than the trunk's current 1541 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestPersistBlocks The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.net.unix.TestDomainSocketWatcher {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5981//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5981//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5981//console This message is automatically generated. add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886014#comment-13886014 ] Hadoop QA commented on HDFS-4239: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625766/hdfs-4239_v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage org.apache.hadoop.hdfs.TestPread org.apache.hadoop.hdfs.TestReplication org.apache.hadoop.hdfs.TestSmallBlock org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer org.apache.hadoop.hdfs.TestFileCreation org.apache.hadoop.hdfs.TestSetrepIncreasing org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.server.namenode.TestFileLimit org.apache.hadoop.hdfs.server.balancer.TestBalancer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5983//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5983//console This message is automatically generated. Means of telling the datanode to stop using a sick disk --- Key: HDFS-4239 URL: https://issues.apache.org/jira/browse/HDFS-4239 Project: Hadoop HDFS Issue Type: Improvement Reporter: stack Assignee: Jimmy Xiang Attachments: hdfs-4239.patch, hdfs-4239_v2.patch If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing occasionally, or just exhibiting high latency -- your choices are: 1. Decommission the total datanode. If the datanode is carrying 6 or 12 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the rereplication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low latency serving: e.g. hosting an hbase cluster. 2. Stop the datanode, unmount the bad disk, and restart the datanode (You can't unmount the disk while it is in use). This latter is better in that only the bad disk's data is rereplicated, not all datanode data. Is it possible to do better, say, send the datanode a signal to tell it stop using a disk an operator has designated 'bad'. This would be like option #2 above minus the need to stop and restart the datanode. Ideally the disk would become unmountable after a while. Nice to have would be being able to tell the datanode to restart using a disk after its been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886017#comment-13886017 ] Jing Zhao commented on HDFS-5842: - Yeah, that will be great. Thanks Andrew! Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.4.0 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is a small snippet used
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was UserGroupInformation.createProxyUser -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5851) Support memory as a storage medium
Arpit Agarwal created HDFS-5851: --- Summary: Support memory as a storage medium Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886027#comment-13886027 ] Hadoop QA commented on HDFS-5841: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625670/hdfs-5841-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5982//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5982//console This message is automatically generated. Update HDFS caching documentation with new changes -- Key: HDFS-5841 URL: https://issues.apache.org/jira/browse/HDFS-5841 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Labels: caching Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch The caching documentation is a little out of date, since it's missing description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886031#comment-13886031 ] Colin Patrick McCabe commented on HDFS-5845: Looks good to me. +1 SecondaryNameNode dies when checkpointing with cache pools -- Key: HDFS-5845 URL: https://issues.apache.org/jira/browse/HDFS-5845 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Blocker Labels: caching Attachments: hdfs-5845-1.patch The SecondaryNameNode clears and reloads its FSNamesystem when doing checkpointing. However, FSNamesystem#clear does not clear CacheManager state during this reload. This leads to an error like the following: {noformat} org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists. {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5845: -- Resolution: Fixed Fix Version/s: 2.3.0 Status: Resolved (was: Patch Available) Thanks Colin, I committed this to branch-2.3, branch-2, and trunk. SecondaryNameNode dies when checkpointing with cache pools -- Key: HDFS-5845 URL: https://issues.apache.org/jira/browse/HDFS-5845 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Blocker Labels: caching Fix For: 2.3.0 Attachments: hdfs-5845-1.patch The SecondaryNameNode clears and reloads its FSNamesystem when doing checkpointing. However, FSNamesystem#clear does not clear CacheManager state during this reload. This leads to an error like the following: {noformat} org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists. {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886040#comment-13886040 ] Jing Zhao commented on HDFS-5399: - bq. If so, the fact that the former standby NN is going into safemode upon transition to active is the real bug here It's not like this. The SBN will not put itself into safemode because of transitioning to the active state. What we saw in our test is: the SBN cannot come out of the safemode, thus the safemode object is not null when failover happens. When the SBN becomes active, it can quickly go into the safemode extension period, but this still adds an extra 30 seconds to the no-service time. Thus the question is: why can the NN quickly go into the safemode extension period while in the active state, but keep staying in safemode in the standby state? In our test we have a lot of file creation/deletion happening. Is it possible that the SBN keeps tailing the editlog while holding the FSN lock, so that the SafeModeMonitor thread cannot get the lock to leave the safemode? Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In non-HA setup, for certain API call (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException in the server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator through CLI), and the clients may not want to retry on this type of SafeMode. # Client may want to retry on other API calls in non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setup. A possible straightforward solution is to always wrap the SafeModeException in the RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-198) org.apache.hadoop.dfs.LeaseExpiredException during dfs write
[ https://issues.apache.org/jira/browse/HDFS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886051#comment-13886051 ] sukhendu chakraborty commented on HDFS-198: --- I am seeing the LeaseExpiredException ("No lease") error for partitioned Hive tables in CDH 4.5 MR1. I have a similar use case to Sujesh's above: I am using dynamic date partitioning for a year (365 partitions), but have 1B rows (300GB of data for that year). I also want to cluster the data in each partition into 32 buckets. Here is part of the error trace:
3:58:18.531 PM ERROR org.apache.hadoop.hdfs.DFSClient Failed to close file /tmp/hive-user/hive_2014-01-29_15-33-51_510_4099525102053071439/_task_tmp.-ext-1/trn_dt=20090531/_tmp.12_0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-user/hive_2014-01-29_15-33-51_510_4099525102053071439/_task_tmp.-ext-1/trn_dt=20090531/_tmp.12_0: File does not exist. Holder DFSClient_NONMAPREDUCE_-1745484980_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2543)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2535)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2601)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2578)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:556)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44958)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
at org.apache.hadoop.ipc.Client.call(Client.java:1238)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.complete(Unknown Source)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.complete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:330)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1796)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1783)
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:709)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:726)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:561)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2399)
at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2415)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
org.apache.hadoop.dfs.LeaseExpiredException during dfs write Key: HDFS-198 URL: https://issues.apache.org/jira/browse/HDFS-198 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Reporter: Runping Qi Many long-running, CPU-intensive map tasks failed due to org.apache.hadoop.dfs.LeaseExpiredException. See [a comment below|https://issues.apache.org/jira/browse/HDFS-198?focusedCommentId=12910298page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12910298] for the exceptions from the log: -- This message was sent by Atlassian JIRA (v6.1.5#6160)
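The bottom of that trace is worth a note: the close is driven by FileSystem$Cache$ClientFinalizer, the JVM shutdown hook that closes every cached FileSystem, so the stream is being completed after its lease is already gone. A minimal sketch of one common mitigation for long-lived writers, assuming they should not share the cached instance (illustrative only, not a confirmed fix for the Hive case above; the path and data are placeholders):
{code}
// Sketch: give a long-lived writer its own FileSystem so that closing it
// does not affect other users of the same URI, and close it explicitly
// rather than relying on the shutdown hook.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrivateFsWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Alternative knob: bypass caching for hdfs:// URIs entirely.
    // conf.setBoolean("fs.hdfs.impl.disable.cache", true);

    // newInstance() always returns a new FileSystem rather than the shared
    // cached one, so nothing else closes this client underneath the writer.
    FileSystem fs = FileSystem.newInstance(conf);
    try (FSDataOutputStream out = fs.create(new Path("/tmp/example"))) {
      out.writeBytes("data");
    } finally {
      fs.close(); // explicit close, before any shutdown hook runs
    }
  }
}
{code}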
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886053#comment-13886053 ] Aaron T. Myers commented on HDFS-5399: -- I see, so it sounds like the bug is that the NN is not leaving safemode (after startup?) automatically while it's in the standby state even though it's received sufficient block reports to cause it to leave safemode. It will then automatically enter the extension period and subsequently leave safemode only on transition to the active state. Is that correct? bq. Is it possible that the SBN keeps tailing the editlog while hold the FSN lock, thus the SafeModeMonitor thread could not get the lock to leave the safemode? I don't think this is possible. The EditLogTailer only takes the FSN lock when it wakes up periodically to tail edits. Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In non-HA setup, for certain API call (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException in the server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator through CLI), and the clients may not want to retry on this type of SafeMode. # Client may want to retry on other API calls in non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setup. A possible straightforward solution is to always wrap the SafeModeException in the RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
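The "possible straightforward solution" at the end of the description can be made concrete. A minimal sketch, with hypothetical helper names (the real FSNamesystem checks are more involved): every retryable safe-mode rejection surfaces as a RetriableException, which both the HA and non-HA retry policies already recognize, while manual safe mode throws a plain exception that clients will not retry.
{code}
// Sketch only; state flags and method names are assumed, not the actual code.
import java.io.IOException;
import org.apache.hadoop.ipc.RetriableException;

public class SafeModeCheck {
  private volatile boolean inSafeMode;      // assumed state flag
  private volatile boolean manualSafeMode;  // assumed: entered via CLI

  void checkSafeModeForWrite(String op) throws IOException {
    if (!inSafeMode) {
      return;
    }
    String msg = "Cannot " + op + ". Name node is in safe mode.";
    if (manualSafeMode) {
      // Administrator-initiated safe mode: clients should fail, not retry.
      throw new IOException(msg);
    }
    // Automatic (startup) safe mode: signal clients to retry, in both HA
    // and non-HA setups, via the one exception the retry policies recognize.
    throw new RetriableException(msg);
  }
}
{code}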
[jira] [Commented] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886049#comment-13886049 ] Hudson commented on HDFS-5845: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5063 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5063/]) HDFS-5845. SecondaryNameNode dies when checkpointing with cache pools. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562644) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java SecondaryNameNode dies when checkpointing with cache pools -- Key: HDFS-5845 URL: https://issues.apache.org/jira/browse/HDFS-5845 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Blocker Labels: caching Fix For: 2.3.0 Attachments: hdfs-5845-1.patch The SecondaryNameNode clears and reloads its FSNamesystem when doing checkpointing. However, FSNamesystem#clear does not clear CacheManager state during this reload. This leads to an error like the following: {noformat} org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists. {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
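The mechanics of the bug are simple to demonstrate. A self-contained toy (not the committed patch, which per the file list above touches CacheManager, FSNamesystem and SecondaryNameNode): if reload does not clear cached state first, re-applying the image trips the "already exists" check.
{code}
// Toy illustration of the checkpoint failure mode; names are illustrative.
import java.util.HashMap;
import java.util.Map;

public class ReloadDemo {
  static Map<String, String> cachePools = new HashMap<>();

  static void addCachePool(String name) {
    if (cachePools.containsKey(name)) {
      throw new IllegalStateException("Cache pool " + name + " already exists.");
    }
    cachePools.put(name, name);
  }

  static void reloadFromImage() {
    // The fix's shape: reset cached state before re-applying the image,
    // mirroring what FSNamesystem#clear must do for CacheManager.
    cachePools.clear();
    addCachePool("pool1");
  }

  public static void main(String[] args) {
    addCachePool("pool1");   // first checkpoint load
    reloadFromImage();       // second checkpoint: succeeds only because of clear()
  }
}
{code}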
[jira] [Updated] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
[ https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5702: Target Version/s: HDFS ACLs (HDFS-4685) Affects Version/s: HDFS ACLs (HDFS-4685) Hadoop Flags: Reviewed +1 for the patch. I'll commit this later today. FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands --- Key: HDFS-5702 URL: https://issues.apache.org/jira/browse/HDFS-5702 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Vinay Assignee: Vinay Attachments: HDFS-5702.patch, HDFS-5702.patch FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886060#comment-13886060 ] Andrew Wang commented on HDFS-5841: --- No tests as this is a doc change. I'm going to commit this shortly based on Colin's +1, thanks Colin! Update HDFS caching documentation with new changes -- Key: HDFS-5841 URL: https://issues.apache.org/jira/browse/HDFS-5841 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Labels: caching Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch The caching documentation is a little out of date, since it's missing description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5852) Change the colors on the hdfs UI
stack created HDFS-5852: --- Summary: Change the colors on the hdfs UI Key: HDFS-5852 URL: https://issues.apache.org/jira/browse/HDFS-5852 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Priority: Blocker Fix For: 2.3.0 The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this is something we'd want to fix before we release Apache Hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5841: -- Resolution: Fixed Fix Version/s: 2.3.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.3. Update HDFS caching documentation with new changes -- Key: HDFS-5841 URL: https://issues.apache.org/jira/browse/HDFS-5841 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Labels: caching Fix For: 2.3.0 Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch The caching documentation is a little out of date, since it's missing description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5852) Change the colors on the hdfs UI
[ https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5852: Attachment: hdfs-5852.txt Patch that changes our basis from 'green' to 'orange'. Screenshot coming... Change the colors on the hdfs UI Key: HDFS-5852 URL: https://issues.apache.org/jira/browse/HDFS-5852 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Priority: Blocker Fix For: 2.3.0 Attachments: hdfs-5852.txt The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this is something we'd want to fix before we release Apache Hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5852) Change the colors on the hdfs UI
[ https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5852: Attachment: new_hdfsui_colors.png Here is what the patch looks like. The colors used ... are 'International Orange (Aerospace) #FF4F00' for the banner background and 'International Orange (Golden Gate Bridge) #C0362C' for highlighting when an item is selected in the banner. A lighter hue of 'International Orange (Aerospace)', courtesy of http://www.colorhexa.com/ff4f00, is also used for the ui-tabs div. See http://en.wikipedia.org/wiki/International_orange for more on IO. Change the colors on the hdfs UI Key: HDFS-5852 URL: https://issues.apache.org/jira/browse/HDFS-5852 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Priority: Blocker Fix For: 2.3.0 Attachments: hdfs-5852.txt, new_hdfsui_colors.png The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this is something we'd want to fix before we release Apache Hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886072#comment-13886072 ] Jing Zhao commented on HDFS-5776: - Thanks for updating the patch, [~xieliang007] and [~stack]. [~stack], so the latest patch changes setThreadsNumForHedgedReads to private and aims to make users unable to change the thread number from the client side dynamically. However, can't users still create their own Configuration object, change the thread pool size setting, create a DFSClient instance, and thereby change the thread number? So I think we may want to make this cleaner here. Specifically, # the first DFSClient that tries to enable hedged reads will initialize the thread pool (in the DFSClient constructor or in the enable method), so that the enable can be a real enable # changes to the thread pool size (if necessary) should still go through a setThreadsNumForHedgedReads method (instead of the constructor of DFSClient), so that a client cannot silently change the size of the thread pool Besides, the current patch has not addressed the earlier comment on enoughNodesForHedgedRead/chooseDataNode. Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt This is a placeholder for the HDFS-related stuff backported from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful, especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metrics of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
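Both suggestions can be sketched. The names below are hypothetical and this is not the committed HDFS-5776 code; it only illustrates the lifecycle Jing describes: the pool comes into existence when the first client enables hedged reads, and any resize goes through one explicit, auditable method.
{code}
// Sketch of the proposed lifecycle; class and method names are assumptions.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HedgedReadPool {
  private static volatile ExecutorService pool;
  private static int poolSize;

  // Called by the first DFSClient that enables hedged reads, so "enable"
  // is a real enable: the pool exists as soon as the flag is on.
  public static synchronized void enable(int threads) {
    if (pool == null && threads > 0) {
      poolSize = threads;
      pool = Executors.newFixedThreadPool(threads);
    }
  }

  // Resizing goes through an explicit method rather than the DFSClient
  // constructor, so a client cannot silently change a shared pool's size.
  public static synchronized void setThreadsNum(int threads) {
    if (pool != null && threads > 0 && threads != poolSize) {
      pool.shutdown();
      pool = Executors.newFixedThreadPool(threads);
      poolSize = threads;
    }
  }
}
{code}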
[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI
[ https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886077#comment-13886077 ] Andrew Wang commented on HDFS-5852: --- I'm +1, but let's give others some time to comment before committing. Change the colors on the hdfs UI Key: HDFS-5852 URL: https://issues.apache.org/jira/browse/HDFS-5852 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Priority: Blocker Labels: webui Fix For: 2.3.0 Attachments: hdfs-5852.txt, new_hdfsui_colors.png The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this is something we'd want to fix before we release Apache Hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5852) Change the colors on the hdfs UI
[ https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5852: -- Labels: webui (was: ) Change the colors on the hdfs UI Key: HDFS-5852 URL: https://issues.apache.org/jira/browse/HDFS-5852 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Labels: webui Fix For: 2.3.0 Attachments: hdfs-5852.txt, new_hdfsui_colors.png The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this is something we'd want to fix before we release Apache Hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5852) Change the colors on the hdfs UI
[ https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5852: -- Assignee: stack Change the colors on the hdfs UI Key: HDFS-5852 URL: https://issues.apache.org/jira/browse/HDFS-5852 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Labels: webui Fix For: 2.3.0 Attachments: hdfs-5852.txt, new_hdfsui_colors.png The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this is something we'd want to fix before we release Apache Hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886081#comment-13886081 ] Hudson commented on HDFS-5841: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5064 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5064/]) HDFS-5841. Update HDFS caching documentation with new changes. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562649) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm Update HDFS caching documentation with new changes -- Key: HDFS-5841 URL: https://issues.apache.org/jira/browse/HDFS-5841 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Labels: caching Fix For: 2.3.0 Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch The caching documentation is a little out of date, since it's missing description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886083#comment-13886083 ] Jing Zhao commented on HDFS-5399: - bq. even though it's received sufficient block reports to cause it to leave safemode I'm not sure about this "even though" part, because we did not see the corresponding log in our test. bq. I don't think this is possible. The EditLogTailer only takes the FSN lock when it wakes up periodically to tail edits. What if a lot of file creation/deletion requests keep coming? If the editlog keeps growing, is it possible that the SBN keeps tailing the editlog in a single session and cannot get a chance to go back to sleep? Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In non-HA setup, for certain API call (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException in the server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator through CLI), and the clients may not want to retry on this type of SafeMode. # Client may want to retry on other API calls in non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setup. A possible straightforward solution is to always wrap the SafeModeException in the RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
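The lock-starvation hypothesis in that last question can be written down, with the caveat that this is a sketch of the suspected behavior, not the actual EditLogTailer code; every name here is hypothetical:
{code}
// Sketch of the hypothesized tail loop: if the inner loop only exits when no
// new edits are available, a steady write load on the active NN could keep
// the SBN inside the lock and starve the SafeModeMonitor thread.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TailerSketch {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  long loadAvailableEdits() { return 0; }  // stub: number of edits applied

  void tailOnce() throws InterruptedException {
    fsnLock.writeLock().lock();
    try {
      long applied;
      do {
        applied = loadAvailableEdits();
      } while (applied > 0);  // the questioned part: lock held while edits keep arriving
    } finally {
      fsnLock.writeLock().unlock();
    }
    Thread.sleep(60_000);     // tail interval; SafeModeMonitor could run here
  }
}
{code}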
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886085#comment-13886085 ] Jing Zhao commented on HDFS-5399: - I will try to set up the test again to see if I can reproduce the issue and find out the cause of the problem. Revisit SafeModeException and corresponding retry policies -- Key: HDFS-5399 URL: https://issues.apache.org/jira/browse/HDFS-5399 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently for NN SafeMode, we have the following corresponding retry policies: # In non-HA setup, for certain API call (create), the client will retry if the NN is in SafeMode. Specifically, the client side's RPC adopts MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry is enabled. # In HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException in the server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy which recognizes RetriableException (see HDFS-5291). There are several possible issues in the current implementation: # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator through CLI), and the clients may not want to retry on this type of SafeMode. # Client may want to retry on other API calls in non-HA setup. # We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setup. A possible straightforward solution is to always wrap the SafeModeException in the RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5842: -- Fix Version/s: (was: 2.4.0) 2.3.0 No prob, merged to branch-2.3. Thanks Jing! Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.3.0 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is the small snippet used:
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was UserGroupInformation.createProxyUser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
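For context, the calling pattern from the report in a self-contained form (standard UGI and FileSystem APIs; the proxied user name is a placeholder, and running this does not by itself reproduce the hftp-specific failure):
{code}
// Proxy-user doAs pattern; "proxied-user" is a hypothetical account name.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyFs {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    // The real (login) user impersonates the proxied user; the NN must have
    // hadoop.proxyuser.* rules permitting this.
    UserGroupInformation ugi = UserGroupInformation.createProxyUser(
        "proxied-user", UserGroupInformation.getLoginUser());
    FileSystem fs = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      @Override
      public FileSystem run() throws Exception {
        return FileSystem.get(conf);
      }
    });
    System.out.println(fs.getUri());
  }
}
{code}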
[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4239: -- Status: Open (was: Patch Available) Means of telling the datanode to stop using a sick disk --- Key: HDFS-4239 URL: https://issues.apache.org/jira/browse/HDFS-4239 Project: Hadoop HDFS Issue Type: Improvement Reporter: stack Assignee: Jimmy Xiang Attachments: hdfs-4239.patch, hdfs-4239_v2.patch If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing occasionally, or just exhibiting high latency -- your choices are: 1. Decommission the total datanode. If the datanode is carrying 6 or 12 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the rereplication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low latency serving: e.g. hosting an hbase cluster. 2. Stop the datanode, unmount the bad disk, and restart the datanode (You can't unmount the disk while it is in use). This latter is better in that only the bad disk's data is rereplicated, not all datanode data. Is it possible to do better, say, send the datanode a signal to tell it to stop using a disk an operator has designated 'bad'. This would be like option #2 above minus the need to stop and restart the datanode. Ideally the disk would become unmountable after a while. Nice to have would be being able to tell the datanode to restart using a disk after it's been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5492) Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk
[ https://issues.apache.org/jira/browse/HDFS-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5492: Attachment: HDFS-5492.2.patch Thank you for your comment. Removed mentioning the exact packet size. Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk -- Key: HDFS-5492 URL: https://issues.apache.org/jira/browse/HDFS-5492 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: documentation, newbie Attachments: HDFS-5492.2.patch, HDFS-5492.patch, HDFS-5492.patch HDFS-2069 is not ported to current document. The description of HDFS-2069 is as follows: {quote} Current HDFS architecture information about Trash is incorrectly documented as - The current default policy is to delete files from /trash that are more than 6 hours old. In the future, this policy will be configurable through a well defined interface. It should be something like - Current default trash interval is set to 0 (Deletes file without storing in trash ) . This value is configurable parameter stored as fs.trash.interval stored in core-site.xml . {quote} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
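The corrected wording pairs naturally with a concrete example. Illustrative only (the fs.trash.interval key and its minutes semantics come from the documentation being fixed; the value chosen here is a placeholder):
{code}
// Trash is off by default (fs.trash.interval = 0 deletes files immediately);
// a positive value is the retention period in minutes, settable in
// core-site.xml or programmatically as below.
import org.apache.hadoop.conf.Configuration;

public class TrashConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setLong("fs.trash.interval", 1440); // keep trashed files for one day
    System.out.println(conf.get("fs.trash.interval"));
  }
}
{code}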
[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886100#comment-13886100 ] Hudson commented on HDFS-5842: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5065 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5065/]) Update CHANGES.txt to move HDFS-5842 to 2.3.0 (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562656) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster Key: HDFS-5842 URL: https://issues.apache.org/jira/browse/HDFS-5842 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.3.0 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, HADOOP-10215.002.patch, HADOOP-10215.002.patch Noticed this while debugging issues in another application. We saw an error when trying to do a FileSystem.get using an hftp file system on a secure cluster using a proxy user ugi. This is the small snippet used:
{code}
FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
  @Override
  public FileSystem run() throws IOException {
    return FileSystem.get(hadoopConf);
  }
});
{code}
The same code worked for hdfs and webhdfs but not for hftp when the ugi used was UserGroupInformation.createProxyUser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
[ https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5848: - Attachment: h5848_20130130b.patch h5848_20130130b.patch: add a rollingUpgradeInfo field to HeartbeatResponse instead of a new DatanodeCommand. Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress - Key: HDFS-5848 URL: https://issues.apache.org/jira/browse/HDFS-5848 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5848_20130130.patch, h5848_20130130b.patch When rolling upgrade is in progress, NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We need to add a new DatanodeCommand here. The datanode change will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5754: - Attachment: HDFS-5754.012.patch Uploaded a new patch to address the comments. Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion Key: HDFS-5754 URL: https://issues.apache.org/jira/browse/HDFS-5754 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Brandon Li Attachments: FeatureInfo.patch, HDFS-5754.001.patch, HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, HDFS-5754.009.patch, HDFS-5754.010.patch, HDFS-5754.012.patch Currently, LayoutVersion defines the on-disk data format and supported features of the entire cluster including NN and DNs. LayoutVersion is persisted in both NN and DNs. When a NN/DN starts up, it checks its supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a different LayoutVersion than NN cannot register with the NN. We propose to split LayoutVersion into two independent values that are local to the nodes: - NamenodeLayoutVersion - defines the on-disk data format in NN, including the format of FSImage, editlog and the directory structure. - DatanodeLayoutVersion - defines the on-disk data format in DN, including the format of block data file, metadata file, block pool layout, and the directory structure. The LayoutVersion check will be removed in DN registration. If NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling upgrade, then only rollback is supported and downgrade is not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
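The split in the description can be sketched as two fully independent version holders. Everything below is hypothetical (illustrative constants and names, not the patch under review); the point is only that each node type checks its own on-disk value and the NN-vs-DN comparison disappears from DN registration.
{code}
// Hypothetical shape of the proposed split; values and names are illustrative.
public class LayoutVersions {
  static final class NameNodeLayoutVersion {
    static final int CURRENT = -52; // illustrative value only
    static void checkOnDisk(int onDisk) {
      // Upgrade/rollback decisions consult only the NN's own version.
      if (onDisk != CURRENT) {
        System.out.println("NN layout " + onDisk + " requires upgrade or rollback");
      }
    }
  }

  static final class DataNodeLayoutVersion {
    static final int CURRENT = -52; // illustrative value only
    static void checkOnDisk(int onDisk) {
      // Independent of the NN's version by design.
      if (onDisk != CURRENT) {
        System.out.println("DN layout " + onDisk + " requires upgrade or rollback");
      }
    }
  }
}
{code}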
[jira] [Updated] (HDFS-5848) Add rolling upgrade information to heartbeat response
[ https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5848: - Description: When rolling upgrade is in progress, NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We only change the heartbeat response here. The datanode change will be done in a separate JIRA. (was: When rolling upgrade is in progress, NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We need to add a new DatanodeCommand here. The datanode change will be done in a separate JIRA.) Summary: Add rolling upgrade information to heartbeat response (was: Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress) Add rolling upgrade information to heartbeat response Key: HDFS-5848 URL: https://issues.apache.org/jira/browse/HDFS-5848 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5848_20130130.patch, h5848_20130130b.patch When rolling upgrade is in progress, NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We only change the heartbeat response here. The datanode change will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5848) Add rolling upgrade information to heartbeat response
[ https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886108#comment-13886108 ] Suresh Srinivas commented on HDFS-5848: --- I actually think this should just be an upgrade state that is part of HeartbeatResponse instead of a separate command, much like the {{haStatus}} member it currently has. Add rolling upgrade information to heartbeat response Key: HDFS-5848 URL: https://issues.apache.org/jira/browse/HDFS-5848 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5848_20130130.patch, h5848_20130130b.patch When rolling upgrade is in progress, NN should inform datanodes via heartbeat responses so that datanodes create hardlinks when deleting blocks. We only change the heartbeat response here. The datanode change will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
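A sketch of the design the last two comments converge on (field and getter names assumed, not checked against the committed patch): rolling-upgrade info rides on every heartbeat response, like the existing haStatus member, and a null field means no rolling upgrade is in progress.
{code}
// Hypothetical shape; not the actual h5848 patch.
public class HeartbeatSketch {
  static final class RollingUpgradeInfo {
    final long startTimeMs;
    RollingUpgradeInfo(long startTimeMs) { this.startTimeMs = startTimeMs; }
  }

  static final class HeartbeatResponse {
    // Existing haStatus member elided; the new field is nullable.
    private final RollingUpgradeInfo rollingUpgradeInfo;
    HeartbeatResponse(RollingUpgradeInfo info) { this.rollingUpgradeInfo = info; }
    RollingUpgradeInfo getRollingUpgradeInfo() { return rollingUpgradeInfo; }
  }

  // Datanode side: while the field is set, deletions keep a hardlinked copy
  // so blocks can be restored if the upgrade is rolled back.
  static boolean shouldHardlinkOnDelete(HeartbeatResponse resp) {
    return resp.getRollingUpgradeInfo() != null;
  }
}
{code}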
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886107#comment-13886107 ] stack commented on HDFS-5776: - [~jingzhao] Thanks for the new input. Please help me better understand what you mean by "making this cleaner" so we can adjust the patch accordingly. Hedged reads are set on or off in the client configuration XML and, per DFSClient instance, can be enabled/disabled as you go. Yes, you could read the code and figure out that it is possible to do some heavyweight gymnastics creating your own Configuration -- expensive -- and a new DFSClient -- ditto -- if you wanted to work around whatever is out in the configuration XML. That seems fine by me, especially as there is no real means of shutting down this access route. Pardon me, but I do not follow what you are asking for in 1. Maybe you are referring to a 'hole' where, if the thread count is <= 0 on construction, the enable will have no effect -- and you want it to have an 'effect' post construction? For 2., you are suggesting that setThreadsNumForHedgedReads not be private but be an available API for the DFSClient to toggle as it sees fit? I'll let @liang xie address your enoughNodesForHedgedRead comment. Thanks for checking back. Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt This is a placeholder for the HDFS-related stuff backported from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful, especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we could export the metrics of interest into the client system (e.g. HBase's regionserver metrics). The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI
[ https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886109#comment-13886109 ] Tsz Wo (Nicholas), SZE commented on HDFS-5852: -- What if there is a vendor using orange (or any color you choose)? Change the colors on the hdfs UI Key: HDFS-5852 URL: https://issues.apache.org/jira/browse/HDFS-5852 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Labels: webui Fix For: 2.3.0 Attachments: hdfs-5852.txt, new_hdfsui_colors.png The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this is something we'd want to fix before we release Apache Hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
[ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5767: - Priority: Blocker (was: Major) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes Key: HDFS-5767 URL: https://issues.apache.org/jira/browse/HDFS-5767 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.3.0 Environment: With LDAP enabled Reporter: Yongjun Zhang Assignee: Brandon Li Priority: Blocker I'm seeing that the nfs implementation assumes a unique (userName, userId) pair to be returned by the command getent passwd. That is, for a given userName, there should be a single userId, and for a given userId, there should be a single userName. The reason is explained in the following message:
private static final String DUPLICATE_NAME_ID_DEBUG_INFO =
    "NFS gateway can't start with duplicate name or id on the host system.\n" +
    "This is because HDFS (non-kerberos cluster) uses name as the only way to identify a user or group.\n" +
    "The host system with duplicated user/group name or id might work fine most of the time by itself.\n" +
    "However when NFS gateway talks to HDFS, HDFS accepts only user and group name.\n" +
    "Therefore, same name means the same user or same group. To find the duplicated names/ids, one can do:\n" +
    "getent passwd | cut -d: -f1,3 and getent group | cut -d: -f1,3 on Linux systms,\n" +
    "dscl . -list /Users UniqueID and dscl . -list /Groups PrimaryGroupID on MacOS.";
This requirement cannot be met sometimes (e.g. because of the use of LDAP). Let's do some examination. What exists in /etc/passwd:
$ more /etc/passwd | grep ^bin
bin:x:2:2:bin:/bin:/bin/sh
$ more /etc/passwd | grep ^daemon
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
The above result says userName bin has userId 2, and daemon has userId 1. What we can see with the getent passwd command due to LDAP:
$ getent passwd | grep ^bin
bin:x:2:2:bin:/bin:/bin/sh
bin:x:1:1:bin:/bin:/sbin/nologin
$ getent passwd | grep ^daemon
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
daemon:x:2:2:daemon:/sbin:/sbin/nologin
We can see that there are multiple entries for the same userName with different userIds, and the same userId could be associated with different userNames. So the assumption stated in the above DEBUG_INFO message cannot be met here. The DEBUG_INFO also stated that HDFS uses name as the only way to identify user/group. I'm filing this JIRA for a solution. Hi [~brandonli], since you implemented most of the nfs feature, would you please comment? Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
[ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886116#comment-13886116 ] Brandon Li commented on HDFS-5767: -- Making it a blocker for the 2.3 release. I just saw a couple of real-life examples where completely duplicated accounts exist in the local database and the LDAP server, and administrators don't want to clean up the dups. Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes Key: HDFS-5767 URL: https://issues.apache.org/jira/browse/HDFS-5767 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.3.0 Environment: With LDAP enabled Reporter: Yongjun Zhang Assignee: Brandon Li Priority: Blocker I'm seeing that the nfs implementation assumes a unique (userName, userId) pair to be returned by the command getent passwd. That is, for a given userName, there should be a single userId, and for a given userId, there should be a single userName. The reason is explained in the following message:
private static final String DUPLICATE_NAME_ID_DEBUG_INFO =
    "NFS gateway can't start with duplicate name or id on the host system.\n" +
    "This is because HDFS (non-kerberos cluster) uses name as the only way to identify a user or group.\n" +
    "The host system with duplicated user/group name or id might work fine most of the time by itself.\n" +
    "However when NFS gateway talks to HDFS, HDFS accepts only user and group name.\n" +
    "Therefore, same name means the same user or same group. To find the duplicated names/ids, one can do:\n" +
    "getent passwd | cut -d: -f1,3 and getent group | cut -d: -f1,3 on Linux systms,\n" +
    "dscl . -list /Users UniqueID and dscl . -list /Groups PrimaryGroupID on MacOS.";
This requirement cannot be met sometimes (e.g. because of the use of LDAP). Let's do some examination. What exists in /etc/passwd:
$ more /etc/passwd | grep ^bin
bin:x:2:2:bin:/bin:/bin/sh
$ more /etc/passwd | grep ^daemon
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
The above result says userName bin has userId 2, and daemon has userId 1. What we can see with the getent passwd command due to LDAP:
$ getent passwd | grep ^bin
bin:x:2:2:bin:/bin:/bin/sh
bin:x:1:1:bin:/bin:/sbin/nologin
$ getent passwd | grep ^daemon
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
daemon:x:2:2:daemon:/sbin:/sbin/nologin
We can see that there are multiple entries for the same userName with different userIds, and the same userId could be associated with different userNames. So the assumption stated in the above DEBUG_INFO message cannot be met here. The DEBUG_INFO also stated that HDFS uses name as the only way to identify user/group. I'm filing this JIRA for a solution. Hi [~brandonli], since you implemented most of the nfs feature, would you please comment? Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
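To make the failure condition concrete, here is a small self-contained check in the spirit of the getent output above. It is illustrative only, not the gateway's actual id-mapping code: it flags any name mapped to two ids, or any id mapped to two names, which is the condition the gateway refuses to start on.
{code}
// Duplicate name/id detection over name:uid pairs, as produced by
// `getent passwd | cut -d: -f1,3`; the sample entries mirror the report.
import java.util.HashMap;
import java.util.Map;

public class DupCheck {
  public static void main(String[] args) {
    String[] entries = { "bin:2", "bin:1", "daemon:1", "daemon:2" };
    Map<String, String> byName = new HashMap<>();
    Map<String, String> byId = new HashMap<>();
    for (String e : entries) {
      String[] parts = e.split(":");
      String prevId = byName.put(parts[0], parts[1]);
      String prevName = byId.put(parts[1], parts[0]);
      if (prevId != null && !prevId.equals(parts[1])) {
        System.out.println("duplicate name " + parts[0] + ": ids " + prevId + "," + parts[1]);
      }
      if (prevName != null && !prevName.equals(parts[0])) {
        System.out.println("duplicate id " + parts[1] + ": names " + prevName + "," + parts[0]);
      }
    }
  }
}
{code}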