[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095204#comment-14095204 ] Sanjay Radia commented on HDFS-6134: Larry, I don't completely get the difference between webhdfs and httpfs, but I think the cause of the difference is that user hdfs is a superuser (note the DN runs as hdfs, and webhdfs code is executed on behalf of the end user inside the DN after checking the permissions). Hence I think this would potentially open up access to all encrypted files that are readable. However, that should NOT happen if doAs is used (correct?). I agree it would be unacceptable to say that if one enables transparent encryption then one should disable webhdfs because it would become insecure. Andrew says that "Regarding webhdfs, it's not a recommended deployment", but Alejandro says "Both httpfs and webhdfs will work just fine" and then in the same paragraph says this could fail some security audits. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. 
The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6846) NetworkTopology#sortByDistance should give nodes higher priority, which cache the block.
Yi Liu created HDFS-6846: Summary: NetworkTopology#sortByDistance should give nodes higher priority, which cache the block. Key: HDFS-6846 URL: https://issues.apache.org/jira/browse/HDFS-6846 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.6.0 Reporter: Yi Liu Assignee: Yi Liu Currently there are 3 weights: * local * same rack * off rack But if some nodes cache the block, then it is faster if the client reads the block from those nodes. So we should have more weights, as follows: * local * cached same rack * same rack * cached off rack * off rack -- This message was sent by Atlassian JIRA (v6.2#6252)
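The five-tier ordering proposed above can be sketched as a weight function (a minimal illustration, not the actual NetworkTopology#sortByDistance code; the class and method names below are assumptions):

```java
// Illustrative sketch of the proposed locality tiers: lower weight means the
// node is preferred when sorting replicas for a read. A node caching the block
// slots in between the existing "local / same rack / off rack" tiers.
public class LocalityWeightSketch {
    public static final int LOCAL = 0;
    public static final int CACHED_SAME_RACK = 1;
    public static final int SAME_RACK = 2;
    public static final int CACHED_OFF_RACK = 3;
    public static final int OFF_RACK = 4;

    /** Weight from the node's locality and whether it caches the block. */
    public static int weight(boolean isLocal, boolean sameRack, boolean cachesBlock) {
        if (isLocal) {
            return LOCAL;
        } else if (sameRack) {
            return cachesBlock ? CACHED_SAME_RACK : SAME_RACK;
        } else {
            return cachesBlock ? CACHED_OFF_RACK : OFF_RACK;
        }
    }

    public static void main(String[] args) {
        // A rack-local node caching the block outranks a plain rack-local node...
        System.out.println(weight(false, true, true) < weight(false, true, false)); // true
        // ...but a local node still wins.
        System.out.println(weight(true, false, false) < weight(false, true, true)); // true
    }
}
```

Sorting the replica list by this weight would then send clients to caching nodes before non-caching ones at the same locality level.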
[jira] [Commented] (HDFS-6567) Clean up HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095220#comment-14095220 ] Hadoop QA commented on HDFS-6567: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661387/HDFS-6567.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7622//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7622//console This message is automatically generated. Clean up HdfsFileStatus --- Key: HDFS-6567 URL: https://issues.apache.org/jira/browse/HDFS-6567 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Tassapol Athiapinya Attachments: HDFS-6567.000.patch As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} is reversed. This jira proposes to fix the order and to make the code more consistent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6847) Archival Storage: Support storage policy on a directory
Jing Zhao created HDFS-6847: --- Summary: Archival Storage: Support storage policy on a directory Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6847: Summary: Archival Storage: Support storage policy on directories (was: Archival Storage: Support storage policy on a directory) Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on a directory
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6847: Description: This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. Archival Storage: Support storage policy on a directory --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
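The nearest-ancestor rule in the description can be sketched as follows (an illustrative resolver over a plain map of explicitly set policies; it is not the actual NameNode/XAttr implementation, and all names are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "own policy if set, else nearest ancestral directory's policy".
public class StoragePolicyResolver {
    private final Map<String, String> explicitPolicies = new HashMap<>();

    public void setPolicy(String path, String policy) {
        explicitPolicies.put(path, policy);
    }

    /** Walk up the path components until an explicitly set policy is found. */
    public String resolve(String path) {
        String p = path;
        while (true) {
            String policy = explicitPolicies.get(p);
            if (policy != null) {
                return policy;
            }
            int slash = p.lastIndexOf('/');
            if (slash <= 0) {
                return explicitPolicies.get("/"); // reached the root
            }
            p = p.substring(0, slash);
        }
    }

    public static void main(String[] args) {
        StoragePolicyResolver r = new StoragePolicyResolver();
        r.setPolicy("/foo", "p1");
        r.setPolicy("/foo/bar", "p2");
        System.out.println(r.resolve("/foo/bar/baz")); // p2
        System.out.println(r.resolve("/foo"));         // p1
    }
}
```

This reproduces the example from the description: with p1 on /foo and p2 on /foo/bar, the policies for baz, bar, and foo resolve to p2, p2, and p1.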
[jira] [Resolved] (HDFS-6844) Archival Storage: Extend HdfsFileStatus to get storage policy
[ https://issues.apache.org/jira/browse/HDFS-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-6844. - Resolution: Duplicate HDFS-6847 will cover the same functionality. Closing this as a duplicate. Archival Storage: Extend HdfsFileStatus to get storage policy - Key: HDFS-6844 URL: https://issues.apache.org/jira/browse/HDFS-6844 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6844.000.patch We need a way to get the current storage policy id of existing files. This can be achieved by extending HdfsFileStatus. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6321) Add a message on the old web UI that indicates the old UI is deprecated
[ https://issues.apache.org/jira/browse/HDFS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095245#comment-14095245 ] Hadoop QA commented on HDFS-6321: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661392/HDFS-6321.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7623//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7623//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7623//console This message is automatically generated. 
Add a message on the old web UI that indicates the old UI is deprecated --- Key: HDFS-6321 URL: https://issues.apache.org/jira/browse/HDFS-6321 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Tassapol Athiapinya Attachments: HDFS-6321.000.patch HDFS-6252 has removed the jsp ui from trunk. We should add a message in the old web ui to indicate that the ui has been deprecated and ask the user to move towards the new web ui. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6847: Attachment: HDFS-6847.000.patch Initial patch. Use XAttr to set storage policy id for directories. Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6843) Create FileStatus.isEncrypted() method
[ https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095299#comment-14095299 ] Steve Loughran commented on HDFS-6843: -- what about making it an enum in case encryption policies change in future? Create FileStatus.isEncrypted() method -- Key: HDFS-6843 URL: https://issues.apache.org/jira/browse/HDFS-6843 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb FileStatus should have a 'boolean isEncrypted()' method. (it was in the context of discussing with Andrew about FileStatus being a Writable). Having this method would allow MR JobSubmitter to do the following: - BOOLEAN intermediateEncryption = false IF jobconf.contains(mr.intermidate.encryption) THEN intermediateEncryption = jobConf.getBoolean(mr.intermidate.encryption) ELSE IF (I/O)Format INSTANCEOF File(I/O)Format THEN intermediateEncryption = ANY File(I/O)Format HAS a Path with status isEncrypted()==TRUE FI jobConf.setBoolean(mr.intermidate.encryption, intermediateEncryption) FI -- This message was sent by Atlassian JIRA (v6.2#6252)
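The pseudocode in the description can be rendered in Java roughly as below. This is only a sketch: FileStatus.isEncrypted() is the proposed API, the decision logic is taken from the comment, and the class, interface, and parameter names here are illustrative assumptions, not actual MapReduce code:

```java
import java.util.List;

// Sketch of the JobSubmitter decision: an explicit jobconf setting wins;
// otherwise intermediate encryption is enabled if any input/output path
// of the job has isEncrypted() == true.
public class IntermediateEncryptionSketch {
    // Stand-in for the proposed FileStatus.isEncrypted() method.
    interface FileStatus {
        boolean isEncrypted();
    }

    /**
     * @param explicitSetting the jobconf value if present, else null
     * @param jobPathStatuses statuses of the job's File(I/O)Format paths
     */
    public static boolean needsIntermediateEncryption(
            Boolean explicitSetting, List<FileStatus> jobPathStatuses) {
        if (explicitSetting != null) {
            return explicitSetting;   // explicit configuration wins
        }
        for (FileStatus st : jobPathStatuses) {
            if (st.isEncrypted()) {
                return true;          // any encrypted path forces it on
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FileStatus enc = () -> true;
        FileStatus plain = () -> false;
        System.out.println(needsIntermediateEncryption(null, List.of(plain, enc))); // true
    }
}
```

The result would then be written back with jobConf.setBoolean, as in the pseudocode.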
[jira] [Updated] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-6833: - Attachment: HDFS-6833.patch I attach a patch file to which I added a test case. DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in DataNode, the following messages are usually output. {code} 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} However, DirectoryScanner may be executed when DataNode deletes the block in the current implementation. And the following messages are output. 
{code} 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} The deleted block's information is registered in the DataNode's memory, and when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we execute a recommission or change the replication factor, the NameNode may delete the right block as ExcessReplicate because of this problem, and Under-Replicated Blocks and Missing Blocks occur. When the DataNode runs DirectoryScanner, it should not register a block that is being deleted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()
[ https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6841: Attachment: HDFS-6841-001.patch Changed in the HDFS project. Did not change all tests; changed only the required tests. Use Time.monotonicNow() wherever applicable instead of Time.now() - Key: HDFS-6841 URL: https://issues.apache.org/jira/browse/HDFS-6841 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6841-001.patch {{Time.now()}} used in many places to calculate elapsed time. This should be replaced with {{Time.monotonicNow()}} to avoid effect of System time changes on elapsed time calculations. -- This message was sent by Atlassian JIRA (v6.2#6252)
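The reason for the change above: wall-clock time (what Time.now() reflects) can jump backward or forward when the system clock is adjusted by NTP or an administrator, corrupting elapsed-time arithmetic, while a monotonic clock never goes backward. A standalone sketch of the pattern using plain System.nanoTime() (Hadoop's Time.monotonicNow() serves the same purpose; this is not its actual source):

```java
// Measure elapsed time with a monotonic clock so that adjustments to the
// system wall clock cannot make the measured duration negative or wildly wrong.
public class ElapsedTimeSketch {
    /** Monotonic milliseconds; only differences between calls are meaningful. */
    public static long monotonicNowMillis() {
        return System.nanoTime() / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = monotonicNowMillis();
        Thread.sleep(10);
        long elapsed = monotonicNowMillis() - start;
        // elapsed stays non-negative even if the wall clock changed meanwhile
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

Subtracting two System.currentTimeMillis() readings, by contrast, silently breaks whenever the clock is stepped between them.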
[jira] [Updated] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()
[ https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6841: Status: Patch Available (was: Open) Use Time.monotonicNow() wherever applicable instead of Time.now() - Key: HDFS-6841 URL: https://issues.apache.org/jira/browse/HDFS-6841 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6841-001.patch {{Time.now()}} used in many places to calculate elapsed time. This should be replaced with {{Time.monotonicNow()}} to avoid effect of System time changes on elapsed time calculations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer
[ https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095363#comment-14095363 ] Uma Maheswara Rao G commented on HDFS-6247: --- +1 Patch looks good to me. Thanks Vinay! Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer Key: HDFS-6247 URL: https://issues.apache.org/jira/browse/HDFS-6247 Project: Hadoop HDFS Issue Type: Bug Components: balancer, datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch Currently there is no response sent from the target Datanode to the Balancer for replaceBlock() calls. Since block movement for balancing is throttled, complete block movement will take time, and this could result in a timeout at the Balancer, which will be trying to read the status message. To avoid this, the Datanode can send IN_PROGRESS status messages to the Balancer while a replaceBlock() call is in progress, so that the Balancer does not time out and treat the block movement as failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6830) BlockInfo.addStorage fails when DN changes the storage for a block replica
[ https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095380#comment-14095380 ] Hudson commented on HDFS-6830: -- FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/644/]) HDFS-6830. BlockInfo.addStorage fails when DN changes the storage for a block replica. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617598) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java BlockInfo.addStorage fails when DN changes the storage for a block replica -- Key: HDFS-6830 URL: https://issues.apache.org/jira/browse/HDFS-6830 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, HDFS-6830.03.patch, HDFS-6830.04.patch The call to {{removeStorageInfo}} is wrong because the block is still in the DatanodeStorage's list of blocks and the callee does not expect it to be. {code} } else { // The block is on the DN but belongs to a different storage. // Update our state. removeStorage(getStorageInfo(idx)); added = false; // Just updating storage. Return false. } {code} It is a very unlikely code path to hit since storage updates usually occur via incremental block reports. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6836) HDFS INFO logging is verbose uses file appenders
[ https://issues.apache.org/jira/browse/HDFS-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095385#comment-14095385 ] Hudson commented on HDFS-6836: -- FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/644/]) HDFS-6836. HDFS INFO logging is verbose uses file appenders. (Contributed by Xiaoyu Yao) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617603) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java HDFS INFO logging is verbose uses file appenders -- Key: HDFS-6836 URL: https://issues.apache.org/jira/browse/HDFS-6836 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.4.1 Reporter: Gopal V Assignee: Xiaoyu Yao Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6836.0.patch, HDFS-6836.1.patch, HDFS-6836.2.patch Reported by: [~gopalv]. 
The HDFS INFO logging is present within the inner loops of HDFS, logging information like {code} 2014-07-24 19:43:34,459 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43666, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86616576, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 6827335 2014-07-24 19:43:34,465 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.117:41731, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33, offset: 72691200, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 7178626 2014-07-24 19:43:34,467 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43669, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86813696, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 8540703 2014-07-24 19:43:34,474 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.117:41733, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33, offset: 72822272, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 8220422 2014-07-24 19:43:34,477 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43672, bytes: 786432, op: HDFS_READ, cliID: 
DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86944768, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 8327499 {code} Looks like future releases of log4j will fix this to be faster - https://issues.apache.org/jira/browse/LOG4J2-163 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095433#comment-14095433 ] Hadoop QA commented on HDFS-6833: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661423/HDFS-6833.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7624//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7624//console This message is automatically generated. DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in DataNode, the following messages are usually output. 
{code} 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} However, DirectoryScanner may be executed when DataNode deletes the block in the current implementation. And the following messages are output. {code} 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
/hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} The deleted block's information is registered in the DataNode's memory, and when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we execute a recommission or change the replication factor, the NameNode may delete the right block as ExcessReplicate because of this problem, and Under-Replicated Blocks and Missing Blocks occur. When the DataNode runs DirectoryScanner, it should not register a block that is being deleted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095470#comment-14095470 ] Larry McCay commented on HDFS-6134: --- I guess if webhdfs is allowed to doAs the end user 'hdfs' then that can be a problem. But again, I don't see what keeps an admin from doing that with httpfs as well. It seems as though KMS needs the ability to not allow the 'hdfs' user to gain keys through any trusted proxy, while still allowing a trusted proxy that is running as a superuser to doAs other users. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack
[ https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095499#comment-14095499 ] Daryn Sharp commented on HDFS-6840: --- I'd like complete removal of the random seed. Why allow users the option of shooting themselves in the foot? As far as I can tell, the seed was added due to a misunderstanding that the former behavior was deterministic? I cannot envision a use case where all off-rack clients bombarding a single node is a good idea. Clients are always sent to the same datanode when read is off rack -- Key: HDFS-6840 URL: https://issues.apache.org/jira/browse/HDFS-6840 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Jason Lowe Priority: Critical After HDFS-6268 the sorting order of block locations is deterministic for a given block and locality level (e.g.: local, rack, off-rack), so off-rack clients all see the same datanode for the same block. This leads to very poor behavior in distributed cache localization and other scenarios where many clients all want the same block data at approximately the same time. The one datanode is crushed by the load while the other replicas only handle local and rack-local requests. -- This message was sent by Atlassian JIRA (v6.2#6252)
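The deterministic ordering Daryn describes can be illustrated with a toy model (Python, names illustrative — this is not the HDFS-6268 code): when the tie-breaking shuffle among equally-distant replicas is seeded from a stable value such as the block id, every off-rack client computes the identical order, so they all hit the same datanode first.

```python
import random

def order_replicas(replicas, block_id, seed_with_block_id=True):
    """Toy model of ordering equally-distant replicas for one client."""
    rng = random.Random(block_id) if seed_with_block_id else random.Random()
    out = list(replicas)
    rng.shuffle(out)  # tie-break among replicas at the same locality level
    return out

replicas = ["dn1", "dn2", "dn3", "dn4"]

# Seeded by block id: 100 independent "clients" all compute the same order,
# so one datanode absorbs every off-rack read of this block.
seeded = {tuple(order_replicas(replicas, block_id=42)) for _ in range(100)}
assert len(seeded) == 1

# Unseeded: orders differ across clients, spreading the load over replicas.
unseeded = {tuple(order_replicas(replicas, 42, seed_with_block_id=False))
            for _ in range(100)}
assert len(unseeded) > 1
```

This is why removing the seed (or seeding per-client rather than per-block) restores load spreading.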
[jira] [Commented] (HDFS-6836) HDFS INFO logging is verbose & uses file appenders
[ https://issues.apache.org/jira/browse/HDFS-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095513#comment-14095513 ] Hudson commented on HDFS-6836: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/]) HDFS-6836. HDFS INFO logging is verbose & uses file appenders. (Contributed by Xiaoyu Yao) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617603) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java HDFS INFO logging is verbose & uses file appenders -- Key: HDFS-6836 URL: https://issues.apache.org/jira/browse/HDFS-6836 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.4.1 Reporter: Gopal V Assignee: Xiaoyu Yao Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6836.0.patch, HDFS-6836.1.patch, HDFS-6836.2.patch Reported by: [~gopalv]. 
These INFO logs are emitted from within the inner loops of the HDFS read path, producing output like: {code} 2014-07-24 19:43:34,459 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43666, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86616576, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 6827335 2014-07-24 19:43:34,465 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.117:41731, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33, offset: 72691200, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 7178626 2014-07-24 19:43:34,467 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43669, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86813696, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 8540703 2014-07-24 19:43:34,474 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.117:41733, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33, offset: 72822272, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 8220422 2014-07-24 19:43:34,477 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43672, bytes: 786432, op: HDFS_READ, cliID: 
DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86944768, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 8327499 {code} Looks like future releases of log4j will fix this to be faster - https://issues.apache.org/jira/browse/LOG4J2-163 -- This message was sent by Atlassian JIRA (v6.2#6252)
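Until a faster log4j is available, a common operational mitigation is to lower the level of the clienttrace logger in log4j.properties; a sketch (the logger name follows the stock Hadoop configuration — verify it against your deployment before relying on it):

```properties
# Silence the per-read DataNode.clienttrace INFO lines from BlockSender.
# Trade-off: audit-style traffic logging is lost while this is in effect.
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN
```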
[jira] [Commented] (HDFS-6830) BlockInfo.addStorage fails when DN changes the storage for a block replica
[ https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095508#comment-14095508 ] Hudson commented on HDFS-6830: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/]) HDFS-6830. BlockInfo.addStorage fails when DN changes the storage for a block replica. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617598) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java BlockInfo.addStorage fails when DN changes the storage for a block replica -- Key: HDFS-6830 URL: https://issues.apache.org/jira/browse/HDFS-6830 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, HDFS-6830.03.patch, HDFS-6830.04.patch The call to {{removeStorageInfo}} is wrong because the block is still in the DatanodeStorage's list of blocks and the callee does not expect it to be. {code} } else { // The block is on the DN but belongs to a different storage. // Update our state. removeStorage(getStorageInfo(idx)); added = false; // Just updating storage. Return false. } {code} It is a very unlikely code path to hit since storage updates usually occur via incremental block reports. -- This message was sent by Atlassian JIRA (v6.2#6252)
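The bug above is a precondition violation: the removal path assumes the block has already been detached from the storage's block list. A minimal Python model (illustrative names, not the actual BlockInfo/DatanodeStorageInfo code) of the safe ordering — detach from the old storage before tearing it down or re-attaching:

```python
class Storage:
    """Toy stand-in for a per-datanode storage: an id plus its blocks."""
    def __init__(self, sid):
        self.sid = sid
        self.blocks = set()

    def remove(self):
        # Models the callee's expectation: a storage must not be removed
        # while it still claims replicas.
        assert not self.blocks, "storage still holds blocks"

def change_storage(block, old, new):
    """Move a replica between storages without violating remove()'s contract."""
    old.blocks.discard(block)  # detach first
    new.blocks.add(block)

s1, s2 = Storage("DS-1"), Storage("DS-2")
s1.blocks.add("blk_1")
change_storage("blk_1", s1, s2)
s1.remove()  # now legal: s1 no longer claims the replica
assert "blk_1" in s2.blocks
```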
[jira] [Updated] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-6833: - Attachment: HDFS-6833.patch DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in the DataNode, the following messages are usually output. {code} 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} However, in the current implementation, DirectoryScanner may run while the DataNode is deleting the block, and the following messages are output. 
{code} 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} Information about the block being deleted remains registered in the DataNode's memory, so when the DataNode sends a block report, the NameNode receives incorrect block information. For example, during a recommission or a change to the replication factor, the NameNode may delete a valid block as an excess replica (ExcessReplicate) because of this problem, producing Under-Replicated Blocks and Missing Blocks. When the DataNode runs DirectoryScanner, it should not register a block that is being deleted. -- This message was sent by Atlassian JIRA (v6.2#6252)
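The race can be modeled in a few lines (a conceptual sketch, not DataNode code): the async deleter drops the replica from memory before unlinking the file, so a scan in that window finds a file on disk with no in-memory record and wrongly re-adds it. Tracking scheduled deletions closes the window:

```python
in_memory = set()                      # replicas the DataNode believes it has
on_disk = {"blk_1073741825"}           # file not yet unlinked by async delete
pending_deletion = {"blk_1073741825"}  # deletion scheduled, still in flight

def directory_scan():
    """Reconcile disk with memory, skipping blocks queued for deletion."""
    for blk in on_disk:
        if blk not in in_memory and blk not in pending_deletion:
            in_memory.add(blk)  # only genuinely missing blocks are re-added

directory_scan()
# Without the pending_deletion check, the deleting block would reappear in
# memory and be included in the next block report sent to the NameNode.
assert "blk_1073741825" not in in_memory
```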
[jira] [Commented] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()
[ https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095576#comment-14095576 ] Hadoop QA commented on HDFS-6841: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661432/HDFS-6841-001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestGetBlocks org.apache.hadoop.hdfs.TestLease {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7625//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7625//console This message is automatically generated. Use Time.monotonicNow() wherever applicable instead of Time.now() - Key: HDFS-6841 URL: https://issues.apache.org/jira/browse/HDFS-6841 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6841-001.patch {{Time.now()}} is used in many places to calculate elapsed time. This should be replaced with {{Time.monotonicNow()}} to avoid the effect of system time changes on elapsed-time calculations. -- This message was sent by Atlassian JIRA (v6.2#6252)
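The rationale mirrors Python's two clocks (a sketch of the distinction, not Hadoop code): time.time(), like Time.now(), follows the wall clock and jumps with NTP or manual adjustments, while time.monotonic(), like Time.monotonicNow(), only ever moves forward, making it the right basis for elapsed-time measurement.

```python
import time

def elapsed_ms(work, clock=time.monotonic):
    """Time a callable using a clock immune to system-time changes."""
    start = clock()
    work()
    return (clock() - start) * 1000.0

# A monotonic interval can never be negative, even if the wall clock is
# stepped backwards mid-measurement; the same guarantee does not hold
# for intervals computed from time.time().
assert elapsed_ms(lambda: sum(range(1000))) >= 0.0
```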
[jira] [Commented] (HDFS-6836) HDFS INFO logging is verbose & uses file appenders
[ https://issues.apache.org/jira/browse/HDFS-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095603#comment-14095603 ] Hudson commented on HDFS-6836: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/]) HDFS-6836. HDFS INFO logging is verbose & uses file appenders. (Contributed by Xiaoyu Yao) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617603) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java HDFS INFO logging is verbose & uses file appenders -- Key: HDFS-6836 URL: https://issues.apache.org/jira/browse/HDFS-6836 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.4.1 Reporter: Gopal V Assignee: Xiaoyu Yao Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6836.0.patch, HDFS-6836.1.patch, HDFS-6836.2.patch Reported by: [~gopalv]. 
These INFO logs are emitted from within the inner loops of the HDFS read path, producing output like: {code} 2014-07-24 19:43:34,459 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43666, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86616576, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 6827335 2014-07-24 19:43:34,465 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.117:41731, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33, offset: 72691200, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 7178626 2014-07-24 19:43:34,467 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43669, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86813696, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 8540703 2014-07-24 19:43:34,474 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.117:41733, bytes: 786432, op: HDFS_READ, cliID: DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33, offset: 72822272, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 8220422 2014-07-24 19:43:34,477 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: /172.21.128.113:43672, bytes: 786432, op: HDFS_READ, cliID: 
DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33, offset: 86944768, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 8327499 {code} Looks like future releases of log4j will fix this to be faster - https://issues.apache.org/jira/browse/LOG4J2-163 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6830) BlockInfo.addStorage fails when DN changes the storage for a block replica
[ https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095598#comment-14095598 ] Hudson commented on HDFS-6830: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/]) HDFS-6830. BlockInfo.addStorage fails when DN changes the storage for a block replica. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617598) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java BlockInfo.addStorage fails when DN changes the storage for a block replica -- Key: HDFS-6830 URL: https://issues.apache.org/jira/browse/HDFS-6830 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, HDFS-6830.03.patch, HDFS-6830.04.patch The call to {{removeStorageInfo}} is wrong because the block is still in the DatanodeStorage's list of blocks and the callee does not expect it to be. {code} } else { // The block is on the DN but belongs to a different storage. // Update our state. removeStorage(getStorageInfo(idx)); added = false; // Just updating storage. Return false. } {code} It is a very unlikely code path to hit since storage updates usually occur via incremental block reports. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6016) Update datanode replacement policy to make writes more robust
[ https://issues.apache.org/jira/browse/HDFS-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6016: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Per Nicholas' comment, I won't fix it. Update datanode replacement policy to make writes more robust - Key: HDFS-6016 URL: https://issues.apache.org/jira/browse/HDFS-6016 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-6016.patch, HDFS-6016.patch As discussed in HDFS-5924, writers that are down to only one node due to node failures can suffer if a DN does not restart in time. We do not worry about writes that began with a single replica. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6801) Archival Storage: Add a new data migration tool
[ https://issues.apache.org/jira/browse/HDFS-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6801: -- Attachment: h6801_20140813.patch h6801_20140813.patch: 1st patch. Archival Storage: Add a new data migration tool Key: HDFS-6801 URL: https://issues.apache.org/jira/browse/HDFS-6801 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6801_20140813.patch The tool is similar to Balancer. It periodically scans the blocks in HDFS and uses path and/or other metadata (e.g. mtime) to determine if a block should be cooled down (i.e. hot => warm, or warm => cold) or warmed up (i.e. cold => warm, or warm => hot). In contrast to Balancer, the migration tool always moves replicas to a different storage type. Similar to Balancer, the replicas are moved in a way that the number of racks hosting the block does not decrease. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6663: -- Attachment: HDFS-6663-WIP.patch In progress patch, will add more test cases that include decommission and block corruption in the next version. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-WIP.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095668#comment-14095668 ] Alejandro Abdelnur commented on HDFS-6134: -- Let me try to explain things a different way. When setting up filesystem encryption in HDFS (forget about webhdfs and httpfs for now), things will be configured so the HDFS superuser cannot retrieve decrypted 'file encryption keys'. Because the HDFS superuser has access to the encrypted versions of the files, having access to the decrypted 'file encryption keys' would allow the HDFS superuser to get access to the decrypted file. One of the goals of HDFS encryption is to prevent that. This is achieved by blacklisting the HDFS superuser from retrieving decrypted 'file encryption keys' from the KMS. This blacklist must be enforced on the real UGI hitting the KMS (regardless of whether it is doing a doAs or not). If you set up httpfs, it runs as the 'httpfs' user, a regular HDFS user configured as a proxyuser to interact with HDFS and the KMS through doAs calls. If you set up webhdfs, it runs as the 'hdfs' user, the HDFS superuser, and this user will have to be configured as a proxyuser in the KMS to work with doAs calls. Also, the 'hdfs' user will have to be removed from the KMS decrypt-keys blacklist (*and this is the problem*). Even if you audit the webhdfs code running in the DNs to ensure things are always done using doAs and that there is no foul play in the DN code, there is still an issue: * An HDFS admin logs in to a DN in the cluster as 'hdfs' * Then he kinits as 'hdfs/HOST' * Then he curls the KMS asking for decrypted keys as user X doing a doAs * Because he has access to the encrypted file, and now has the decrypted key, he gets access to the file in the clear. Hope this clarifies. 
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
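The rule Alejandro describes — the decrypt blacklist is checked against the real authenticated user, never the doAs target — can be sketched as follows (hypothetical function and config names, not the actual KMS API):

```python
DECRYPT_BLACKLIST = {"hdfs"}          # superusers barred from key material
KMS_PROXYUSERS = {"httpfs", "hdfs"}   # principals allowed to doAs others

def may_decrypt(real_user, do_as=None):
    """Authorize a decrypt-key request on the REAL caller identity."""
    if real_user in DECRYPT_BLACKLIST:
        return False  # blacklist wins even for a configured proxyuser
    if do_as is not None and real_user not in KMS_PROXYUSERS:
        return False  # only configured proxyusers may impersonate
    return True

assert may_decrypt("alice")                    # end user, direct access
assert may_decrypt("httpfs", do_as="alice")    # httpfs proxying is fine
# The problem case: 'hdfs' kinits and curls the KMS with doAs=userX;
# enforcing the blacklist on the real UGI still denies the request.
assert not may_decrypt("hdfs", do_as="userX")
```

The point of the sketch is the order of the checks: impersonation can never launder a blacklisted real identity.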
[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095672#comment-14095672 ] Alejandro Abdelnur commented on HDFS-6826: -- [~clamb], that seems correct. We make sure this is not an issue under normal circumstances by implementing caching. The same would hold for any plugin implementation meant for production usage. Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFSPluggableAuthorizationProposal.pdf When Hbase data, HiveMetaStore data or Search data is accessed via services (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce permissions on corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (i.e. from a MapReduce job), that the permission of the data files map to the permissions of the corresponding data entity (i.e. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities permissions. I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095719#comment-14095719 ] Larry McCay commented on HDFS-6134: --- Thanks [~tucu00] that is pretty clear. The question that remains for me is why this same scenario isn't achievable by the admin kinit'ing as httpfs/HOST or Oozie or some other trusted proxy and then issuing a request with a doAs user X. We have to somehow fix this for webhdfs - it is an expected and valuable API and should remain so with encrypted files without introducing a vulnerability. Even if we have to do something like use another proxy (like Knox) and a shared secret to ensure that there is additional verification of the origin of a KMS request from webhdfs. This would enable proxies to access webhdfs resources with a signed/encrypted token - if KMS gets a signed request from webhdfs that it can verify then it can proceed. The shared secret can be made available through the credential provider API and webhdfs itself would just see it as an opaque token that needs to be passed in the KMS request. Requiring an extra hop for this access would be unfortunate too but if it is for additional security of the data it may be acceptable. Anyway, that's just a thought for keeping webhdfs as a first class citizen. We have to do something. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. 
For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
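Larry's shared-secret idea can be sketched with an HMAC (all names hypothetical; one possible shape, not a proposed patch): webhdfs attaches an opaque token computed over the request from a secret obtained via the credential provider API, and the KMS recomputes it to verify the request really originated at webhdfs before honoring the doAs.

```python
import hashlib
import hmac

SECRET = b"secret-from-credential-provider"  # shared webhdfs/KMS secret

def sign(request: bytes, secret: bytes = SECRET) -> str:
    """Opaque token webhdfs would attach to the KMS request."""
    return hmac.new(secret, request, hashlib.sha256).hexdigest()

def kms_accepts(request: bytes, token: str, secret: bytes = SECRET) -> bool:
    """KMS side: constant-time check that the token matches the request."""
    return hmac.compare_digest(sign(request, secret), token)

req = b"op=decrypt&doAs=userX&key=ezkey1"
token = sign(req)
assert kms_accepts(req, token)
# A token lifted from one request does not authorize a different one.
assert not kms_accepts(b"op=decrypt&doAs=userY&key=ezkey1", token)
```

A real design would also need replay protection (e.g. a timestamp or nonce inside the signed payload), which the sketch omits.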
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095730#comment-14095730 ] Alejandro Abdelnur commented on HDFS-6134: -- Larry, if the httpfs admin is a different person than the hdfs admin you don't have the problem. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6845) XSS and or content injection in hdfs
[ https://issues.apache.org/jira/browse/HDFS-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095764#comment-14095764 ] Haohui Mai commented on HDFS-6845: -- The XSS is only in the old web UI, which has been deprecated and removed in trunk. The new web UI is based on dust.js, which defends against XSS attacks much more systematically. XSS and or content injection in hdfs Key: HDFS-6845 URL: https://issues.apache.org/jira/browse/HDFS-6845 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1 Reporter: clouds Labels: security Following up from email ... I was auditing the latest stable version of hdfs - 2.4.1 (as made available from http://mirror.nexcess.net/apache/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz ), I noticed an interesting XSS filter. Ok, sure. But what intrigued me was where I didn't find any attempt to validate or sanitize. Within DatanodeJSPHelper.java - line 108, nnAddr is assigned the value from the raw parameter NAMENODE_ADDRESS. On line 120, printGotoForm is called with the raw value. This in turn calls JspHelper.java's printGotoForm method - line 452. Then on line 468, the unvalidated and unsanitized value is printed to the HTML page. Worst case, reflected XSS. Better case, content injection. Similarly, DatanodeJSPHelper.java's line 102 tokenString variable looks plausible, but I am not certain whether an incorrect token will cause the business logic to fail before the malicious input is displayed (JspHelper.java - line 465.) ... These are not the only XSS / content injection points but should give an easy idea to find the others. -- This message was sent by Atlassian JIRA (v6.2#6252)
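The missing step the report points at is output encoding. A minimal sketch in Python (the JSP fix would use a Java HTML-escaping utility, but the principle is identical): escape the untrusted request parameter before interpolating it into markup.

```python
import html

def goto_form(nn_addr: str) -> str:
    """Render the form field with the request parameter HTML-escaped."""
    return '<input name="nnaddr" value="%s">' % html.escape(nn_addr, quote=True)

payload = '"><script>alert(1)</script>'
rendered = goto_form(payload)
assert "<script>" not in rendered      # tag cannot break out of the attribute
assert "&lt;script&gt;" in rendered    # payload survives only as inert text
```

Note quote=True: without quoting the double quote, the attacker can still close the value="..." attribute and inject new attributes.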
[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack
[ https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095763#comment-14095763 ] Daryn Sharp commented on HDFS-6840: --- We believe, but haven't proven, that this deterministic behavior is causing even more problems. Block replication and invalidation appear to be impacted: changing the replication factor sometimes takes up to an hour to start, and there's a slow but steady increase in blocks pending deletion on clusters running 2.5. We believe the NN is repeatedly picking the same faulty DN to issue the copy-block and invalidate-block commands. Clients are always sent to the same datanode when read is off rack -- Key: HDFS-6840 URL: https://issues.apache.org/jira/browse/HDFS-6840 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Jason Lowe Priority: Critical After HDFS-6268 the sorting order of block locations is deterministic for a given block and locality level (e.g.: local, rack, off-rack), so off-rack clients all see the same datanode for the same block. This leads to very poor behavior in distributed cache localization and other scenarios where many clients all want the same block data at approximately the same time. The one datanode is crushed by the load while the other replicas only handle local and rack-local requests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6321) Add a message on the old web UI that indicates the old UI is deprecated
[ https://issues.apache.org/jira/browse/HDFS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095767#comment-14095767 ] Haohui Mai commented on HDFS-6321: -- Can you please minimize your patch? And it is sufficient to only display the message on index.jsp on NN. Add a message on the old web UI that indicates the old UI is deprecated --- Key: HDFS-6321 URL: https://issues.apache.org/jira/browse/HDFS-6321 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Tassapol Athiapinya Attachments: HDFS-6321.000.patch HDFS-6252 has removed the jsp ui from trunk. We should add a message in the old web ui to indicate that the ui has been deprecated and ask the user to move towards the new web ui. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5887) Add suffix to generated protobuf class
[ https://issues.apache.org/jira/browse/HDFS-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095769#comment-14095769 ] Haohui Mai commented on HDFS-5887: -- The jira is still valid. Please check fsimage.proto and the corresponding comments in HDFS-5698. Add suffix to generated protobuf class -- Key: HDFS-5887 URL: https://issues.apache.org/jira/browse/HDFS-5887 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Priority: Minor As suggested by [~tlipcon], the code is more readable if we give each class generated by the protobuf the suffix Proto. This jira proposes to rename the classes and to introduce no functionality changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5887) Add suffix to generated protobuf class
[ https://issues.apache.org/jira/browse/HDFS-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5887: - Assignee: (was: Haohui Mai) Add suffix to generated protobuf class -- Key: HDFS-5887 URL: https://issues.apache.org/jira/browse/HDFS-5887 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Priority: Minor As suggested by [~tlipcon], the code is more readable if we give each class generated by the protobuf the suffix Proto. This jira proposes to rename the classes and to introduce no functionality changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6567) Clean up HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095770#comment-14095770 ] Haohui Mai commented on HDFS-6567: -- Looks good to me. +1. I'll commit it shortly. Clean up HdfsFileStatus --- Key: HDFS-6567 URL: https://issues.apache.org/jira/browse/HDFS-6567 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Tassapol Athiapinya Attachments: HDFS-6567.000.patch As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} is reversed. This jira proposes to fix the order and to make the code more consistent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6321) Add a message on the old web UI that indicates the old UI is deprecated
[ https://issues.apache.org/jira/browse/HDFS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tassapol Athiapinya updated HDFS-6321: -- Attachment: HDFS-6321.001.patch [~wheat9] Thanks for review. Can you please look at new patch (HDFS-6321.001.patch)? I only put a message for deprecated page at index.jsp in NN now. Add a message on the old web UI that indicates the old UI is deprecated --- Key: HDFS-6321 URL: https://issues.apache.org/jira/browse/HDFS-6321 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Tassapol Athiapinya Attachments: HDFS-6321.000.patch, HDFS-6321.001.patch HDFS-6252 has removed the jsp ui from trunk. We should add a message in the old web ui to indicate that the ui has been deprecated and ask the user to move towards the new web ui. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6567) Normalize the order of public final in HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6567: - Summary: Normalize the order of public final in HdfsFileStatus (was: Clean up HdfsFileStatus) Normalize the order of public final in HdfsFileStatus - Key: HDFS-6567 URL: https://issues.apache.org/jira/browse/HDFS-6567 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Tassapol Athiapinya Attachments: HDFS-6567.000.patch As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} is reversed. This jira proposes to fix the order and to make the code more consistent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6843) Create FileStatus.isEncrypted() method
[ https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095797#comment-14095797 ] Andrew Wang commented on HDFS-6843: --- I looked into how we'd do this, and one major issue is Writable compatibility. We can't add a new field to FileStatus without breaking compat. ACLs took the approach of re-using an unused bit in the permissions short, and we'd have to do something similar. An enum would involve reserving more of our precious unused bits for this purpose. Steve, do you mind laying out your use case in a little more detail? An enum by itself isn't very expressive. I figured if users want more information, we could add a new API that returns an EncryptedFileStatus with all the gory details. Create FileStatus.isEncrypted() method -- Key: HDFS-6843 URL: https://issues.apache.org/jira/browse/HDFS-6843 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb FileStatus should have a 'boolean isEncrypted()' method. (it was in the context of discussing with Andrew about FileStatus being a Writable). Having this method would allow the MR JobSubmitter to do the following: - BOOLEAN intermediateEncryption = false IF jobconf.contains(mr.intermidate.encryption) THEN intermediateEncryption = jobConf.getBoolean(mr.intermidate.encryption) ELSE IF (I/O)Format INSTANCEOF File(I/O)Format THEN intermediateEncryption = ANY File(I/O)Format HAS a Path with status isEncrypted()==TRUE FI jobConf.setBoolean(mr.intermidate.encryption, intermediateEncryption) FI -- This message was sent by Atlassian JIRA (v6.2#6252)
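The bit-reuse approach Andrew describes (the one ACLs took) can be sketched as follows; the bit position and helper names are assumptions for illustration, not the actual FsPermission layout:

```java
// Sketch: reuse an unused bit of the 16-bit permission short to carry an
// "encrypted" flag without adding a Writable field. Bit 12 is an arbitrary
// choice for illustration; the real FsPermission may reserve different bits.
public class PermBitSketch {
    static final short ENCRYPTED_BIT = 1 << 12;   // hypothetical flag position

    static short setEncrypted(short perm) {
        return (short) (perm | ENCRYPTED_BIT);
    }

    static boolean isEncrypted(short perm) {
        return (perm & ENCRYPTED_BIT) != 0;
    }

    // The low 9 bits (rwxrwxrwx) are untouched by the flag.
    static short classicBits(short perm) {
        return (short) (perm & 0777);
    }
}
```

The appeal of this scheme is exactly the compatibility argument above: old readers ignore the extra bit, so the serialized form does not change shape.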
[jira] [Commented] (HDFS-6846) NetworkTopology#sortByDistance should give nodes higher priority, which cache the block.
[ https://issues.apache.org/jira/browse/HDFS-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095807#comment-14095807 ] Jason Lowe commented on HDFS-6846: -- This could be very undesirable if a single node is the only one that has a cached block and suddenly the block becomes very popular (e.g.: during localization across many nodes in a large cluster). Unless the block is highly replicated, most requests will be off-rack and the one node that has it cached will be hammered. Having the block in memory doesn't help if the NIC saturates from the traffic. I just want to make sure we don't end up with another form of HDFS-6840. NetworkTopology#sortByDistance should give nodes higher priority, which cache the block. Key: HDFS-6846 URL: https://issues.apache.org/jira/browse/HDFS-6846 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.6.0 Reporter: Yi Liu Assignee: Yi Liu Currently there are 3 weights: * local * same rack * off rack But if some nodes cache the block, then it's faster if client read block from these nodes. So we should have some more weights as following: * local * cached same rack * same rack * cached off rack * off rack -- This message was sent by Atlassian JIRA (v6.2#6252)
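The five-tier ordering Yi proposes can be expressed as a simple scoring function; a sketch (names are illustrative, not the NetworkTopology API), with Jason's caveat in mind that a lone cached replica could still be hammered:

```java
// Sketch of the proposed weight ordering: cached replicas sort ahead of
// uncached ones at the same distance. Smaller weight = higher priority.
public class CacheAwareWeight {
    static int weight(boolean local, boolean sameRack, boolean cached) {
        if (local) return 0;                   // local always wins
        if (sameRack) return cached ? 1 : 2;   // cached same rack < same rack
        return cached ? 3 : 4;                 // cached off rack < off rack
    }
}
```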
[jira] [Updated] (HDFS-6567) Normalize the order of public final in HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6567: - Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~tassapola] for the contribution. Normalize the order of public final in HdfsFileStatus - Key: HDFS-6567 URL: https://issues.apache.org/jira/browse/HDFS-6567 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Tassapol Athiapinya Fix For: 2.6.0 Attachments: HDFS-6567.000.patch As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} is reversed. This jira proposes to fix the order and to make the code more consistent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer
[ https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095830#comment-14095830 ] Vinayakumar B commented on HDFS-6247: - Thanks [~umamaheswararao] and [~clamb] for the reviews. Committed to trunk and branch-2 Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer Key: HDFS-6247 URL: https://issues.apache.org/jira/browse/HDFS-6247 Project: Hadoop HDFS Issue Type: Bug Components: balancer, datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch Currently there is no response sent from the target Datanode to the Balancer for replaceBlock() calls. Since block movement for balancing is throttled, a complete block movement will take time, and this could result in a timeout at the Balancer, which will be trying to read the status message. To avoid this, the Datanode can send IN_PROGRESS status messages to the Balancer during an in-progress replaceBlock() call, so that the Balancer does not time out and treat the block movement as failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer
[ https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6247: Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer Key: HDFS-6247 URL: https://issues.apache.org/jira/browse/HDFS-6247 Project: Hadoop HDFS Issue Type: Bug Components: balancer, datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch Currently there is no response sent from the target Datanode to the Balancer for replaceBlock() calls. Since block movement for balancing is throttled, a complete block movement will take time, and this could result in a timeout at the Balancer, which will be trying to read the status message. To avoid this, the Datanode can send IN_PROGRESS status messages to the Balancer during an in-progress replaceBlock() call, so that the Balancer does not time out and treat the block movement as failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095840#comment-14095840 ] Hudson commented on HDFS-6847: -- FAILURE: Integrated in Hadoop-trunk-Commit #6056 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6056/]) HDFS-6847. Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer (Contributed by Vinayakumar B.) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617784) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. 
E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
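The nearest-ancestor rule in the description can be sketched with a plain path-to-policy map; this is a hypothetical helper for illustration, not the NameNode code, which resolves along the INode chain:

```java
import java.util.Map;

// Sketch of nearest-ancestor storage policy resolution: walk up from the
// path until a component with an explicit policy is found.
public class PolicyResolveSketch {
    static String resolve(Map<String, String> explicit, String path) {
        for (String p = path; !p.isEmpty(); ) {
            if (explicit.containsKey(p)) return explicit.get(p);
            int i = p.lastIndexOf('/');
            p = (i <= 0) ? "" : p.substring(0, i);   // drop last component
        }
        return explicit.getOrDefault("/", null);     // root policy, if any
    }
}
```

Running this on the example above (p1 on /foo, p2 on /foo/bar) yields p2 for baz and bar, and p1 for foo, matching the stated semantics.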
[jira] [Commented] (HDFS-6567) Normalize the order of public final in HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095839#comment-14095839 ] Hudson commented on HDFS-6567: -- FAILURE: Integrated in Hadoop-trunk-Commit #6056 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6056/]) HDFS-6567. Normalize the order of public final in HdfsFileStatus. Contributed by Tassapol Athiapinya. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617779) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsFileStatus.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java Normalize the order of public final in HdfsFileStatus - Key: HDFS-6567 URL: https://issues.apache.org/jira/browse/HDFS-6567 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Tassapol Athiapinya Fix For: 2.6.0 Attachments: HDFS-6567.000.patch As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} is reversed. This jira proposes to fix the order and to make the code more consistent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095844#comment-14095844 ] Hadoop QA commented on HDFS-6833: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661465/HDFS-6833.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7626//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7626//console This message is automatically generated. DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in DataNode, the following messages are usually output. 
{code} 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} However, in the current implementation, DirectoryScanner may run while the DataNode is deleting the block, and the following messages are output. {code} 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file
/hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} Information about the deleting block is thus re-registered in the DataNode's memory. And when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we recommission a node or change the replication factor, the NameNode may delete a valid block as an excess replica because of this problem, and Under-Replicated Blocks and Missing Blocks occur. When the DataNode runs DirectoryScanner, it should not register a block that is being deleted. -- This message was sent by Atlassian JIRA (v6.2#6252)
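The fix direction the report asks for — the scanner should skip blocks whose deletion has already been scheduled — can be sketched with a shared pending-deletion set. This is a hypothetical structure for illustration, not the FsDatasetImpl code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the async disk service records a block id when deletion is
// scheduled; the directory scanner consults that set before re-registering
// a block file it still sees on disk.
public class DeletingBlocksSketch {
    private final Set<Long> deleting = ConcurrentHashMap.newKeySet();

    void scheduleDeletion(long blockId) { deleting.add(blockId); }

    void deletionDone(long blockId) { deleting.remove(blockId); }

    // Called by the scanner for a block file present on disk but missing in memory.
    boolean shouldRegister(long blockId) { return !deleting.contains(blockId); }
}
```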
[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer
[ https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095877#comment-14095877 ] Jing Zhao commented on HDFS-6247: - Hi Vinay, looks like when you committed the patch you wrote the jira number as HDFS-6847 :) We may need to update the CHANGES.txt. Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer Key: HDFS-6247 URL: https://issues.apache.org/jira/browse/HDFS-6247 Project: Hadoop HDFS Issue Type: Bug Components: balancer, datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch Currently there is no response sent from the target Datanode to the Balancer for replaceBlock() calls. Since block movement for balancing is throttled, a complete block movement will take time, and this could result in a timeout at the Balancer, which will be trying to read the status message. To avoid this, the Datanode can send IN_PROGRESS status messages to the Balancer during an in-progress replaceBlock() call, so that the Balancer does not time out and treat the block movement as failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095874#comment-14095874 ] Colin Patrick McCabe commented on HDFS-6803: bq. Should we require the implementations of FSInputStream should be thread-safe (or at least for PositionedReadable)? And modify some implementations such as WebHDFS to make pread concurrently? I think stack's point 2.1 and 2.2 imply that pread can safely be called from multiple threads concurrently. I guess we should document this too so that there's no confusion. Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: DocumentingDFSClientDFSInputStream (1).pdf Reviews of the patch posted the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputSteam. If agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095880#comment-14095880 ] Daryn Sharp commented on HDFS-6826: --- Arg, yesterday's jira issues apparently caused my comment to be lost. The group mapping authz is a bit different. It's not context sensitive, as in a user uniformly belongs to groups across the whole namesystem. Path-based context sensitivity is adding hidden magic to a filesystem. How will the special magic be represented to the user confused by why the perms/ACLs aren't being honored? How will permission apis and FsShell interact with the magic? Instead of trying to hack special behavior for a specific use case into the NN, how about leveraging what's there. A cleaner way may be for a custom group mapping to fabricate groups something like hive:table or hive:table:column. No code changes in the NN. Everything is contained in the custom groups mapping. I still think leveraging ACLs is the best way to go... Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFSPluggableAuthorizationProposal.pdf When Hbase data, HiveMetaStore data or Search data is accessed via services (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce permissions on corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (i.e. from a MapReduce job), that the permission of the data files map to the permissions of the corresponding data entity (i.e. table, column family or search collection). 
To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities permissions. I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer
[ https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095884#comment-14095884 ] Vinayakumar B commented on HDFS-6247: - Oops. My bad. Thanks Jing for pointing that out. I will correct it right away. Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer Key: HDFS-6247 URL: https://issues.apache.org/jira/browse/HDFS-6247 Project: Hadoop HDFS Issue Type: Bug Components: balancer, datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch Currently there is no response sent from the target Datanode to the Balancer for replaceBlock() calls. Since block movement for balancing is throttled, a complete block movement will take time, and this could result in a timeout at the Balancer, which will be trying to read the status message. To avoid this, the Datanode can send IN_PROGRESS status messages to the Balancer during an in-progress replaceBlock() call, so that the Balancer does not time out and treat the block movement as failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5135) Create a test framework to enable NFS end to end unit test
[ https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095903#comment-14095903 ] Zhe Zhang commented on HDFS-5135: - [~brandonli] I wonder why TestOutOfOrderWrite directly calls Nfs3Utils.writeChannel() instead of going through OpenFileCtx. Is it *Not* supposed to test the reordering capability of the NFS gateway? Create a test framework to enable NFS end to end unit test -- Key: HDFS-5135 URL: https://issues.apache.org/jira/browse/HDFS-5135 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Currently, we have to manually start portmap and nfs3 processes to test patch and new functionalities. This JIRA is to track the effort to introduce a test framework to NFS unit test without starting standalone nfs3 processes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer
[ https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095917#comment-14095917 ] Vinayakumar B commented on HDFS-6247: - Jira number updated by reverting and committing with the correct Jira number. Thanks again Jing. Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer Key: HDFS-6247 URL: https://issues.apache.org/jira/browse/HDFS-6247 Project: Hadoop HDFS Issue Type: Bug Components: balancer, datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch Currently there is no response sent from the target Datanode to the Balancer for replaceBlock() calls. Since block movement for balancing is throttled, a complete block movement will take time, and this could result in a timeout at the Balancer, which will be trying to read the status message. To avoid this, the Datanode can send IN_PROGRESS status messages to the Balancer during an in-progress replaceBlock() call, so that the Balancer does not time out and treat the block movement as failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095930#comment-14095930 ] Alejandro Abdelnur commented on HDFS-6826: -- [~daryn], bq. The group mapping authz is a bit different. It's not context sensitive, as in a user uniformly belongs to groups across the whole namesystem. Mmmhh, I’d argue that it is context sensitive, 'user' context, just a different context. bq. Path-based context sensitivity is adding hidden magic to a filesystem. How will the special magic be represented to the user confused by why the perms/ACLs aren't being honored? The authorization enforcement semantics do not change at all. The plugin cannot change the permission check logic. The plugin is responsible for providing user/group/permissions/ACLs information to the NN, which enforces the permissions consistently regardless of the plugin in use. bq. How will permission apis and FsShell interact with the magic? They work as usual. Check the attached patch, the current HDFS user/group/permission/ACLs handling is done by a plugin implementation. That said, a plugin implementation may decide to disable changes of user/group/permissions/ACLs. This can be done either silently or by failing. bq. Instead of trying to hack special behavior for a specific use case into the NN, how about leveraging what's there. The proposal doc describes in detail 3 different use cases: HiveMetaStore tables, Hbase tables, Solr search collections. bq. A cleaner way may be for a custom group mapping to fabricate groups something like hive:table or hive:table:column. No code changes in the NN. Everything is contained in the custom groups mapping. This does not solve the problem. When adding a directory as a HiveMetaStore table partition, unless you set those special groups explicitly, they would not be in the files being added to the table. It requires client-side group manipulation and this is what breaks things. bq.
I still think leveraging ACLs is the best way to go... Actually, we are. In the case of HiveMetaStore, the plugin would expose GRANT permissions as ACLs. Daryn, I'm happy to jump on the phone if you want to have a synchronous discussion. Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFSPluggableAuthorizationProposal.pdf When Hbase data, HiveMetaStore data or Search data is accessed via services (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce permissions on corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (i.e. from a MapReduce job), that the permission of the data files map to the permissions of the corresponding data entity (i.e. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities permissions. I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
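Daryn's alternative — a custom group mapping that fabricates entity-scoped groups like hive:table — would look roughly like this; the in-memory grants map is a stand-in for a metastore lookup, and only the getGroups shape mirrors Hadoop's GroupMappingServiceProvider hook:

```java
import java.util.*;

// Sketch of a group mapping that returns fabricated groups such as
// "hive:table" for users granted access to that table. The grants map
// stands in for a HiveMetaStore query; hypothetical, not a real provider.
public class EntityGroupMappingSketch {
    private final Map<String, Set<String>> grants = new HashMap<>();

    void grant(String user, String entityGroup) {
        grants.computeIfAbsent(user, u -> new HashSet<>()).add(entityGroup);
    }

    // Same shape as GroupMappingServiceProvider#getGroups(user).
    List<String> getGroups(String user) {
        return new ArrayList<>(grants.getOrDefault(user, Collections.emptySet()));
    }
}
```

Alejandro's objection above is that this still requires the fabricated groups to be set on every file added to a table, i.e. client-side group manipulation.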
[jira] [Updated] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()
[ https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6841: Attachment: HDFS-6841-002.patch Fixed Tests Use Time.monotonicNow() wherever applicable instead of Time.now() - Key: HDFS-6841 URL: https://issues.apache.org/jira/browse/HDFS-6841 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6841-001.patch, HDFS-6841-002.patch {{Time.now()}} is used in many places to calculate elapsed time. It should be replaced with {{Time.monotonicNow()}} to avoid the effect of system time changes on elapsed-time calculations. -- This message was sent by Atlassian JIRA (v6.2#6252)
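For context on why this change matters: {{Time.now()}} reads the wall clock, which can jump backwards or forwards (NTP corrections, manual changes), while {{Time.monotonicNow()}} is essentially a millisecond wrapper over System.nanoTime(). A minimal standalone sketch of the pattern (not Hadoop's actual Time class):

```java
import java.util.concurrent.TimeUnit;

public class MonotonicElapsed {
    // Rough equivalent of Hadoop's Time.monotonicNow(): milliseconds from a
    // monotonic source. Safe for elapsed-time arithmetic, meaningless as a date.
    static long monotonicNow() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    }

    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis();
        long monoStart = monotonicNow();
        Thread.sleep(50);
        // The wall-clock difference would be corrupted by any system-time step
        // (e.g. an NTP correction) between the two reads, and could even go
        // negative; the monotonic difference cannot.
        System.out.println("wall elapsed: " + (System.currentTimeMillis() - wallStart) + " ms");
        System.out.println("mono elapsed: " + (monotonicNow() - monoStart) + " ms");
    }
}
```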
[jira] [Created] (HDFS-6848) Lack of synchronization on access to datanodeUuid in DataStorage#format()
Ted Yu created HDFS-6848: Summary: Lack of synchronization on access to datanodeUuid in DataStorage#format() Key: HDFS-6848 URL: https://issues.apache.org/jira/browse/HDFS-6848 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} this.datanodeUuid = datanodeUuid; {code} The above assignment should be done holding lock DataStorage.this - as is done in two other places. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6832) Fix the usage of 'hdfs namenode' command
[ https://issues.apache.org/jira/browse/HDFS-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095938#comment-14095938 ] Akira AJISAKA commented on HDFS-6832: - [~Pooja.Gupta], feel free to create a patch and attach it to this jira. Unfortunately, I don't have the permission to assign you. A committer will assign you when the patch is committed. Fix the usage of 'hdfs namenode' command Key: HDFS-6832 URL: https://issues.apache.org/jira/browse/HDFS-6832 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1 Reporter: Akira AJISAKA Priority: Minor Labels: newbie {code} [root@trunk ~]# hdfs namenode -help Usage: java NameNode [-backup] | [-checkpoint] | [-format [-clusterid cid ] [-force] [-nonInteractive] ] | [-upgrade [-clusterid cid] [-renameReservedk-v pairs] ] | [-upgradeOnly [-clusterid cid] [-renameReservedk-v pairs] ] | [-rollback] | [-rollingUpgrade downgrade|rollback ] | [-finalize] | [-importCheckpoint] | [-initializeSharedEdits] | [-bootstrapStandby] | [-recover [ -force] ] | [-metadataVersion ] ] {code} There're some issues in the usage to be fixed. # Usage: java NameNode should be Usage: hdfs namenode # -rollingUpgrade started option should be added # The last ']' should be removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer
[ https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095949#comment-14095949 ] Hudson commented on HDFS-6247: -- FAILURE: Integrated in Hadoop-trunk-Commit #6057 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6057/]) HDFS-6247. Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer (vinayakumarb) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617799) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer Key: HDFS-6247 URL: https://issues.apache.org/jira/browse/HDFS-6247 Project: Hadoop HDFS Issue Type: Bug Components: balancer, datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch Currently there is no response sent from target Datanode to Balancer for the replaceBlock() calls. Since the Block movement for balancing is throttled, complete block movement will take time and this could result in timeout at Balancer, which will be trying to read the status message. 
To avoid this, the in-progress Datanode can send IN_PROGRESS status messages to the Balancer during the replaceBlock() call, so that the Balancer does not time out and incorrectly treat the block movement as failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
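The fix described above amounts to a keepalive: while a throttled copy is in flight, the target Datanode periodically emits an IN_PROGRESS status so the Balancer's read timeout never fires, and the real status is sent only at the end. A toy simulation of the idea (method names and numbers are illustrative, not the patch's actual protocol):

```java
import java.util.ArrayList;
import java.util.List;

public class KeepaliveCopy {
    // Simulates a throttled block copy: every chunk takes chunkMillis, and an
    // IN_PROGRESS keepalive is emitted whenever half the peer's read timeout
    // has elapsed since the last response. Returns the sequence of responses.
    static List<String> copyWithKeepalive(int totalChunks, long chunkMillis, long timeoutMillis) {
        List<String> sent = new ArrayList<>();
        long sinceLastResponse = 0;
        for (int i = 0; i < totalChunks; i++) {
            sinceLastResponse += chunkMillis;          // simulated throttled transfer
            if (sinceLastResponse >= timeoutMillis / 2) {
                sent.add("IN_PROGRESS");               // keepalive to the Balancer
                sinceLastResponse = 0;
            }
        }
        sent.add("SUCCESS");                           // final replaceBlock() status
        return sent;
    }

    public static void main(String[] args) {
        // 10 throttled chunks of 1000 ms against a 4000 ms read timeout:
        System.out.println(copyWithKeepalive(10, 1000, 4000));
    }
}
```

Without the keepalive entries, the reader's side would see silence for the full 10 seconds and abort after its 4-second timeout, which is exactly the failure mode the JIRA describes.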
[jira] [Commented] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095948#comment-14095948 ] Hudson commented on HDFS-6847: -- FAILURE: Integrated in Hadoop-trunk-Commit #6057 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6057/]) Reverted Merged revision(s) 1617784 from hadoop/common/trunk: HDFS-6847. Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer (Contributed by Vinayakumar B.) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617794) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. 
E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
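The nearest-ancestor rule in the description can be sketched as a simple upward walk over path components (a toy model over a map, not the NameNode's INode-based implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class StoragePolicyResolver {
    // A path's effective storage policy is its own, if set, otherwise the policy
    // of the nearest ancestor that has one; null if nothing up to "/" is set.
    static String resolve(Map<String, String> policies, String path) {
        String p = path;
        while (!p.isEmpty()) {
            String policy = policies.get(p);
            if (policy != null) return policy;
            int slash = p.lastIndexOf('/');
            p = (slash <= 0) ? "" : p.substring(0, slash);  // step to the parent
        }
        return policies.getOrDefault("/", null);            // root/cluster default
    }

    public static void main(String[] args) {
        Map<String, String> policies = new HashMap<>();
        policies.put("/foo", "p1");
        policies.put("/foo/bar", "p2");
        // Matches the example in the description:
        System.out.println(resolve(policies, "/foo/bar/baz")); // p2
        System.out.println(resolve(policies, "/foo/bar"));     // p2
        System.out.println(resolve(policies, "/foo"));         // p1
    }
}
```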
[jira] [Updated] (HDFS-6546) Add non-superuser capability to get the encryption zone for a specific path
[ https://issues.apache.org/jira/browse/HDFS-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6546: --- Attachment: HDFS-6546.001.patch The attached patch adds a new method to HdfsAdmin: getEncryptionZoneRootForPath, which accepts a path and returns the path of the EZ root (if the arg is in an EZ) or null (if it is not in an EZ). Add non-superuser capability to get the encryption zone for a specific path --- Key: HDFS-6546 URL: https://issues.apache.org/jira/browse/HDFS-6546 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6546.001.patch Need to add a protocol, API, and CLI that allow a non-superuser to ask whether a path is part of an EZ, and if so, which one. -- This message was sent by Atlassian JIRA (v6.2#6252)
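A toy model of the return contract described above: given the set of encryption zone roots, return the root containing a path, or null if the path is in no EZ. (The real call is a method on HdfsAdmin answered by the NameNode; this local-set version exists only to illustrate the semantics.)

```java
import java.util.HashSet;
import java.util.Set;

public class EzLookup {
    // Walk from the path upward; the first component that is an EZ root is the
    // zone containing the path. If we reach the top without a hit, the path is
    // not inside any encryption zone and we return null.
    static String ezRootForPath(Set<String> ezRoots, String path) {
        String p = path;
        while (!p.isEmpty()) {
            if (ezRoots.contains(p)) return p;
            int slash = p.lastIndexOf('/');
            p = (slash <= 0) ? "" : p.substring(0, slash);
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> roots = new HashSet<>();
        roots.add("/secure/zone1");
        System.out.println(ezRootForPath(roots, "/secure/zone1/a/b")); // /secure/zone1
        System.out.println(ezRootForPath(roots, "/public/data"));      // null
    }
}
```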
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096027#comment-14096027 ] Sanjay Radia commented on HDFS-6134: Alejandro, if we treat user hdfs as a special user such that the HDFS system will not accept any client connections from hdfs, does this solve the problem? An admin will not be able to connect as user hdfs but can connect as user ClarkKent, where ClarkKent is in the superuser group of hdfs, so that the admin can do his job as superuser. It does mean that we are trusting the HDFS code to be correct in not abusing its access to keys, since it has proxy authority with KMS (this was not required so far). Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096027#comment-14096027 ] Sanjay Radia edited comment on HDFS-6134 at 8/13/14 8:19 PM: - Alejandro, a potential solution: treat user hdfs as a special user such that the HDFS system will NOT accept any client connections from hdfs. An Admin will not be able to connect as user hdfs but can connect as user, say, ClarkKent where ClarkKent is in the superuser group of hdfs so that the admin can do his job as superuser. It does means that we are trusting the HDFS code to be correct in not abusing its access to keys since it has proxy authority with KMS (this was not required so far.) was (Author: sanjay.radia): Alejandro, if we treat user hdfs as a special user such that the HDFS system will not accept any client connections from hdfs then does this solve this problem?. An Admin will not be able to connect as user hdfs but can connect as user ClarkKent where ClarkKent is in the superuser group of hdfs so that the admin can do his job as superuser. It does means that we are trusting the HDFS code to be correct in not abusing its access to keys since it has proxy authority with KMS (this was not required so far.) Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). 
This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (HDFS-6849) Replace HttpFS custom proxyuser handling with common implementation
[ https://issues.apache.org/jira/browse/HDFS-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur moved HADOOP-10836 to HDFS-6849: --- Component/s: (was: security) security Target Version/s: (was: 2.6.0) Affects Version/s: (was: 2.4.1) 2.4.1 Key: HDFS-6849 (was: HADOOP-10836) Project: Hadoop HDFS (was: Hadoop Common) Replace HttpFS custom proxyuser handling with common implementation --- Key: HDFS-6849 URL: https://issues.apache.org/jira/browse/HDFS-6849 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: COMBO.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch Use HADOOP-10835 to implement proxyuser logic in HttpFS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6849) Replace HttpFS custom proxyuser handling with common implementation
[ https://issues.apache.org/jira/browse/HDFS-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HDFS-6849: - Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) committed to trunk and branch-2. Replace HttpFS custom proxyuser handling with common implementation --- Key: HDFS-6849 URL: https://issues.apache.org/jira/browse/HDFS-6849 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.6.0 Attachments: COMBO.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch Use HADOOP-10835 to implement proxyuser logic in HttpFS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6849) Replace HttpFS custom proxyuser handling with common implementation
[ https://issues.apache.org/jira/browse/HDFS-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096116#comment-14096116 ] Hudson commented on HDFS-6849: -- FAILURE: Integrated in Hadoop-trunk-Commit #6060 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6060/]) HDFS-6849. Replace HttpFS custom proxyuser handling with common implementation. (tucu) (tucu: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617831) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSAuthenticationFilter.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSParametersProvider.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServerWebApp.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/ProxyUser.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/security/ProxyUserService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/wsrs/UserProvider.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/security/TestProxyUserService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/wsrs/TestUserProvider.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Replace HttpFS custom proxyuser handling with common implementation --- Key: HDFS-6849 URL: https://issues.apache.org/jira/browse/HDFS-6849 Project: Hadoop HDFS Issue 
Type: Improvement Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.6.0 Attachments: COMBO.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch Use HADOOP-10835 to implement proxyuser logic in HttpFS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6321) Add a message on the old web UI that indicates the old UI is deprecated
[ https://issues.apache.org/jira/browse/HDFS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096124#comment-14096124 ] Hadoop QA commented on HDFS-6321: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661489/HDFS-6321.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7627//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7627//console This message is automatically generated. Add a message on the old web UI that indicates the old UI is deprecated --- Key: HDFS-6321 URL: https://issues.apache.org/jira/browse/HDFS-6321 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Tassapol Athiapinya Attachments: HDFS-6321.000.patch, HDFS-6321.001.patch HDFS-6252 has removed the jsp ui from trunk. 
We should add a message in the old web ui to indicate that the ui has been deprecated and ask the user to move towards the new web ui. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6847: Comment: was deleted (was: FAILURE: Integrated in Hadoop-trunk-Commit #6056 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6056/]) HDFS-6847. Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer (Contributed by Vinayakumar B.) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617784) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java ) Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. 
E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6847: Comment: was deleted (was: FAILURE: Integrated in Hadoop-trunk-Commit #6057 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6057/]) Reverted Merged revision(s) 1617784 from hadoop/common/trunk: HDFS-6847. Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer (Contributed by Vinayakumar B.) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617794) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java ) Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. 
E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096170#comment-14096170 ] Jing Zhao commented on HDFS-6847: - We still need to handle snapshot correctly. Also it may be better to integrate the set/getStoragePolicy methods into INode.java. Will update the patch later. Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch This jira plans to add storage policy support on directory, i.e., users can set/get storage policy for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy then should be its own storage policy, if it is specified, or the storage policy specified on its nearest ancestral directory. E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096177#comment-14096177 ] Larry McCay commented on HDFS-6134: --- And that is ensured by file permissions on the keytab? On Wed, Aug 13, 2014 at 1:14 PM, Alejandro Abdelnur (JIRA) j...@apache.org Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()
[ https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096203#comment-14096203 ] Hadoop QA commented on HDFS-6841: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661506/HDFS-6841-002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7628//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7628//console This message is automatically generated. Use Time.monotonicNow() wherever applicable instead of Time.now() - Key: HDFS-6841 URL: https://issues.apache.org/jira/browse/HDFS-6841 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6841-001.patch, HDFS-6841-002.patch {{Time.now()}} used in many places to calculate elapsed time. This should be replaced with {{Time.monotonicNow()}} to avoid effect of System time changes on elapsed time calculations. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096214#comment-14096214 ] Alejandro Abdelnur commented on HDFS-6134: -- If HttpFS and the NN or DNs run on the same box, yes. However, in a production environment that would not commonly be the case. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6850) Unit testing for out of order writes
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6850: Status: Open (was: Patch Available) Unit testing for out of order writes Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Priority: Minor Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6850) Unit testing for out of order writes
Zhe Zhang created HDFS-6850: --- Summary: Unit testing for out of order writes Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Priority: Minor Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6850) Unit testing for out of order writes
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6850: Status: Patch Available (was: Open) Unit testing for out of order writes Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Priority: Minor Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6850) Unit testing for out of order writes
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6850: Attachment: HDFS-6850.patch Unit testing for out of order writes Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6850) Unit testing for out of order writes
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6850: Status: Patch Available (was: Open) Unit testing for out of order writes Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6848) Lack of synchronization on access to datanodeUuid in DataStorage#format()
[ https://issues.apache.org/jira/browse/HDFS-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096254#comment-14096254 ] Xiaoyu Yao commented on HDFS-6848: -- Note: the only caller of DataStorage#format() is the synchronized method addStorageLocations, so we should be fine without any changes. If format() is going to be called by other non-synchronized methods in the future, it would be better to call the synchronized method setDatanodeUuid() instead of the direct assignment Ted reported above in DataStorage#format(). {code} private synchronized void addStorageLocations(DataNode datanode,...) { format(sd, nsInfo, datanode.getDatanodeUuid()); } {code} Lack of synchronization on access to datanodeUuid in DataStorage#format() -- Key: HDFS-6848 URL: https://issues.apache.org/jira/browse/HDFS-6848 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} this.datanodeUuid = datanodeUuid; {code} The above assignment should be done holding the lock on DataStorage.this, as is done in two other places. -- This message was sent by Atlassian JIRA (v6.2#6252)
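The pattern suggested above — routing every write to the shared field through one synchronized setter rather than a bare assignment — can be sketched as follows. Class and method names here are simplified placeholders, not the actual DataStorage code:

```java
// Minimal sketch of the suggested pattern: all writes to the shared
// field go through a synchronized setter, so the update stays safe
// even if a future caller is not itself synchronized.
public class StorageSketch {
    private String datanodeUuid;

    // Writers call this; the lock on 'this' makes the update visible
    // to other synchronized readers.
    public synchronized void setDatanodeUuid(String uuid) {
        this.datanodeUuid = uuid;
    }

    public synchronized String getDatanodeUuid() {
        return datanodeUuid;
    }

    // Analogous to format(): uses the setter instead of a direct
    // assignment to this.datanodeUuid.
    public void format(String uuid) {
        setDatanodeUuid(uuid);
    }

    public static void main(String[] args) {
        StorageSketch s = new StorageSketch();
        s.format("uuid-1234");
        System.out.println(s.getDatanodeUuid()); // uuid-1234
    }
}
```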
[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6847: Attachment: HDFS-6847.001.patch Update the patch. Main changes: # Update the patch to support snapshot path # Move getStoragePolicyID into INode.java. # Fix bugs and add unit tests Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch, HDFS-6847.001.patch This jira plans to add storage policy support on directories, i.e., users can set/get storage policies for not only files but also directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy should be its own storage policy, if one is specified, or otherwise the storage policy specified on its nearest ancestral directory. E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
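The nearest-ancestor resolution rule in the description can be sketched as a walk up the path. This is an illustrative standalone sketch, not the actual INode/getStoragePolicyID code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the resolution rule: a path's effective policy is its own,
// if set, otherwise the policy of the closest ancestor directory that
// has one. Map keys are absolute paths; names are illustrative only.
public class PolicySketch {
    static String resolve(String path, Map<String, String> policies) {
        for (String p = path; p != null; p = parent(p)) {
            String policy = policies.get(p);
            if (policy != null) {
                return policy;
            }
        }
        return null; // no policy set anywhere along the path
    }

    static String parent(String path) {
        if (path.equals("/")) return null;
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    public static void main(String[] args) {
        Map<String, String> policies = new HashMap<>();
        policies.put("/foo", "p1");
        policies.put("/foo/bar", "p2");
        // Matches the example in the description:
        System.out.println(resolve("/foo/bar/baz", policies)); // p2
        System.out.println(resolve("/foo/bar", policies));     // p2
        System.out.println(resolve("/foo", policies));         // p1
    }
}
```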
[jira] [Commented] (HDFS-6561) Byte array native checksumming on client side
[ https://issues.apache.org/jira/browse/HDFS-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096269#comment-14096269 ] Todd Lipcon commented on HDFS-6561: --- Hey James. I looked over this patch and wrote/ran some performance tests. At first, I was a little concerned that the way you consolidated the code for bulk_crc32 would slow things down. In particular, there's now a new branch per chunk to determine whether to store or verify the CRC. I was worried that, if this branch were mispredicted, we'd pay an extra 15-20 cycles for every 512-byte chunk (which at 0.13 cycles/byte only takes ~66 cycles). That would represent close to a 20% performance regression. So, I wrote a simple test which approximates exactly the HDFS usage of these APIs -- i.e. 512-byte chunks and a reasonable amount of data. In this test, I found that the above concern was unwarranted - probably because the branch prediction unit does a very good job with the simple branch pattern here. I'll attach a version of your patch which includes the benchmark that I wrote in case anyone else wants to run it. Here are my average timings for 512MB of 512-byte-chunked checksums (on my Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz): *Before*: Calculate: 401275us (1.28GB/sec) Verify: 41184us (12.43GB/sec) *After*: Calculate: 41808us (12.25GB/sec) Verify: 41604us (12.31GB/sec) These seem to match earlier results you've posted elsewhere - just wanted to confirm on my machine and make sure that the existing verify code path didn't regress due to the new functionality. For ease of review, I also think it makes sense to split this patch up a little further, and make this JIRA only do the changes to the checksumming code to allow for native calculation. The changes to FSOutputSummer, DFSOutputStream, etc, are a bit more complex and probably should be reviewed separately. 
I took the liberty of removing those chunks from the patch as I was testing it, so I'll upload that here and you can take a look. Given the above, I only reviewed the portion related to checksumming and didn't yet look in detail at the outputsummer, etc, changes. Byte array native checksumming on client side - Key: HDFS-6561 URL: https://issues.apache.org/jira/browse/HDFS-6561 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6561.2.patch, HDFS-6561.3.patch, HDFS-6561.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
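The shape of the benchmark described above — checksumming a large buffer in 512-byte chunks and reporting throughput — can be approximated in pure Java with java.util.zip.CRC32. The native HDFS path uses hardware-accelerated CRC32C, so absolute numbers will differ substantially; this is only a sketch of the measurement, not the attached benchmark:

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Sketch of a chunked-checksum benchmark: one CRC per 512-byte chunk
// over a large buffer, reporting throughput. Pure-Java CRC32, so the
// timings are not comparable to the native CRC32C numbers above.
public class CrcBenchSketch {
    public static void main(String[] args) {
        final int chunk = 512;
        byte[] data = new byte[64 * 1024 * 1024]; // 64MB keeps the sketch quick
        Arrays.fill(data, (byte) 0x5a);

        long start = System.nanoTime();
        CRC32 crc = new CRC32();
        long sum = 0;
        for (int off = 0; off < data.length; off += chunk) {
            crc.reset();
            crc.update(data, off, chunk);
            sum += crc.getValue(); // keep the result live so the loop isn't elided
        }
        long elapsedUs = (System.nanoTime() - start) / 1000;
        double gbPerSec = (data.length / 1e9) / (elapsedUs / 1e6);
        System.out.printf("Calculate: %dus (%.2fGB/sec), checksum total %d%n",
                elapsedUs, gbPerSec, sum);
    }
}
```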
[jira] [Assigned] (HDFS-6850) Unit testing for out of order writes
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reassigned HDFS-6850: Assignee: Zhe Zhang Unit testing for out of order writes Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6850: - Target Version/s: 2.6.0 Affects Version/s: (was: 3.0.0) 2.6.0 Summary: Move NFS out of order write unit tests into TestWrites class (was: Unit testing for out of order writes) Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-4486) Add log category for long-running DFSClient notices
[ https://issues.apache.org/jira/browse/HDFS-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reassigned HDFS-4486: --- Assignee: Zhe Zhang Add log category for long-running DFSClient notices --- Key: HDFS-4486 URL: https://issues.apache.org/jira/browse/HDFS-4486 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Zhe Zhang Priority: Minor There are a number of features in the DFS client which are transparent but can make a fairly big difference for performance -- two in particular are short circuit reads and native checksumming. Because we don't want log spew for clients like hadoop fs -cat we currently log only at DEBUG level when these features are disabled. This makes it difficult to troubleshoot/verify for long-running perf-sensitive clients like HBase. One simple solution is to add a new log category - eg o.a.h.h.DFSClient.PerformanceAdvisory - which long-running clients could enable at DEBUG level without getting the full debug spew. -- This message was sent by Atlassian JIRA (v6.2#6252)
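The "separate advisory category" idea from the description — a named logger that long-running clients can enable without full debug spew — can be sketched with java.util.logging. Hadoop itself uses commons-logging/slf4j, and the category name below merely mirrors the one proposed; this is an assumption-laden illustration, not DFSClient code:

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of a dedicated advisory log category: a long-running client
// like HBase enables just this logger at FINE, leaving the rest of the
// client at its normal level.
public class AdvisorySketch {
    // Hypothetical category name, patterned on the proposal above.
    private static final Logger PERF =
        Logger.getLogger("DFSClient.PerformanceAdvisory");

    public static void main(String[] args) {
        PERF.setLevel(Level.FINE);
        ConsoleHandler h = new ConsoleHandler();
        h.setLevel(Level.FINE);
        PERF.addHandler(h);
        PERF.setUseParentHandlers(false); // don't spam the root logger

        // The kind of notice that would otherwise be lost at DEBUG:
        PERF.fine("Short-circuit reads disabled: domain socket path not set");
        System.out.println("advisory logged");
    }
}
```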
[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096309#comment-14096309 ] Hadoop QA commented on HDFS-6850: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661565/HDFS-6850.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7629//console This message is automatically generated. Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6850: Attachment: HDFS-6850.patch Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6850: Attachment: (was: HDFS-6850.patch) Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6801) Archival Storage: Add a new data migration tool
[ https://issues.apache.org/jira/browse/HDFS-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6801: -- Attachment: h6801_20140814.patch h6801_20140814.patch: fixes some bugs. Still need to add code to start the dispatcher. Archival Storage: Add a new data migration tool Key: HDFS-6801 URL: https://issues.apache.org/jira/browse/HDFS-6801 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6801_20140813.patch, h6801_20140814.patch The tool is similar to Balancer. It periodically scans the blocks in HDFS and uses path and/or other metadata (e.g. mtime) to determine whether a block should be cooled down (i.e. hot -> warm, or warm -> cold) or warmed up (i.e. cold -> warm, or warm -> hot). In contrast to Balancer, the migration tool always moves replicas to a different storage type. Similar to Balancer, the replicas are moved in a way that the number of racks the block spans does not decrease. -- This message was sent by Atlassian JIRA (v6.2#6252)
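The mtime-based cooling/warming decision the description implies can be sketched as a simple age-to-tier mapping. The thresholds and tier names below are invented for illustration; the actual tool's policy inputs (path, metadata) and thresholds are not specified here:

```java
// Hypothetical sketch of the migration decision: classify a block by
// the age of its data (days since mtime). Thresholds are invented.
public class TierSketch {
    enum Tier { HOT, WARM, COLD }

    static Tier tierFor(long ageDays) {
        if (ageDays < 7) return Tier.HOT;    // recently written
        if (ageDays < 90) return Tier.WARM;  // occasionally read
        return Tier.COLD;                    // archival
    }

    public static void main(String[] args) {
        // A block moves tier when its current placement no longer
        // matches tierFor(age); the tool would then migrate replicas
        // to a different storage type.
        System.out.println(tierFor(1));
        System.out.println(tierFor(30));
        System.out.println(tierFor(365));
    }
}
```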
[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6850: Status: Open (was: Patch Available) Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096317#comment-14096317 ] Zhe Zhang commented on HDFS-6850: - The first patch file wasn't correctly generated. Resubmitting now. Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-6850: Target Version/s: 3.0.0 (was: 2.6.0) Affects Version/s: (was: 2.6.0) 3.0.0 Status: Patch Available (was: Open) Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096346#comment-14096346 ] Daryn Sharp commented on HDFS-6826: --- Haha, synchronous discussion - that made my day. Yes, I'll contact you offline. Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFSPluggableAuthorizationProposal.pdf When Hbase data, HiveMetaStore data or Search data is accessed via services (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce permissions on corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (i.e. from a MapReduce job), that the permission of the data files map to the permissions of the corresponding data entity (i.e. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities permissions. I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6546) Add non-superuser capability to get the encryption zone for a specific path
[ https://issues.apache.org/jira/browse/HDFS-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096350#comment-14096350 ] Colin Patrick McCabe commented on HDFS-6546: Nice idea. Returning just a path seems a bit inflexible. Can we also return an encryption zone id of sorts? I think the inode ID of the EZ would work pretty nicely (based on some offline discussion with Andrew). That way we can also add more stuff if we want later... we're not locked into just what fields Path has. Also, I noticed a few places in the test where you inverted expected and provided. The expected thing should come first in Assert.assert, so if the test fails, you don't get confusing error messages... One last thing... I modified the test slightly to call this API on something in a snapshot, and it failed with this exception: {code} Running org.apache.hadoop.hdfs.TestEncryptionZones Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 18.539 sec FAILURE! - in org.apache.hadoop.hdfs.TestEncryptionZones testGetEZRootAsNonSuperUser(org.apache.hadoop.hdfs.TestEncryptionZones) Time elapsed: 3.876 sec ERROR! org.apache.hadoop.ipc.RemoteException: Modification on a read-only snapshot is disallowed at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath4Write(FSDirectory.java:3071) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath4Write(FSDirectory.java:1490) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEZRootForPath(FSNamesystem.java:8598) {code} This should work on snapshotted files... probably a good idea to add a unit test for that. Similarly, we should test what happens when both the file and the EZ have been deleted, but are still in a snapshot. 
Thanks Add non-superuser capability to get the encryption zone for a specific path --- Key: HDFS-6546 URL: https://issues.apache.org/jira/browse/HDFS-6546 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6546.001.patch Need to add protocol, api, and CLI that allows a non super user to ask whether a path is part of an EZ, and if so, which one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6561) Byte array native checksumming on client side
[ https://issues.apache.org/jira/browse/HDFS-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096361#comment-14096361 ] Colin Patrick McCabe commented on HDFS-6561: Great idea. The {{hdfs-6561-just-hadoop-changes.txt}} patch needs to be rebased... it didn't apply cleanly for me against trunk. {code} JNIEXPORT void JNICALL Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSums (JNIEnv *env, jclass clazz, jint bytes_per_checksum, jint j_crc_type, jobject j_sums, jint sums_offset, jobject j_data, jint data_offset, jint data_len, jstring j_filename, jlong base_pos, jboolean verify) {code} Later, you use an if(likely) on the verify boolean. Rather than do this, why not just have a utility function that both nativeComputeChunkedSumsByteArray and nativeVerifyChunkedSums call? {code} -#include <stdint.h> +#include <stdbool.h> {code} Please, no. There are a lot of older C compilers floating around out there that will choke on this. Plus we still need {{stdint.h}}, since we're using {{uint32_t}}, etc. etc. I don't think the C99 _Bool stuff adds a lot of type safety anyway, since any non-struct type can implicitly be converted to a bool, and a bool can be used as an int in many contexts. Byte array native checksumming on client side - Key: HDFS-6561 URL: https://issues.apache.org/jira/browse/HDFS-6561 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6561.2.patch, HDFS-6561.3.patch, HDFS-6561.patch, hdfs-6561-just-hadoop-changes.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096379#comment-14096379 ] Hadoop QA commented on HDFS-6850: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661589/HDFS-6850.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7630//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7630//console This message is automatically generated. Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6561) Byte array native checksumming on client side
[ https://issues.apache.org/jira/browse/HDFS-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096378#comment-14096378 ] Hadoop QA commented on HDFS-6561: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661574/hdfs-6561-just-hadoop-changes.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7631//console This message is automatically generated. Byte array native checksumming on client side - Key: HDFS-6561 URL: https://issues.apache.org/jira/browse/HDFS-6561 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6561.2.patch, HDFS-6561.3.patch, HDFS-6561.patch, hdfs-6561-just-hadoop-changes.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096405#comment-14096405 ] Andrew Wang commented on HDFS-6783: --- +1 from me too, thanks for working on this Yi and Colin. Fix HDFS CacheReplicationMonitor rescan logic - Key: HDFS-6783 URL: https://issues.apache.org/jira/browse/HDFS-6783 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, HDFS-6783.006.patch In the monitor thread, needsRescan is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return via the first condition: {code} if (!needsRescan) { return; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
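The race in the description — the flag is cleared before the scan runs, so a waiter can return while the rescan is still in progress — can be closed by comparing counters of requested versus completed scans. This is a self-contained sketch of that approach under simplified assumptions, not the actual HDFS patch:

```java
// Sketch: instead of a single needsRescan boolean (which the monitor
// must clear before scanning, letting waiters return too early), track
// how many scans were requested and how many have completed.
public class RescanSketch {
    private long neededCount = 0;    // scans requested so far
    private long completedCount = 0; // scans finished so far

    public synchronized void setNeedsRescan() {
        neededCount = completedCount + 1;
    }

    // Returns only once every scan requested before this call is done.
    public synchronized void waitForRescanIfNeeded() throws InterruptedException {
        while (completedCount < neededCount) {
            wait();
        }
    }

    // Monitor thread: perform one rescan, publish completion afterwards.
    public void rescanOnce() {
        synchronized (this) {
            if (completedCount >= neededCount) {
                return; // nothing pending
            }
        }
        // ... perform the (possibly long) scan outside the lock ...
        synchronized (this) {
            completedCount++;
            notifyAll(); // completion, not the start, wakes the waiters
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RescanSketch m = new RescanSketch();
        m.setNeedsRescan();
        Thread monitor = new Thread(m::rescanOnce);
        monitor.start();
        m.waitForRescanIfNeeded(); // returns only after the scan completes
        monitor.join();
        System.out.println("rescans completed: " + m.completedCount);
    }
}
```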
[jira] [Created] (HDFS-6851) Flush EncryptionZoneWithId and add an id field to EncryptionZone
Charles Lamb created HDFS-6851: -- Summary: Flush EncryptionZoneWithId and add an id field to EncryptionZone Key: HDFS-6851 URL: https://issues.apache.org/jira/browse/HDFS-6851 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb EncryptionZoneWithId can be flushed by moving the id field up to EncryptionZone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6634) inotify in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6634: --- Attachment: HDFS-6634.4.patch Thanks for the comments, Andrew. Updated patch. inotify in HDFS --- Key: HDFS-6634 URL: https://issues.apache.org/jira/browse/HDFS-6634 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, qjm Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-6634.2.patch, HDFS-6634.3.patch, HDFS-6634.4.patch, HDFS-6634.patch, inotify-design.2.pdf, inotify-design.pdf, inotify-intro.2.pdf, inotify-intro.pdf Design a mechanism for applications like search engines to access the HDFS edit stream. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6848) Lack of synchronization on access to datanodeUuid in DataStorage#format()
[ https://issues.apache.org/jira/browse/HDFS-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096431#comment-14096431 ] Ted Yu commented on HDFS-6848: -- bq. it is better to call the synchronized method setDatanodeUuid() That should be good. Lack of synchronization on access to datanodeUuid in DataStorage#format() -- Key: HDFS-6848 URL: https://issues.apache.org/jira/browse/HDFS-6848 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} this.datanodeUuid = datanodeUuid; {code} The above assignment should be done holding lock DataStorage.this - as is done in two other places. -- This message was sent by Atlassian JIRA (v6.2#6252)