[jira] [Updated] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HDFS-6826: - Attachment: HDFSPluggableAuthorizationProposal-v2.pdf Updated the proposal, removing the refresh() method and adding the createPermissionChecker() method to the plugin interface. Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, HDFSPluggableAuthorizationProposal-v2.pdf, HDFSPluggableAuthorizationProposal.pdf When HBase data, HiveMetaStore data or Search data is accessed via services (HBase region servers, HiveServer2, Impala, Solr), the services can enforce permissions on the corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (e.g. from a MapReduce job), that the permissions of the data files map to the permissions of the corresponding data entity (e.g. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities' permissions. I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration
[ https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6376: Assignee: Dave Marion Distcp data between two HA clusters requires another configuration -- Key: HDFS-6376 URL: https://issues.apache.org/jira/browse/HDFS-6376 Project: Hadoop HDFS Issue Type: Bug Components: datanode, federation, hdfs-client Affects Versions: 2.3.0, 2.4.0 Environment: Hadoop 2.3.0 Reporter: Dave Marion Assignee: Dave Marion Fix For: 3.0.0 Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, HDFS-6376-4-branch-2.4.patch, HDFS-6376-5-trunk.patch, HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch, HDFS-6376-patch-1.patch User has to create a third set of configuration files for distcp when transferring data between two HA clusters. Consider the scenario in [1]. You cannot put all of the required properties in core-site.xml and hdfs-site.xml for the client to resolve the location of both active namenodes. If you do, then the datanodes from cluster A may join cluster B. I can not find a configuration option that tells the datanodes to federate blocks for only one of the clusters in the configuration. [1] http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6800: --- Attachment: HDFS-6800.2.patch Updated the patch to delete the trash directory if the previous directory exists. Determine how Datanode layout changes should interact with rolling upgrade -- Key: HDFS-6800 URL: https://issues.apache.org/jira/browse/HDFS-6800 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: James Thomas Attachments: HDFS-6800.2.patch, HDFS-6800.patch We need to handle attempts to rolling-upgrade the DataNode to a new storage directory layout. One approach is to disallow such upgrades. If we choose this approach, we should make sure that the system administrator gets a helpful error message and a clean failure when trying to use rolling upgrade to a version that doesn't support it. Based on the compatibility guarantees described in HDFS-5535, this would mean that *any* future DataNode layout changes would require a major version upgrade. Another approach would be to support rolling upgrade from an old DN storage layout to a new layout. This approach requires us to change our documentation to explain to users that they should supply the {{\-rollback}} command on the command-line when re-starting the DataNodes during rolling rollback. Currently the documentation just says to restart the DataNode normally. Another issue here is that the DataNode's usage message describes rollback options that no longer exist. The help text says that the DN supports {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6800: --- Attachment: HDFS-6800.3.patch Determine how Datanode layout changes should interact with rolling upgrade -- Key: HDFS-6800 URL: https://issues.apache.org/jira/browse/HDFS-6800 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: James Thomas Attachments: HDFS-6800.2.patch, HDFS-6800.3.patch, HDFS-6800.patch
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098270#comment-14098270 ]

Yongjun Zhang commented on HDFS-6833: -

Hi Shinichi,

Thanks for finding this issue and for the patch work. Blocks are missing in memory because the block has already been removed from the memory map, while deletion of the physical block is left to FsDatasetAsyncDiskService, which is an asynchronous operation. Though the DirectoryScanner is only scheduled to run every 6 hours (by default), FsDatasetAsyncDiskService's block deletion can be delayed so long that the DirectoryScanner sees blocks that are already removed from memory but still exist on disk. I worked out a patch that may help with the slowness of the disk removal, see HDFS-6788. However, I think this jira should help from a different perspective.

Overall the latest patch looks good to me. I have some comments here:

0. Suggest adding a version number when you upload a new patch.
1. Suggest changing {{isDirectoryScanner()}} to {{isDirectoryScannerInited}}.
2. Using {{List<Long>}} for deletingBlocks may not be efficient, since the lookup in {{public void removeDeletedBlocks(String bpid, List<Long> blockIds)}} is then a sequential search. You might consider using a HashSet.
3. DirectoryScanner.java. Not related to your change, but I saw it when looking at your change:
{code}
while (m < memReport.length && d < blockpoolReport.length) {
  Block memBlock = memReport[Math.min(m, memReport.length - 1)];
  ScanInfo info = blockpoolReport[Math.min(
      d, blockpoolReport.length - 1)];
{code}
Math.min(m, memReport.length - 1) is guaranteed to be m and Math.min(d, blockpoolReport.length - 1) is guaranteed to be d, so the code can be simplified to not call Math.min.
4. DirectoryScanner.java
{code}
while (d < blockpoolReport.length) {
  if (!dataset.isDeletingBlock(bpid, blockpoolReport[d].getBlockId())) {
    statsRecord.missingMemoryBlocks++;
    addDifference(diffRecord, statsRecord, blockpoolReport[d++]);
  } else {
    deletingBlockIds.add(blockpoolReport[d].getBlockId());
    d++;
  }
}
{code}
The d++ logic can be extracted so that it is shared by both branches.

Thanks.

DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch

When a block is deleted in DataNode, the following messages are usually output.
{code}
2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
However, in the current implementation DirectoryScanner may run while the DataNode is deleting the block, and the following messages are output.
{code} 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file
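Review comments 2 to 4 above can be pictured with a small self-contained sketch. It is not the real DirectoryScanner: the class {{ScanSketch}}, its {{Result}} type and the use of plain block IDs are hypothetical stand-ins, and applying the "being deleted" check in both loops is an assumption made for illustration. It only shows a hash-based lookup for deleting blocks, loop indexing without Math.min, and a single shared d++.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Simplified model of the scan loop; all names here are hypothetical. */
final class ScanSketch {

  /** On-disk block IDs missing from memory, and IDs skipped because deletion is pending. */
  static final class Result {
    final List<Long> missingInMemory = new ArrayList<>();
    final List<Long> skippedDeleting = new ArrayList<>();
  }

  /** Both reports are assumed to be sorted by block ID, as in a merge-style comparison. */
  static Result scan(long[] memReport, long[] diskReport, Set<Long> deletingBlocks) {
    Result result = new Result();
    int m = 0;
    int d = 0;
    // The loop condition already bounds m and d, so no Math.min is needed when indexing.
    while (m < memReport.length && d < diskReport.length) {
      long mem = memReport[m];
      long disk = diskReport[d];
      if (disk < mem) {
        record(result, disk, deletingBlocks);
        d++;
      } else if (disk > mem) {
        m++; // in memory but not on disk; not covered by this sketch
      } else {
        m++;
        d++;
      }
    }
    // Remaining on-disk blocks have no in-memory counterpart.
    while (d < diskReport.length) {
      record(result, diskReport[d], deletingBlocks);
      d++; // one increment shared by both outcomes
    }
    return result;
  }

  /** Set.contains is a constant-time lookup for a HashSet, unlike a scan over a List. */
  private static void record(Result result, long blockId, Set<Long> deletingBlocks) {
    if (deletingBlocks.contains(blockId)) {
      result.skippedDeleting.add(blockId);
    } else {
      result.missingInMemory.add(blockId);
    }
  }
}
```

Passing a {{HashSet<Long>}} as {{deletingBlocks}} keeps each membership check independent of how many deletions are pending, which is the point of comment 2.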
[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098295#comment-14098295 ] Jitendra Nath Pandey commented on HDFS-6826: I don't think the FsPermissionChecker interface should expose FSDirectory. We can either remove FSDirectory from FsPermissionChecker, or keep the FsPermissionChecker class and let it internally use the plugin implementation for permission checks. Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, HDFSPluggableAuthorizationProposal-v2.pdf, HDFSPluggableAuthorizationProposal.pdf When HBase data, HiveMetaStore data or Search data is accessed via services (HBase region servers, HiveServer2, Impala, Solr), the services can enforce permissions on the corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (e.g. from a MapReduce job), that the permissions of the data files map to the permissions of the corresponding data entity (e.g. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities' permissions. I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
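The second option in the comment above (keep the checker class and let it call the plugin internally) could look roughly like the sketch below. Every name in it is hypothetical: {{AuthorizationPluginSketch}}, {{PermissionCheckerSketch}} and the string-based arguments are illustration only and are not taken from the attached proposal or patches. {{SecurityException}} stands in for whatever exception the NameNode would actually raise.

```java
/** Hypothetical plugin contract; the interface in the actual proposal may differ. */
interface AuthorizationPluginSketch {
  boolean isAllowed(String user, String path, String action);
}

/**
 * Hypothetical checker that callers keep using. The plugin only sees the user,
 * the path and the requested action, so no namespace internals are exposed to it.
 */
final class PermissionCheckerSketch {
  private final String user;
  private final AuthorizationPluginSketch plugin;

  PermissionCheckerSketch(String user, AuthorizationPluginSketch plugin) {
    this.user = user;
    this.plugin = plugin;
  }

  /** Throws if the plugin denies the request; returns normally otherwise. */
  void checkPermission(String path, String action) {
    if (!plugin.isAllowed(user, path, action)) {
      throw new SecurityException(
          "Permission denied: user=" + user + ", action=" + action + ", path=" + path);
    }
  }
}
```

With this shape, swapping the authorization backend means supplying a different plugin instance, while code that calls the checker stays unchanged.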
[jira] [Commented] (HDFS-5687) Problem in accessing NN JSP page
[ https://issues.apache.org/jira/browse/HDFS-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098319#comment-14098319 ] Hadoop QA commented on HDFS-5687: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620319/HDFS-5687-0001.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7642//console This message is automatically generated. Problem in accessing NN JSP page Key: HDFS-5687 URL: https://issues.apache.org/jira/browse/HDFS-5687 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: sathish Assignee: sathish Priority: Minor Fix For: 2.6.0 Attachments: HDFS-5687-0001.patch In the NN UI, after opening the Browse File System page, clicking the "Go back to DFS home" icon on that page does not load the dfshealth.jsp page. The NN HTTP URL that comes up is http://nnaddr///nninfoaddr/dfshealth.jsp, and I think that is why the page cannot be browsed. It should be http://nninfoaddr/dfshealth.jsp/ instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6300) Allows to run multiple balancer simultaneously
[ https://issues.apache.org/jira/browse/HDFS-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098321#comment-14098321 ] Hadoop QA commented on HDFS-6300: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642451/HDFS-6300.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7643//console This message is automatically generated. Allows to run multiple balancer simultaneously -- Key: HDFS-6300 URL: https://issues.apache.org/jira/browse/HDFS-6300 Project: Hadoop HDFS Issue Type: Bug Components: balancer Reporter: Rakesh R Assignee: Rakesh R Fix For: 2.6.0 Attachments: HDFS-6300.patch The Javadoc of Balancer.java says that a second balancer is not allowed to run while the first one is in progress. But I've noticed that multiple balancers can run together, and the balancer.id implementation does not safeguard against this. {code} * <li>Another balancer is running. Exiting... {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3907) Allow multiple users for local block readers
[ https://issues.apache.org/jira/browse/HDFS-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098375#comment-14098375 ] Hadoop QA commented on HDFS-3907: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544410/hdfs-3907.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7645//console This message is automatically generated. Allow multiple users for local block readers Key: HDFS-3907 URL: https://issues.apache.org/jira/browse/HDFS-3907 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.6.0 Attachments: hdfs-3907.txt The {{dfs.block.local-path-access.user}} config added in HDFS-2246 only supports a single user. However, as long as blocks are group readable by more than one user, the feature could be used by multiple users; to support this we just need to allow more than one user to be configured. In practice this also lets us support HBase, where the client (RS) runs as the hbase system user and the DN runs as the hdfs system user. I think this should work in secure mode as well, since we're not using impersonation in the HBase case. -- This message was sent by Atlassian JIRA (v6.2#6252)
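As a rough illustration of what "allow more than one user to be configured" could mean, the sketch below treats the value of {{dfs.block.local-path-access.user}} as a comma-separated list. The comma-separated format and the class {{LocalPathAccessSketch}} are assumptions for this example, not something stated in the issue or taken from the attached patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper; assumes the config value is a comma-separated user list. */
final class LocalPathAccessSketch {

  /** Splits the raw config value into user names, ignoring blanks and surrounding spaces. */
  static Set<String> parseUsers(String configValue) {
    if (configValue == null) {
      return Collections.emptySet();
    }
    return Arrays.stream(configValue.split(","))
        .map(String::trim)
        .filter(name -> !name.isEmpty())
        .collect(Collectors.toSet());
  }

  /** True if the given user appears in the configured list. */
  static boolean isAllowed(String configValue, String user) {
    return parseUsers(configValue).contains(user);
  }
}
```

A single-user value still parses to a one-element set, so existing configurations would keep working under this assumption.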
[jira] [Commented] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098430#comment-14098430 ] Hadoop QA commented on HDFS-6800: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662015/HDFS-6800.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.TestHDFSServerPorts {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7641//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7641//console This message is automatically generated. Determine how Datanode layout changes should interact with rolling upgrade -- Key: HDFS-6800 URL: https://issues.apache.org/jira/browse/HDFS-6800 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: James Thomas Attachments: HDFS-6800.2.patch, HDFS-6800.3.patch, HDFS-6800.patch
[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098443#comment-14098443 ] Hudson commented on HDFS-6850: -- FAILURE: Integrated in Hadoop-Yarn-trunk #647 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/647/]) HDFS-6850. Move NFS out of order write unit tests into TestWrites class. Contributed by Zhe Zhang. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618091) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Move NFS out of order write unit tests into TestWrites class Key: HDFS-6850 URL: https://issues.apache.org/jira/browse/HDFS-6850 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 3.0.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Fix For: 2.6.0 Attachments: HDFS-6850.patch Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098520#comment-14098520 ] Hadoop QA commented on HDFS-6800: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662015/HDFS-6800.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController org.apache.hadoop.hdfs.server.datanode.TestBPOfferService The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.TestHDFSServerPorts {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7644//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7644//console This message is automatically generated. Determine how Datanode layout changes should interact with rolling upgrade -- Key: HDFS-6800 URL: https://issues.apache.org/jira/browse/HDFS-6800 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: James Thomas Attachments: HDFS-6800.2.patch, HDFS-6800.3.patch, HDFS-6800.patch
[jira] [Updated] (HDFS-6569) OOB message can't be sent to the client when DataNode shuts down for upgrade
[ https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6569: - Assignee: Brandon Li (was: Kihwal Lee) OOB message can't be sent to the client when DataNode shuts down for upgrade Key: HDFS-6569 URL: https://issues.apache.org/jira/browse/HDFS-6569 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-6569.001.patch, HDFS-6569.002.patch, test-hdfs-6569.patch The socket is closed too early before the OOB message can be sent to client, which causes the write pipeline failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6569) OOB message can't be sent to the client when DataNode shuts down for upgrade
[ https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098582#comment-14098582 ]

Kihwal Lee commented on HDFS-6569: --

The patch looks good in general. There are a couple of things that can be improved though.
- Serial transmission of OOB. One bad client may block this and prevent the message from being sent to the rest of the good clients. Unless a new thread is created (during shutdown!) to send an OOB ack asynchronously, the blocking ack.readFields() call needs to be changed in order to delegate the message transmission to the responder thread. I believe this is beyond the scope of this jira. I suggest filing a new jira for improving this.
- Shutdown OOB can be sent twice. This does not affect correctness, but the DN log can become a bit messy. We can make it skip the OOB sending on interrupt if it was already sent. If you want to address this in a separate jira, that is fine, since it is a minor issue.

OOB message can't be sent to the client when DataNode shuts down for upgrade Key: HDFS-6569 URL: https://issues.apache.org/jira/browse/HDFS-6569 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-6569.001.patch, HDFS-6569.002.patch, test-hdfs-6569.patch

The socket is closed too early, before the OOB message can be sent to the client, which causes the write pipeline failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
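The second point above (skip the OOB on interrupt if it was already sent) amounts to a send-once guard. The sketch below is one possible shape for it and is not code from the patch: the class {{OobOnceSketch}} and its methods are hypothetical, and the actual transmission is replaced by a counter.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical send-once guard for a shutdown OOB message. */
final class OobOnceSketch {
  private final AtomicBoolean oobSent = new AtomicBoolean(false);
  private final AtomicInteger transmissions = new AtomicInteger();

  /**
   * Sends the OOB only on the first call. Later calls, for example from an
   * interrupt handler, return false without sending again.
   */
  boolean sendShutdownOobOnce() {
    // compareAndSet lets exactly one caller win, even if two threads race here.
    if (!oobSent.compareAndSet(false, true)) {
      return false;
    }
    transmissions.incrementAndGet(); // placeholder for the real transmission
    return true;
  }

  /** Number of times the message was actually transmitted. */
  int transmissionCount() {
    return transmissions.get();
  }
}
```

Both the normal shutdown path and the interrupt path could call the same method, so the log would show a single OOB per connection under this design.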
[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098583#comment-14098583 ]

Kihwal Lee commented on HDFS-6825: --

I will take a look at it soon.

Edit log corruption due to delayed block removal Key: HDFS-6825 URL: https://issues.apache.org/jira/browse/HDFS-6825 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch

Observed the following stack:
{code}
2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space.
java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
Found this is what happened:
- The client created the file /solr/hierarchy/core_node1/data/tlog/tlog.xyz.
- The client tried to append to this file, but the lease had expired, so lease recovery was started and the append failed.
- The file gets deleted; however, there are still pending blocks of this file that have not been deleted.
- Then the commitBlockSynchronization() method is called (see stack above), and an INodeFile is created out of the pending block, unaware that the file was already deleted.
- FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock.
- closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction and wrote a CloseOp to the edit log.

-- This message was sent by Atlassian JIRA (v6.2#6252)
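The sequence above can be modeled in a few lines to show why a deleted-file check before logging the close matters. This is a toy model, not NameNode code and not the attached patch: {{DelayedRemovalSketch}}, {{FileEntry}}, {{PendingBlock}} and the string edit log are all hypothetical, and failing fast with FileNotFoundException is only one possible way to handle the case.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a namespace whose pending blocks outlive a file deletion. */
final class DelayedRemovalSketch {

  /** Hypothetical stand-in for a file inode. */
  static final class FileEntry {
    final String path;
    boolean deleted;
    FileEntry(String path) { this.path = path; }
  }

  /** Hypothetical stand-in for a block that still points at its owning file. */
  static final class PendingBlock {
    final FileEntry owner;
    PendingBlock(FileEntry owner) { this.owner = owner; }
  }

  final Map<String, FileEntry> namespace = new HashMap<>();
  final List<String> editLog = new ArrayList<>();

  FileEntry create(String path) {
    FileEntry file = new FileEntry(path);
    namespace.put(path, file);
    return file;
  }

  /** Removes the file but leaves any PendingBlock reference to it untouched. */
  void delete(String path) {
    FileEntry file = namespace.remove(path);
    if (file != null) {
      file.deleted = true;
    }
  }

  /** Refuses to log a close for a file that is already gone. */
  void commitBlockSynchronization(PendingBlock block) throws FileNotFoundException {
    if (block.owner.deleted) {
      throw new FileNotFoundException("File already deleted: " + block.owner.path);
    }
    editLog.add("CloseOp " + block.owner.path);
  }
}
```

Without the check in {{commitBlockSynchronization}}, the model would append a CloseOp for a path that no longer exists, which mirrors the corruption described in the last step above.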
[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098606#comment-14098606 ]

Hudson commented on HDFS-6850:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1864 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1864/])
HDFS-6850. Move NFS out of order write unit tests into TestWrites class. Contributed by Zhe Zhang. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618091)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Move NFS out of order write unit tests into TestWrites class

Key: HDFS-6850
URL: https://issues.apache.org/jira/browse/HDFS-6850
Project: Hadoop HDFS
Issue Type: Improvement
Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-6850.patch

Expanding TestWrites class to include the out of order writing scenario. I think it is logical to merge the OOO scenario in the TestWrites class instead of having a separate TestOutOfOrderWrite class.
[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098657#comment-14098657 ]

Kihwal Lee commented on HDFS-6825:

{quote}
Could we also check that this works with a recursive delete on the containing folder of the open file? I assume the change in {{isFileDeleted()}} is for this.
{quote}
I believe the recursive check is not necessary. When a tree is deleted, everything under it is recursively processed while holding the FSNamesystem and FSDirectory write locks. If a node does not belong to any snapshot, its parent and block fields will be cleared. If it is in a snapshot, it will be marked as deleted. The only thing that is not cleared while holding the lock, and that causes this issue, is the block collection field of BlockInfo. So {{isFileDeleted()}} does not need to walk up the tree.

The rest of the patch looks good.
[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098640#comment-14098640 ]

Hudson commented on HDFS-6850:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1838 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1838/])
HDFS-6850. Move NFS out of order write unit tests into TestWrites class. Contributed by Zhe Zhang. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618091)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098659#comment-14098659 ]

Yongjun Zhang commented on HDFS-6825:

Thanks [~kihwal]! and thanks [~andrew.wang] and [~atm] for the earlier review!
[jira] [Updated] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shinichi Yamashita updated HDFS-6833:

Attachment: HDFS-6833-6.patch

Hi [~yzhangal],
Thank you for your review and comments. I have attached a new patch that reflects your comments.

DirectoryScanner should not register a deleting block with memory of DataNode

Key: HDFS-6833
URL: https://issues.apache.org/jira/browse/HDFS-6833
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
Attachments: HDFS-6833-6.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch

When a block is deleted in the DataNode, the following messages are usually output.
{code}
2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
However, in the current implementation DirectoryScanner may run while the DataNode is deleting the block. In that case the following messages are output.
{code}
2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
  getNumBytes() = 21230663
  getBytesOnDisk() = 21230663
  getVisibleLength()= 21230663
  getVolume() = /hadoop/data1/dfs/data/current
  getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  unlinked =false
2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
Information about the block being deleted is registered in the DataNode's memory again. When the DataNode then sends a block report, the NameNode receives wrong block information.
For example, when we recommission a node or change the replication factor, the NameNode may delete the correct block as an excess replica because of this problem, and Under-Replicated Blocks and Missing Blocks occur.
When the DataNode runs DirectoryScanner, it should not register a block that is being deleted.
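One way to picture the behaviour requested above is a scanner that consults a set of blocks whose deletion is still pending. The following is a minimal sketch under that assumption only; {{DeletingBlockSketch}} and all of its members are hypothetical names for illustration and do not correspond to the attached patch or to the DataNode classes.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy DataNode state: in-memory replica map, on-disk files, and pending deletions. Hypothetical. */
class DeletingBlockSketch {
    private final Set<Long> inMemory = new HashSet<>();
    private final Set<Long> onDisk = new HashSet<>();
    private final Set<Long> deleting = new HashSet<>();

    synchronized void addBlock(long id) {
        inMemory.add(id);
        onDisk.add(id);
    }

    /** Step 1 of asynchronous deletion: forget the block in memory; the file is still on disk. */
    synchronized void scheduleDeletion(long id) {
        inMemory.remove(id);
        deleting.add(id);
    }

    /** Step 2, run later by the async worker: the file is removed from disk. */
    synchronized void finishDeletion(long id) {
        onDisk.remove(id);
        deleting.remove(id);
    }

    /**
     * Reconciles disk with memory and returns how many blocks were re-registered.
     * With skipDeleting=false this mimics the reported behaviour; with true, blocks
     * that are scheduled for deletion are left alone.
     */
    synchronized int scan(boolean skipDeleting) {
        int added = 0;
        for (long id : onDisk) {
            if (inMemory.contains(id)) continue;
            if (skipDeleting && deleting.contains(id)) continue;
            inMemory.add(id);
            added++;
        }
        return added;
    }

    synchronized boolean isRegistered(long id) {
        return inMemory.contains(id);
    }

    public static void main(String[] args) {
        DeletingBlockSketch dn = new DeletingBlockSketch();
        dn.addBlock(1073741825L);
        dn.scheduleDeletion(1073741825L);
        System.out.println("re-registered without the check: " + dn.scan(false));
    }
}
```

Without the check, a scan that falls between the two deletion steps re-registers the block, and the stale entry stays in memory after the file is gone.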
[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098682#comment-14098682 ]

Yongjun Zhang commented on HDFS-6825:

Hi [~kihwal],

Thanks a lot for the review. We were doing the last update at the same time, so I just saw your review comments.

The change in {{isFileDeleted}} is to handle recursive deletion. If we remove the change in this method, the test I added fails. Say, for a path /a/b/c/file, if we do {{fs.delete(/a/b, true)}}, what I observed is different from what you stated: it only removes b from a's children while holding the write lock (and delays the other removals until later), thus {{isFileDeleted}} returned false on /a/b/c/file.

I just reran the test to collect a log for your reference. This exception happens when the test restarts the NN to see whether the edit log is corrupted. The fix I introduced in {{isFileDeleted}} solves this problem:
{code}
Running org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.297 sec FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
testDeleteAndCommitBlockSynchronizationRaceHasSnapshot(org.apache.hadoop.hdfs.server.namenode.TestDeleteRace) Time elapsed: 7.101 sec ERROR!
java.io.FileNotFoundException: File does not exist: /testdir/testdir1/test-file
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:412)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:227)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:136)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:820)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:678)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:972)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:715)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:533)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:589)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:756)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:740)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1425)
    at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1696)
    at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNodes(MiniDFSCluster.java:1651)
    at org.apache.hadoop.hdfs.server.namenode.TestDeleteRace.testDeleteAndCommitBlockSynchronizationRace(TestDeleteRace.java:317)
    at org.apache.hadoop.hdfs.server.namenode.TestDeleteRace.testDeleteAndCommitBlockSynchronizationRaceHasSnapshot(TestDeleteRace.java:338)
{code}
Thanks.
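To make the /a/b/c/file example concrete, here is a minimal sketch of why a check on the immediate parent can miss a recursive delete while a walk toward the root does not. It assumes, as described in the comment, that the delete only unlinks the subtree root from its parent's children. {{DirTreeSketch}} and its methods are hypothetical and are not the code in the patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy directory tree; a hypothetical model, not the HDFS INode classes. */
class DirTreeSketch {
    static final class Node {
        final String name;
        final Node parent;                              // null only for the root
        final Map<String, Node> children = new HashMap<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        Node add(String childName) {
            Node child = new Node(childName, this);
            children.put(childName, child);
            return child;
        }
    }

    /** Recursive delete as observed in the comment: only the subtree root is unlinked. */
    static void deleteSubtree(Node dir) {
        dir.parent.children.remove(dir.name);
    }

    /** Looks only at the link between the file and its immediate parent. */
    static boolean isDeletedShallow(Node file) {
        return file.parent != null && file.parent.children.get(file.name) != file;
    }

    /** Walks toward the root and reports deletion if any link on the way is broken. */
    static boolean isDeletedWalkingUp(Node file) {
        for (Node n = file; n.parent != null; n = n.parent) {
            if (n.parent.children.get(n.name) != n) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node b = root.add("a").add("b");
        Node file = b.add("c").add("file");
        deleteSubtree(b);                               // like fs.delete(/a/b, true)
        System.out.println("shallow check: " + isDeletedShallow(file));
        System.out.println("walking up:    " + isDeletedWalkingUp(file));
    }
}
```

In this model the shallow check still sees c as the parent of file and reports that the file exists, whereas the walk notices that b is no longer a child of a.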
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098685#comment-14098685 ]

Yongjun Zhang commented on HDFS-6776:

I ran the failed tests locally several times and did not see them fail. Uploading the same patch to try again. Thanks.

distcp from insecure cluster (source) to secure cluster (destination) doesn't work

Key: HDFS-6776
URL: https://issues.apache.org/jira/browse/HDFS-6776
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.3.0, 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch

Issuing a distcp command on the secure cluster side to copy data from an insecure cluster to the secure cluster produces the following problem:
{code}
hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp hdfs://sure-cluster:8020/tmp/tmptgt
14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true}
14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-clister:8032
14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded
14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
14/07/30 20:06:20 ERROR
tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640) at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
[jira] [Updated] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated HDFS-6776:

Attachment: HDFS-6776.004.patch
[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class
[ https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098708#comment-14098708 ]

Zhe Zhang commented on HDFS-6850:

Thanks [~atm] and [~brandonli] for reviewing the patch!
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098711#comment-14098711 ]

Alejandro Abdelnur commented on HDFS-6776:

The {{NullTokenMsgHeader}} constant name should be all capitals.

I'm not sure I like looking for a string occurrence in the IOException message to detect the issue. I thought WebHdfs was recreating exceptions on the client side, but that doesn't seem to be the case for these DT calls.
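The two review points can be illustrated with a short sketch. It is hypothetical throughout: the constant value is only assumed from the "Failed to get the token" text in the issue's log, {{NullTokenException}} does not exist in Hadoop, and neither method is taken from the patch. It contrasts message-based detection with detection by a dedicated exception type.

```java
import java.io.IOException;

/** Hypothetical illustration of the two detection styles discussed in the review. */
class NullTokenDetectionSketch {
    /** Upper snake case, as the review asks. The text is an assumption based on the issue's log. */
    static final String NULL_TOKEN_MSG_HEADER = "Failed to get the token";

    /** Hypothetical dedicated exception type; not part of Hadoop. */
    static final class NullTokenException extends IOException {
        private static final long serialVersionUID = 1L;

        NullTokenException(String message) {
            super(message);
        }
    }

    /** Message-based detection: needs no new type, but breaks if the wording changes. */
    static boolean isNullTokenByMessage(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.startsWith(NULL_TOKEN_MSG_HEADER);
    }

    /** Type-based detection: robust to wording, but both sides must know the type. */
    static boolean isNullTokenByType(IOException e) {
        return e instanceof NullTokenException;
    }

    public static void main(String[] args) {
        IOException fromLog =
            new IOException("Failed to get the token for hadoopuser, user=hadoopuser");
        System.out.println("by message: " + isNullTokenByMessage(fromLog));
        System.out.println("by type:    " + isNullTokenByType(fromLog));
    }
}
```

The message check recognises the plain IOException seen in the log, which a type check cannot do unless the sender is also changed to throw the new type.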
[jira] [Assigned] (HDFS-6855) Add a different end-to-end non-manual NFS test to replace TestOutOfOrderWrite
[ https://issues.apache.org/jira/browse/HDFS-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reassigned HDFS-6855: --- Assignee: Zhe Zhang Add a different end-to-end non-manual NFS test to replace TestOutOfOrderWrite - Key: HDFS-6855 URL: https://issues.apache.org/jira/browse/HDFS-6855 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brandon Li Assignee: Zhe Zhang TestOutOfOrderWrite is an end-to-end test with a TCP client. However, it is a manual test, and out-of-order writes are already covered by the test newly added in HDFS-6850. This JIRA tracks the effort of adding a new end-to-end test with more test cases to replace TestOutOfOrderWrite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098722#comment-14098722 ] Yongjun Zhang commented on HDFS-6776: - Thanks a lot [~tucu00]! I will address your comments in the next rev. Because the server does not throw a suitable exception, I'm currently parsing the error message to detect a null token returned from the server. This fix has one advantage: only the secure cluster side needs to be patched for it to work. Introducing a new exception would raise a compatibility issue; if we decide to do so, we can file a follow-up jira for release 3.0? Thanks. distcp from insecure cluster (source) to secure cluster (destination) doesn't work -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch Issuing the distcp command on the secure cluster side to copy data from the insecure cluster to the secure cluster produces the following problem: {code} [hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insecure-cluster:port/tmp hdfs://secure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-cluster:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132) at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at
[jira] [Updated] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HDFS-6826: - Attachment: HDFS-6826v3.patch Jitendra, attaching the v3 patch. In this patch I've moved the checkPermission logic to the plugin, and the FSPermissionChecker delegates to the plugin. The FSDirectory is still exposed in the API. Between v2 and v3, I prefer v2. I would still argue that we shouldn't allow replacing the permission checker logic, to ensure consistent check behavior. I don't have a use case for a different permission check logic, do you? If there is none in sight at the moment, we can table that until it is needed. Thoughts? Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, HDFS-6826v3.patch, HDFSPluggableAuthorizationProposal-v2.pdf, HDFSPluggableAuthorizationProposal.pdf When HBase data, HiveMetaStore data or Search data is accessed via services (HBase region servers, HiveServer2, Impala, Solr), the services can enforce permissions on the corresponding entities (databases, tables, views, columns, search collections, documents). When users access the underlying data files directly (e.g. from a MapReduce job), it is desirable that the permissions of the data files map to the permissions of the corresponding data entity (e.g. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities' permissions. I’ll be posting a design proposal in the next few days. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option
[ https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098779#comment-14098779 ] Yongjun Zhang commented on HDFS-4257: - Hi [~szetszwo], this issue is heating up now. I wonder if you will have time to work on it soon? If not, may I pick it up from where you left off? Thanks a lot. The ReplaceDatanodeOnFailure policies could have a forgiving option --- Key: HDFS-4257 URL: https://issues.apache.org/jira/browse/HDFS-4257 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client Affects Versions: 2.0.2-alpha Reporter: Harsh J Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h4257_20140325.patch, h4257_20140325b.patch, h4257_20140326.patch A similar question has previously come up in HDFS-3091 and friends, but the essential problem is: why can't I write to my cluster of 3 nodes when I have just 1 node available at a point in time? The policies cover 4 options, with {{Default}} being the default: {{Disable}} - Disables the whole replacement concept by throwing an error (at the server) or acts as {{Never}} at the client. {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in many cases). {{Default}} - Replaces based on a few conditions, but its minimum never touches 1. We always fail if only one DN remains and no others can be added. {{Always}} - Replaces no matter what. Fails if it can't replace. Would it not make sense to have an option similar to Always/Default where, despite _trying_, if it isn't possible to have more than 1 DN in the pipeline, we do not fail? I think that is what the former write behavior was, and what fit with the minimum allowed value of the replication factor. Why is it grossly wrong to accept a write from a client for a block with just 1 remaining replica in the pipeline (the minimum of 1 grows with the replication factor demanded from the write), when replication is taken care of immediately afterwards? How often have we seen missing blocks arise out of allowing this + facing a big rack(s) failure or so? -- This message was sent by Atlassian JIRA (v6.2#6252)
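To make the proposed "forgiving" behavior concrete, here is a minimal Java sketch of the decision taken after a pipeline failure. Everything in it is hypothetical: the class, the `forgiving` flag, and in particular the condition used for {{Default}} are simplified stand-ins chosen for illustration, not the actual HDFS client logic or the patch attached to this issue.

```java
// Sketch only: a simplified model of datanode replacement on pipeline failure.
// The "forgiving" flag and the DEFAULT condition are illustrative assumptions,
// not HDFS's actual ReplaceDatanodeOnFailure logic.
final class ForgivingReplacementSketch {
    enum Policy { NEVER, DEFAULT, ALWAYS }

    enum Outcome { CONTINUE, FAIL }

    // Does the policy ask for a replacement datanode at all?
    static boolean wantsReplacement(Policy policy, int replication, int remaining) {
        switch (policy) {
            case NEVER:
                return false;
            case ALWAYS:
                return true;
            default:
                // Placeholder for the "few conditions" of Default.
                return replication >= 3 && remaining < replication;
        }
    }

    static Outcome onPipelineFailure(Policy policy, boolean forgiving, int replication,
                                     int remaining, boolean replacementFound) {
        if (remaining < 1) {
            return Outcome.FAIL; // no datanode left to write to
        }
        if (!wantsReplacement(policy, replication, remaining)) {
            return Outcome.CONTINUE;
        }
        if (replacementFound) {
            return Outcome.CONTINUE;
        }
        // A replacement was wanted and tried, but none could be added.
        return forgiving ? Outcome.CONTINUE : Outcome.FAIL;
    }

    public static void main(String[] args) {
        // Replication 3, only 1 datanode left, no replacement available.
        System.out.println(onPipelineFailure(Policy.DEFAULT, false, 3, 1, false)); // FAIL
        System.out.println(onPipelineFailure(Policy.DEFAULT, true, 3, 1, false));  // CONTINUE
    }
}
```

The only difference between the strict and the forgiving variant is the last branch: the replacement is still attempted, but a failed attempt does not abort the write while at least one datanode remains.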
[jira] [Commented] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098781#comment-14098781 ] Tsz Wo Nicholas Sze commented on HDFS-6847: --- Thanks for clarifying it. Patch looks good. Some minor comments: - In FSDirectory.unprotectedSetStoragePolicy, it should throw an exception for non-file, non-directory inodes. - FSNamesystem.setStoragePolicy supports both files and directories, but the javadoc change seems to suggest that src must be a directory. Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch, HDFS-6847.001.patch This jira plans to add storage policy support on directories, i.e., users can set/get storage policies not only for files but also for directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy should then be its own storage policy, if one is specified, or otherwise the storage policy specified on its nearest ancestor directory. E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
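The nearest-ancestor rule in the description can be sketched in a few lines of Java. This is only an illustration over plain path strings: the class and method names are made up, and the real patch works on NameNode inodes rather than on a map of paths.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: resolves the effective storage policy of a path as its own
// policy if set, otherwise that of its nearest ancestor. All names here are
// hypothetical and paths are assumed to be absolute and normalized.
final class StoragePolicyResolutionSketch {
    private final Map<String, String> explicitPolicies = new HashMap<>();

    void setPolicy(String path, String policy) {
        explicitPolicies.put(path, policy);
    }

    // Returns null when neither the path nor any ancestor has a policy.
    String effectivePolicy(String path) {
        for (String current = path; current != null; current = parentOf(current)) {
            String policy = explicitPolicies.get(current);
            if (policy != null) {
                return policy;
            }
        }
        return null;
    }

    static String parentOf(String path) {
        if (path.equals("/")) {
            return null; // the root has no parent
        }
        int idx = path.lastIndexOf('/');
        if (idx < 0) {
            return null; // not an absolute path
        }
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    public static void main(String[] args) {
        StoragePolicyResolutionSketch fs = new StoragePolicyResolutionSketch();
        fs.setPolicy("/foo", "p1");
        fs.setPolicy("/foo/bar", "p2");
        System.out.println(fs.effectivePolicy("/foo/bar/baz")); // p2
        System.out.println(fs.effectivePolicy("/foo/bar"));     // p2
        System.out.println(fs.effectivePolicy("/foo"));         // p1
    }
}
```

The `main` method reproduces the /foo/bar/baz example from the description.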
[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on directories
[ https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6847: Attachment: HDFS-6847.002.patch Thanks for the review, Nicholas! Updated the patch to address your comments. Archival Storage: Support storage policy on directories --- Key: HDFS-6847 URL: https://issues.apache.org/jira/browse/HDFS-6847 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-6847.000.patch, HDFS-6847.001.patch, HDFS-6847.002.patch This jira plans to add storage policy support on directories, i.e., users can set/get storage policies not only for files but also for directories. We allow users to set storage policies for nested directories/files. For a specific file/directory, its storage policy should then be its own storage policy, if one is specified, or otherwise the storage policy specified on its nearest ancestor directory. E.g., for a path /foo/bar/baz, if two different policies are set on foo and bar (p1 for foo and p2 for bar), the storage policies for baz, bar, and foo should be p2, p2, and p1, respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098943#comment-14098943 ] Kihwal Lee commented on HDFS-6825: -- [~yzhangal] I've verified that you are correct. The file deletion check in snapshot case was suggested in HDFS-6527 and I thought that was good enough. Apparently not. If the full path is re-resolved after this, that can detect the deletion, but in {{commitBlockSynchronization()}}, that seems to happen too late. Also for all other uses of {{isFileClosed()}}, walking up the tree is the only sure way to tell whether the file is deleted. So your fix is correct. [~daryn] Watch out for this in your fine-grained directory locking. +1 for the patch. Good work! Edit log corruption due to delayed block removal Key: HDFS-6825 URL: https://issues.apache.org/jira/browse/HDFS-6825 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch Observed the following stack: {code} 2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false) 2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space. 
java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} Found this is what happened: - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz - client tried to append to this file, but the lease expired, so lease recovery was started and the append failed - the file got deleted; however, there were still pending blocks of this file that had not been deleted - then the commitBlockSynchronization() method was called (see stack above), and an INodeFile was created out of the pending block, unaware that the file had already been deleted - FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction and wrote CloseOp to the edit log -- This message was sent by Atlassian JIRA (v6.2#6252)
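The comment above notes that walking up the tree is the only sure way to tell whether a file has been deleted. A minimal Java sketch of that idea follows. The `Node` class and `isDeleted` method are hypothetical stand-ins for the NameNode's inode structures, and the sketch assumes that deleting a directory detaches only that directory from its parent, leaving the links inside the removed subtree intact.

```java
// Sketch only: detects deletion by walking parent links up to the root.
// Node and isDeleted are hypothetical; they are not the NameNode's real types.
final class DeletedFileCheckSketch {
    static final class Node {
        final String name;
        Node parent; // null for the root or for a detached subtree root

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // A node is live only if following parent links eventually reaches the root.
    static boolean isDeleted(Node file, Node root) {
        for (Node n = file; n != null; n = n.parent) {
            if (n == root) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node dir = new Node("tlog", root);
        Node file = new Node("tlog.xyz", dir);
        System.out.println(isDeleted(file, root)); // false

        // Deleting the directory detaches it from the root; the file's own
        // parent pointer is untouched, so a local check would miss this.
        dir.parent = null;
        System.out.println(file.parent != null);   // true
        System.out.println(isDeleted(file, root)); // true
    }
}
```

A check that only looks at the file's own parent pointer reports the file as present after its ancestor directory is removed, which is the kind of stale view that let the CloseOp reach the edit log.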
[jira] [Commented] (HDFS-4486) Add log category for long-running DFSClient notices
[ https://issues.apache.org/jira/browse/HDFS-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098950#comment-14098950 ] Zhe Zhang commented on HDFS-4486: - My current plan is to follow this page to generate an extended logger class: http://logging.apache.org/log4j/2.x/manual/customloglevels.html. I haven't seen this kind of extended logger class in the Hadoop project, though, so please let me know if you think there's a better approach. Add log category for long-running DFSClient notices --- Key: HDFS-4486 URL: https://issues.apache.org/jira/browse/HDFS-4486 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Zhe Zhang Priority: Minor There are a number of features in the DFS client which are transparent but can make a fairly big difference for performance -- two in particular are short circuit reads and native checksumming. Because we don't want log spew for clients like hadoop fs -cat, we currently log only at DEBUG level when these features are disabled. This makes it difficult to troubleshoot/verify for long-running perf-sensitive clients like HBase. One simple solution is to add a new log category - e.g. o.a.h.h.DFSClient.PerformanceAdvisory - which long-running clients could enable at DEBUG level without getting the full debug spew. -- This message was sent by Atlassian JIRA (v6.2#6252)
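The log category idea from the description can be illustrated with a separately named child logger whose level is raised independently of its parent. The sketch below uses java.util.logging only to stay self-contained; that is a substitution, since Hadoop logs through log4j, and FINE is used here as the rough counterpart of DEBUG. The category name is the expanded form of the one suggested in the description, and the class and method names are made up.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: a dedicated category that can be enabled at a finer level than
// the client logger it sits under. Uses java.util.logging as a stand-in for
// log4j; all class and method names are hypothetical.
final class PerformanceAdvisoryLogSketch {
    // Strong references keep the configured levels from being lost.
    static final Logger CLIENT =
        Logger.getLogger("org.apache.hadoop.hdfs.DFSClient");
    static final Logger ADVISORY =
        Logger.getLogger("org.apache.hadoop.hdfs.DFSClient.PerformanceAdvisory");

    static void configureForLongRunningClient() {
        CLIENT.setLevel(Level.INFO);   // no general debug spew
        ADVISORY.setLevel(Level.FINE); // but keep the performance advisories
    }

    public static void main(String[] args) {
        configureForLongRunningClient();
        System.out.println(CLIENT.isLoggable(Level.FINE));   // false
        System.out.println(ADVISORY.isLoggable(Level.FINE)); // true
        // Note: the default console handler only prints INFO and above, so a
        // handler would also need its level lowered for this line to show up.
        ADVISORY.fine("short circuit reads are disabled");
    }
}
```

A short-lived client such as hadoop fs -cat would simply leave both loggers at their defaults and see nothing extra.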
[jira] [Assigned] (HDFS-5135) Create a test framework to enable NFS end to end unit test
[ https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reassigned HDFS-5135: --- Assignee: Zhe Zhang Create a test framework to enable NFS end to end unit test -- Key: HDFS-5135 URL: https://issues.apache.org/jira/browse/HDFS-5135 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Zhe Zhang Currently, we have to manually start portmap and nfs3 processes to test patch and new functionalities. This JIRA is to track the effort to introduce a test framework to NFS unit test without starting standalone nfs3 processes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5135) Create a test framework to enable NFS end to end unit test
[ https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098960#comment-14098960 ] Zhe Zhang commented on HDFS-5135: - Hi Brandon, I'm a little confused by the description of this Jira. As I understand it, many test classes, including TestReaddir and TestWrites, are already unit tests not requiring manual startup of nfs3 and portmap. Is the purpose of this Jira to convert all remaining manual tests (such as TestUdpServer) to unit tests? Create a test framework to enable NFS end to end unit test -- Key: HDFS-5135 URL: https://issues.apache.org/jira/browse/HDFS-5135 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Zhe Zhang Currently, we have to manually start portmap and nfs3 processes to test patch and new functionalities. This JIRA is to track the effort to introduce a test framework to NFS unit test without starting standalone nfs3 processes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6663: -- Attachment: HDFS-6663.patch Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6663: -- Status: Patch Available (was: Open) Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099018#comment-14099018 ] Chen He commented on HDFS-6663: --- It can also show how many expected, live, corrupted, and decommission replicas there are for a given blockId. Also, if a replica is corrupted, it will show the datanode along with the corruption reason. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099052#comment-14099052 ] Hadoop QA commented on HDFS-6833: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662084/HDFS-6833-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBPOfferService org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings org.apache.hadoop.hdfs.TestHDFSServerPorts {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7646//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7646//console This message is automatically generated. DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HDFS-6833-6.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in DataNode, the following messages are usually output. {code} 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} However, in the current implementation, DirectoryScanner may run while the DataNode is deleting the block, and then the following messages are output. 
{code} 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099078#comment-14099078 ] Hadoop QA commented on HDFS-6776: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662086/HDFS-6776.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHDFSServerPorts org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7647//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7647//console This message is automatically generated. 
distcp from insecure cluster (source) to secure cluster (destination) doesn't work -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch Issuing the distcp command on the secure cluster side to copy data from the insecure cluster to the secure cluster produces the following problem: {code} [hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insecure-cluster:port/tmp hdfs://secure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-cluster:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099243#comment-14099243 ] Sanjay Radia commented on HDFS-6134: We have made very good progress over the last few days. Thanks for taking the time for the offline technical discussions. Below is a summary of the concerns I have raised previously in this Jira. # Fix distcp and cp to *automatically* deal with EZ using /r/r internally. Initially we need to support only row 1 and row 4 in the table I attached in HADOOP-10919. # Fix WebHDFS to use KMS delegation tokens so that WebHDFS can be used with transparent encryption without giving user hdfs KMS proxy permission (and, as a result, giving it to admins). REST is a key protocol for HDFS, and for many Hadoop use cases an admin should not have access to the keys of encrypted files. # Further work on specifying what HAR should do (I have listed some use cases and proposed solutions), followed by a fix to HAR. # Some work on understanding the availability and scalability of KMS for medium to large clusters. Perhaps we need to explore getting the keys ahead of time when a job is submitted. Let's complete items 1 and 2 promptly. Before we publish transparent encryption in a 2.x release for public consumption, let us at least complete item 1 (i.e. distcp and cp) and the flag to turn this feature on/off. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. 
For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5135) Create a test framework to enable NFS end to end unit test
[ https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099246#comment-14099246 ]
Brandon Li commented on HDFS-5135:
Most of the current NFS unit tests are not end-to-end tests; they directly invoke internal methods, as TestReaddir does. As a result, some functions and features can't be covered. For example, we can't validate the response format, which has to be validated by mounting the export and testing manually on Linux. We found quite a few response format problems in the past, in a painful way, because we don't have enough end-to-end tests. TestOutOfOrderWrite is an end-to-end test, but it doesn't provide a framework or class that other tests can use.
Specifically, we need a few things. We may want to create more JIRAs to split the work:
1. utilities to package every NFS/mountd request
2. utilities to parse every NFS/mountd response
3. a test UDP client and TCP client, which can deliver requests to NFS and get responses
Once we have these utilities, we can create tests easily. For example, one could easily write a test like:
1. send create request
2. assert response status is OK
3. send same create request
4. assert response status is not OK
...
Create a test framework to enable NFS end to end unit test -- Key: HDFS-5135 URL: https://issues.apache.org/jira/browse/HDFS-5135 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Zhe Zhang Currently, we have to manually start portmap and nfs3 processes to test patches and new functionality. This JIRA is to track the effort to introduce a test framework for NFS unit tests that does not require starting standalone nfs3 processes. -- This message was sent by Atlassian JIRA (v6.2#6252)
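For illustration, the four-step test listed in the HDFS-5135 comment above might look roughly like the sketch below. Everything in it is hypothetical: `FakeNfsServer`, its `Status` enum, and `create` are stand-ins invented here, not Hadoop NFS gateway classes. The fake rejects a repeated create, which is only one possible server behavior; a real framework would send encoded NFS/mountd requests through the proposed UDP/TCP test clients instead.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for an NFS endpoint; not a Hadoop class. */
class FakeNfsServer {
    enum Status { OK, EXIST }

    private final Set<String> files = new HashSet<>();

    /** Handles a create request: succeeds once, then reports that the file exists. */
    Status create(String path) {
        return files.add(path) ? Status.OK : Status.EXIST;
    }
}

class CreateTwiceExample {
    /** Runs the four steps from the comment and reports whether both assertions hold. */
    static boolean run() {
        FakeNfsServer server = new FakeNfsServer();
        // 1. send create request
        FakeNfsServer.Status first = server.create("/export/file1");
        // 2. assert response status is OK
        boolean firstOk = first == FakeNfsServer.Status.OK;
        // 3. send same create request
        FakeNfsServer.Status second = server.create("/export/file1");
        // 4. assert response status is not OK
        boolean secondRejected = second != FakeNfsServer.Status.OK;
        return firstOk && secondRejected;
    }
}
```

In a real test the two boolean checks would be JUnit assertions, and the response would be parsed by the response utilities the comment asks for.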
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099311#comment-14099311 ]
Sanjay Radia commented on HDFS-6134:
Alejandro, with respect to the subtle difference between webhdfs and httpfs: can an admin grab the EDEKs and raw files, log into the httpfs machine, become user httpfs, and then trick the KMS into decrypting the keys because httpfs has a proxy setting?
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099318#comment-14099318 ]
Yongjun Zhang commented on HDFS-6825:
Hi [~kihwal], thanks a lot for verifying and confirming, really appreciate it! Thanks again to [~andrew.wang] for the comment about checking recursive deletion; addressing that comment led to this solution, which is more complete than the previous revisions.
Edit log corruption due to delayed block removal Key: HDFS-6825 URL: https://issues.apache.org/jira/browse/HDFS-6825 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch
Observed the following stack:
{code}
2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space.
java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
Found this is what happened:
- the client created the file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
- the client tried to append to this file, but the lease had expired, so lease recovery was started and the append failed
- the file was deleted; however, pending blocks of this file had not yet been deleted
- the commitBlockSynchronization() method was then called (see stack above), and an INodeFile was created out of the pending block, unaware that the file had already been deleted
- FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock
- closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction and wrote a CloseOp to the edit log
-- This message was sent by Atlassian JIRA (v6.2#6252)
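The sequence in the HDFS-6825 report above (an exception swallowed, then a close operation written for a file that no longer exists) can be reproduced in a toy model. The sketch below is not NameNode code and does not claim to reflect the attached patches: `ToyNamespace`, its methods, and the log strings are all invented for illustration. It contrasts a commit path that swallows the failure with one that checks for deletion first.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the reported sequence; all names here are hypothetical. */
class ToyNamespace {
    private final Set<String> liveFiles = new HashSet<>();
    final List<String> editLog = new ArrayList<>();

    void create(String path) { liveFiles.add(path); editLog.add("OP_ADD " + path); }

    void delete(String path) { liveFiles.remove(path); editLog.add("OP_DELETE " + path); }

    /** Buggy variant: the failure is swallowed and a close op is still logged. */
    void commitSwallowing(String path) {
        try {
            updateSpaceConsumed(path);
        } catch (FileNotFoundException e) {
            // swallowed, mirroring the behavior described in the report
        }
        editLog.add("OP_CLOSE " + path);
    }

    /** Guarded variant: refuse to log a close for a file that is already deleted. */
    void commitChecked(String path) throws FileNotFoundException {
        if (!liveFiles.contains(path)) {
            throw new FileNotFoundException("File already deleted: " + path);
        }
        updateSpaceConsumed(path);
        editLog.add("OP_CLOSE " + path);
    }

    private void updateSpaceConsumed(String path) throws FileNotFoundException {
        if (!liveFiles.contains(path)) {
            throw new FileNotFoundException("Path not found: " + path);
        }
    }
}
```

With the swallowing variant, the toy log ends with a close entry that follows the delete entry for the same path, which is the kind of inconsistent edit log the report describes. The guarded variant leaves the log untouched and surfaces the error to the caller.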
[jira] [Created] (HDFS-6856) Send an OOB ack asynchronously
Brandon Li created HDFS-6856:
Summary: Send an OOB ack asynchronously Key: HDFS-6856 URL: https://issues.apache.org/jira/browse/HDFS-6856 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Brandon Li Assignee: Brandon Li
As [~kihwal] pointed out in HDFS-6569:
One bad client may block this and prevent the message from being sent to the rest of good clients. Unless a new thread is created (during shutdown!) to send an OOB ack asynchronously, the blocking ack.readFields() call needs to be changed in order to delegate the message transmission to the responder thread.
This JIRA is to track the effort of sending OOB ack asynchronously.
-- This message was sent by Atlassian JIRA (v6.2#6252)
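The concern quoted in HDFS-6856 above, that one bad client can hold up the message for every other client, is a general fan-out problem. The sketch below shows one generic way to keep a slow receiver from blocking the rest, using a thread pool with a deadline. It is an assumption-laden illustration: `AsyncOobSender` and `Client` are invented names, and the approach is neither the DataNode's code nor the responder-thread delegation the quote suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical fan-out helper; not part of Hadoop. */
class AsyncOobSender {
    /** Hypothetical client handle whose send may block indefinitely. */
    interface Client {
        void sendOob() throws InterruptedException;
    }

    /** Sends to all clients in parallel and returns how many sends finished before the deadline. */
    static int sendToAll(List<Client> clients, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (Client c : clients) {
            tasks.add(() -> { c.sendOob(); return true; });
        }
        int delivered = 0;
        try {
            // invokeAll cancels any task that has not completed when the timeout expires.
            for (Future<Boolean> f : pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS)) {
                if (f.isCancelled()) {
                    continue; // this client was too slow; the others are unaffected
                }
                try {
                    if (f.get()) {
                        delivered++;
                    }
                } catch (ExecutionException e) {
                    // the send itself failed; count it as not delivered
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return delivered;
    }
}
```

Creating threads during shutdown is exactly what the quoted comment is wary of, so this only illustrates the blocking problem and one way around it; handing the send to an already-running responder thread avoids the extra threads.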
[jira] [Commented] (HDFS-6569) OOB message can't be sent to the client when DataNode shuts down for upgrade
[ https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099355#comment-14099355 ] Brandon Li commented on HDFS-6569: -- Thank you, Kihwal, for the review. I've created HDFS-6856 to track the effort of sending OOB ack asynchronously. Uploaded a new patch to not send OOB twice. OOB message can't be sent to the client when DataNode shuts down for upgrade Key: HDFS-6569 URL: https://issues.apache.org/jira/browse/HDFS-6569 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-6569.001.patch, HDFS-6569.002.patch, HDFS-6569.003.patch, test-hdfs-6569.patch The socket is closed too early before the OOB message can be sent to client, which causes the write pipeline failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6569) OOB message can't be sent to the client when DataNode shuts down for upgrade
[ https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6569: - Attachment: HDFS-6569.003.patch OOB message can't be sent to the client when DataNode shuts down for upgrade Key: HDFS-6569 URL: https://issues.apache.org/jira/browse/HDFS-6569 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.4.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-6569.001.patch, HDFS-6569.002.patch, HDFS-6569.003.patch, test-hdfs-6569.patch The socket is closed too early before the OOB message can be sent to client, which causes the write pipeline failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099357#comment-14099357 ]
Selvamohan Neethiraj commented on HDFS-6826:
Alejandro, the use case for externalizing the authorization: if an enterprise keeps its metadata details, such as what is confidential, in a separate system and provides access control based on that metadata, it is important to have a pluggable authorization module that can use the metadata from the external system and grant authorization to users/groups based on its own logic. I do not expect every organization to have custom/pluggable authorization. But this would allow security vendors and system integrators to expand the security scope for HDFS.
Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, HDFS-6826v3.patch, HDFSPluggableAuthorizationProposal-v2.pdf, HDFSPluggableAuthorizationProposal.pdf When Hbase data, HiveMetaStore data or Search data is accessed via services (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce permissions on corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (i.e. from a MapReduce job), that the permission of the data files map to the permissions of the corresponding data entity (i.e. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities permissions.
I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099415#comment-14099415 ]
Hadoop QA commented on HDFS-6663:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662134/HDFS-6663.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
  org.apache.hadoop.hdfs.TestHDFSServerPorts
  org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
  org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
  org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
  org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
  org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7648//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7648//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7648//console This message is automatically generated. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099441#comment-14099441 ]
Shinichi Yamashita commented on HDFS-6833:
The following tests succeeded in my environment.
{quote}
org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer
{quote}
DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HDFS-6833-6.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch
When a block is deleted in DataNode, the following messages are usually output.
{code}
2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
However, in the current implementation, DirectoryScanner may run while the DataNode is deleting the block, in which case the following messages are output.
{code}
2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
  getNumBytes() = 21230663
  getBytesOnDisk() = 21230663
  getVisibleLength()= 21230663
  getVolume() = /hadoop/data1/dfs/data/current
  getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  unlinked =false
2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
The information for the block being deleted is registered in the DataNode's memory, so when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we recommission a node or change the replication factor, the NameNode may delete the correct replica as an excess replica because of this problem, and Under-Replicated Blocks and Missing Blocks occur. When the DataNode runs DirectoryScanner, it should not register a block that is being deleted.
-- This message was sent by Atlassian JIRA (v6.2#6252)
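The race described in HDFS-6833 above (the scanner re-registers a block whose file is scheduled for deletion but still on disk) can be modeled with a small in-memory structure. The sketch below assumes one possible remedy, tracking the IDs of blocks whose deletion is pending so the scanner can skip them. `ToyDataset` and all of its methods are hypothetical names for illustration, not FsDatasetImpl or DirectoryScanner code, and the sketch makes no claim about what the attached patches do.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy replica map; all names are hypothetical. */
class ToyDataset {
    private final Map<Long, String> replicasInMemory = new ConcurrentHashMap<>();
    private final Set<Long> deleting = ConcurrentHashMap.newKeySet();

    void addReplica(long blockId, String file) {
        replicasInMemory.put(blockId, file);
    }

    /** Marks the block as pending asynchronous deletion, then drops it from memory. */
    void scheduleDeletion(long blockId) {
        deleting.add(blockId);
        replicasInMemory.remove(blockId);
    }

    /** Called once the asynchronous disk deletion has completed. */
    void deletionFinished(long blockId) {
        deleting.remove(blockId);
    }

    /** Scanner callback for a block file found on disk but missing from memory. */
    boolean reconcileMissingInMemory(long blockId, String file) {
        if (deleting.contains(blockId)) {
            return false; // the file is about to disappear; do not re-register it
        }
        replicasInMemory.put(blockId, file);
        return true;
    }

    boolean hasReplica(long blockId) {
        return replicasInMemory.containsKey(blockId);
    }
}
```

In this model, a scan that runs between `scheduleDeletion` and `deletionFinished` leaves the replica map unchanged, so a later block report would not include the block that is being deleted.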
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6663: -- Attachment: HDFS-6663-2.patch Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099453#comment-14099453 ]
Chen He commented on HDFS-6663:
Uploaded a new patch that resolves the Findbugs problem. The core test failure is caused by HDFS-6694.
Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6663: -- Affects Version/s: 2.5.0 Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099469#comment-14099469 ]
Alejandro Abdelnur commented on HDFS-6826:
[~sneethiraj], got it, that makes sense. Next week I will start working on the v2 patch to get it into proper shape. As a first cut, I would prefer to make things pluggable without altering APIs. We can work on refining the APIs once we have the desired functionality. Also, something to keep in mind: this plugin API is meant to be used by somebody with a very good understanding of the NameNode guts and expected behavior.
Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, HDFS-6826v3.patch, HDFSPluggableAuthorizationProposal-v2.pdf, HDFSPluggableAuthorizationProposal.pdf When Hbase data, HiveMetaStore data or Search data is accessed via services (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce permissions on corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (i.e. from a MapReduce job), that the permission of the data files map to the permissions of the corresponding data entity (i.e. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities permissions. I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6774) Make FsDataset and DataStore support removing volumes.
[ https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099483#comment-14099483 ]
Aaron T. Myers commented on HDFS-6774:
Hey Eddy, patch looks pretty good to me. A few questions:
# The change in {{BlockPoolSlice}}: was that just a separate bug? If not, why was it necessary?
# I see the code where we remove the replica info from the replica map, but do we not also need to do something similar in the event that the replica is currently referenced in the BlockScanner or DirectoryScanner data structures? It could be that we don't, but I wanted to check with you to see if you've considered this case.
Make FsDataset and DataStore support removing volumes. -- Key: HDFS-6774 URL: https://issues.apache.org/jira/browse/HDFS-6774 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6774.000.patch, HDFS-6774.001.patch Managing volumes on DataNode includes decommissioning an active volume without restarting DataNode. This task adds support to remove volumes from {{DataStorage}} and {{BlockPoolSliceStorage}} dynamically. -- This message was sent by Atlassian JIRA (v6.2#6252)