[jira] [Updated] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-15 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-6826:
-

Attachment: HDFSPluggableAuthorizationProposal-v2.pdf

Updated proposal, removing the refresh() method and adding the 
createPermissionChecker() method to the plugin interface.
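
For illustration, a minimal sketch of the shape such a plugin interface could take; apart from createPermissionChecker(), every name here is hypothetical and the actual interface is defined in the attached proposal:

{code}
// Hypothetical sketch only -- see HDFSPluggableAuthorizationProposal-v2.pdf
// for the proposed interface.
public interface AuthorizationPlugin {

  /** Called once when the NameNode loads the plugin. */
  void initialize(Configuration conf);

  /**
   * Creates the permission checker the NameNode uses to resolve
   * authorization for HDFS paths, allowing the plugin to map
   * files/directories to external entities (tables, collections, ...)
   * and resolve permissions from those entities.
   */
  FsPermissionChecker createPermissionChecker(String user, Set<String> groups);

  /** Called when the NameNode unloads the plugin. */
  void close();
}
{code}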

 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, 
 HDFSPluggableAuthorizationProposal-v2.pdf, 
 HDFSPluggableAuthorizationProposal.pdf


 When HBase data, HiveMetaStore data or Search data is accessed via services 
 (HBase region servers, HiveServer2, Impala, Solr), the services can enforce 
 permissions on the corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users through the underlying data files (i.e. from a MapReduce 
 job), that the permissions of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities' permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration

2014-08-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6376:


Assignee: Dave Marion

 Distcp data between two HA clusters requires another configuration
 --

 Key: HDFS-6376
 URL: https://issues.apache.org/jira/browse/HDFS-6376
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, federation, hdfs-client
Affects Versions: 2.3.0, 2.4.0
 Environment: Hadoop 2.3.0
Reporter: Dave Marion
Assignee: Dave Marion
 Fix For: 3.0.0

 Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, 
 HDFS-6376-4-branch-2.4.patch, HDFS-6376-5-trunk.patch, 
 HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch, 
 HDFS-6376-patch-1.patch


 User has to create a third set of configuration files for distcp when 
 transferring data between two HA clusters.
 Consider the scenario in [1]. You cannot put all of the required properties 
 in core-site.xml and hdfs-site.xml for the client to resolve the location of 
 both active namenodes. If you do, then the datanodes from cluster A may join 
 cluster B. I cannot find a configuration option that tells the datanodes to 
 federate blocks for only one of the clusters in the configuration.
 [1] 
 http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade

2014-08-15 Thread James Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Thomas updated HDFS-6800:
---

Attachment: HDFS-6800.2.patch

Updated the patch to delete the trash directory if the previous directory 
exists.

 Determine how Datanode layout changes should interact with rolling upgrade
 --

 Key: HDFS-6800
 URL: https://issues.apache.org/jira/browse/HDFS-6800
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: James Thomas
 Attachments: HDFS-6800.2.patch, HDFS-6800.patch


 We need to handle attempts to rolling-upgrade the DataNode to a new storage 
 directory layout.
 One approach is to disallow such upgrades.  If we choose this approach, we 
 should make sure that the system administrator gets a helpful error message 
 and a clean failure when trying to use rolling upgrade to a version that 
 doesn't support it.  Based on the compatibility guarantees described in 
 HDFS-5535, this would mean that *any* future DataNode layout changes would 
 require a major version upgrade.
 Another approach would be to support rolling upgrade from an old DN storage 
 layout to a new layout.  This approach requires us to change our 
 documentation to explain to users that they should supply the {{\-rollback}} 
 command on the command-line when re-starting the DataNodes during rolling 
 rollback.  Currently the documentation just says to restart the DataNode 
 normally.
 Another issue here is that the DataNode's usage message describes rollback 
 options that no longer exist.  The help text says that the DN supports 
 {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade

2014-08-15 Thread James Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Thomas updated HDFS-6800:
---

Attachment: HDFS-6800.3.patch

 Determine how Datanode layout changes should interact with rolling upgrade
 --

 Key: HDFS-6800
 URL: https://issues.apache.org/jira/browse/HDFS-6800
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: James Thomas
 Attachments: HDFS-6800.2.patch, HDFS-6800.3.patch, HDFS-6800.patch


 We need to handle attempts to rolling-upgrade the DataNode to a new storage 
 directory layout.
 One approach is to disallow such upgrades.  If we choose this approach, we 
 should make sure that the system administrator gets a helpful error message 
 and a clean failure when trying to use rolling upgrade to a version that 
 doesn't support it.  Based on the compatibility guarantees described in 
 HDFS-5535, this would mean that *any* future DataNode layout changes would 
 require a major version upgrade.
 Another approach would be to support rolling upgrade from an old DN storage 
 layout to a new layout.  This approach requires us to change our 
 documentation to explain to users that they should supply the {{\-rollback}} 
 command on the command-line when re-starting the DataNodes during rolling 
 rollback.  Currently the documentation just says to restart the DataNode 
 normally.
 Another issue here is that the DataNode's usage message describes rollback 
 options that no longer exist.  The help text says that the DN supports 
 {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098270#comment-14098270
 ] 

Yongjun Zhang commented on HDFS-6833:
-

Hi Shinichi,

Thanks for finding this issue and the patch work.

The reason blocks are missing in memory is that the block is already 
removed from the memory map, while deletion of the physical block is done 
asynchronously by FsDatasetAsyncDiskService. Though the 
DirectoryScanner is only scheduled to run every 6 hours (by default), the 
FsDatasetAsyncDiskService block deletion can be delayed enough that the 
DirectoryScanner sees blocks already removed from memory but still 
present on disk. I worked out a patch that may help with the slowness of the 
disk removal, see HDFS-6788. However, I think this jira helps from a 
different perspective.

Overall the latest patch looks good to me. I have some comments here:

0. I suggest adding a version number when you upload a new patch.

1. I suggest renaming {{isDirectoryScanner()}} to {{isDirectoryScannerInited()}}.

2. Using {{List<Long>}} for deletingBlocks may not be efficient, since the lookup in 
{{public void removeDeletedBlocks(String bpid, List<Long> blockIds)}} 
means a sequential search. You might consider using a {{HashSet}} instead.
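
For example, a set-based variant might look roughly like this (a sketch; names and structure are illustrative, not the patch itself):

{code}
// Sketch: a HashSet gives O(1) membership tests and removals instead of
// a sequential scan through a List<Long>.
private final Map<String, Set<Long>> deletingBlocks = new HashMap<>();

public void removeDeletedBlocks(String bpid, Set<Long> blockIds) {
  synchronized (deletingBlocks) {
    Set<Long> set = deletingBlocks.get(bpid);
    if (set != null) {
      set.removeAll(blockIds);  // each removal is O(1) with a HashSet
    }
  }
}
{code}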

3. DirectoryScanner.java. Not related to your change, but I saw it when looking 
at your change:
{code}
while (m < memReport.length && d < blockpoolReport.length) {
  Block memBlock = memReport[Math.min(m, memReport.length - 1)];
  ScanInfo info = blockpoolReport[Math.min(
      d, blockpoolReport.length - 1)];
{code}

Inside the loop, {{Math.min(m, memReport.length - 1)}} is guaranteed to be {{m}}, and 
{{Math.min(d, blockpoolReport.length - 1)}} is guaranteed to be {{d}}, so 
the code can be simplified to not call {{Math.min}}.
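
A sketch of the simplified form:

{code}
while (m < memReport.length && d < blockpoolReport.length) {
  // m and d are already in range here, so Math.min() is redundant:
  Block memBlock = memReport[m];
  ScanInfo info = blockpoolReport[d];
  // ... rest of the loop body unchanged
{code}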

4. DirectoryScanner.java
{code}
while (d < blockpoolReport.length) {
  if (!dataset.isDeletingBlock(bpid, blockpoolReport[d].getBlockId())) {
statsRecord.missingMemoryBlocks++;
addDifference(diffRecord, statsRecord, blockpoolReport[d++]);
  } else {
deletingBlockIds.add(blockpoolReport[d].getBlockId());
d++;
  }
}
{code}
the {{d++}} logic can be extracted out and shared by both branches.
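
For instance, a sketch of the refactor:

{code}
while (d < blockpoolReport.length) {
  long blockId = blockpoolReport[d].getBlockId();
  if (!dataset.isDeletingBlock(bpid, blockId)) {
    statsRecord.missingMemoryBlocks++;
    addDifference(diffRecord, statsRecord, blockpoolReport[d]);
  } else {
    deletingBlockIds.add(blockId);
  }
  d++;  // shared by both branches
}
{code}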

Thanks.


 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, 
 HDFS-6833.patch, HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation, DirectoryScanner may run while the 
 DataNode is deleting the block. And the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 

[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-15 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098295#comment-14098295
 ] 

Jitendra Nath Pandey commented on HDFS-6826:


In the FsPermissionChecker interface, I don't think we should expose FSDirectory. 
We can attempt to either remove FSDirectory from FsPermissionChecker, or, as 
another choice, keep the FsPermissionChecker class and let it internally use the 
plugin implementation for permission checks.
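
For example, the second option could look roughly like this (a sketch; the plugin type and the internal method names are hypothetical):

{code}
// Sketch: FsPermissionChecker keeps its current surface and delegates
// internally, so FSDirectory is never exposed through the plugin API.
class FsPermissionChecker {
  private final AuthorizationPlugin plugin;  // hypothetical plugin type

  FsPermissionChecker(AuthorizationPlugin plugin) {
    this.plugin = plugin;  // null when no plugin is configured
  }

  void checkPermission(INode inode, FsAction access)
      throws AccessControlException {
    if (plugin != null) {
      plugin.checkPermission(inode, access);   // delegate to the plugin
    } else {
      checkPermissionDefault(inode, access);   // existing built-in checks
    }
  }
}
{code}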


 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, 
 HDFSPluggableAuthorizationProposal-v2.pdf, 
 HDFSPluggableAuthorizationProposal.pdf


 When HBase data, HiveMetaStore data or Search data is accessed via services 
 (HBase region servers, HiveServer2, Impala, Solr), the services can enforce 
 permissions on the corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users through the underlying data files (i.e. from a MapReduce 
 job), that the permissions of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities' permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5687) Problem in accessing NN JSP page

2014-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098319#comment-14098319
 ] 

Hadoop QA commented on HDFS-5687:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12620319/HDFS-5687-0001.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7642//console

This message is automatically generated.

 Problem in accessing NN JSP page
 

 Key: HDFS-5687
 URL: https://issues.apache.org/jira/browse/HDFS-5687
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.3.0
Reporter: sathish
Assignee: sathish
Priority: Minor
 Fix For: 2.6.0

 Attachments: HDFS-5687-0001.patch


 In the NN UI, after clicking through to the browse File System page, clicking 
 the GO BACK TO DFS HOME icon from that page does not reach the dfshealth.jsp page.
 The NN http URL comes out as http://nnaddr///nninfoaddr/dfshealth.jsp, and I 
 think that is why the page is not browsed.
 It should be http://nninfoaddr/dfshealth.jsp/ instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6300) Allows to run multiple balancer simultaneously

2014-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098321#comment-14098321
 ] 

Hadoop QA commented on HDFS-6300:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12642451/HDFS-6300.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7643//console

This message is automatically generated.

 Allows to run multiple balancer simultaneously
 --

 Key: HDFS-6300
 URL: https://issues.apache.org/jira/browse/HDFS-6300
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Rakesh R
Assignee: Rakesh R
 Fix For: 2.6.0

 Attachments: HDFS-6300.patch


 The Javadoc of Balancer.java says it will not allow a second balancer to run if 
 the first one is in progress. But I've noticed that multiple balancers can run 
 together, and the balancer.id implementation is not safeguarding against this.
 {code}
 * <li>Another balancer is running. Exiting...
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3907) Allow multiple users for local block readers

2014-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098375#comment-14098375
 ] 

Hadoop QA commented on HDFS-3907:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12544410/hdfs-3907.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7645//console

This message is automatically generated.

 Allow multiple users for local block readers
 

 Key: HDFS-3907
 URL: https://issues.apache.org/jira/browse/HDFS-3907
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.6.0

 Attachments: hdfs-3907.txt


 The {{dfs.block.local-path-access.user}} config added in HDFS-2246 only 
 supports a single user; however, as long as blocks are group-readable by more 
 than one user, the feature could be used by multiple users. To support this we 
 just need to allow multiple users to be configured. In practice this also 
 allows us to support HBase, where the client (RS) runs as the hbase system user 
 and the DN runs as the hdfs system user. I think this should work in secure 
 mode as well, since we're not using impersonation in the HBase case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade

2014-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098430#comment-14098430
 ] 

Hadoop QA commented on HDFS-6800:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662015/HDFS-6800.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
org.apache.hadoop.hdfs.TestHDFSServerPorts

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7641//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7641//console

This message is automatically generated.

 Determine how Datanode layout changes should interact with rolling upgrade
 --

 Key: HDFS-6800
 URL: https://issues.apache.org/jira/browse/HDFS-6800
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: James Thomas
 Attachments: HDFS-6800.2.patch, HDFS-6800.3.patch, HDFS-6800.patch


 We need to handle attempts to rolling-upgrade the DataNode to a new storage 
 directory layout.
 One approach is to disallow such upgrades.  If we choose this approach, we 
 should make sure that the system administrator gets a helpful error message 
 and a clean failure when trying to use rolling upgrade to a version that 
 doesn't support it.  Based on the compatibility guarantees described in 
 HDFS-5535, this would mean that *any* future DataNode layout changes would 
 require a major version upgrade.
 Another approach would be to support rolling upgrade from an old DN storage 
 layout to a new layout.  This approach requires us to change our 
 documentation to explain to users that they should supply the {{\-rollback}} 
 command on the command-line when re-starting the DataNodes during rolling 
 rollback.  Currently the documentation just says to restart the DataNode 
 normally.
 Another issue here is that the DataNode's usage message describes rollback 
 options that no longer exist.  The help text says that the DN supports 
 {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098443#comment-14098443
 ] 

Hudson commented on HDFS-6850:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #647 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/647/])
HDFS-6850. Move NFS out of order write unit tests into TestWrites class. 
Contributed by Zhe Zhang. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618091)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Fix For: 2.6.0

 Attachments: HDFS-6850.patch


 Expanding the TestWrites class to include the out-of-order writing scenario. I 
 think it is logical to merge the OOO scenario into the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade

2014-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098520#comment-14098520
 ] 

Hadoop QA commented on HDFS-6800:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662015/HDFS-6800.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController
  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
org.apache.hadoop.hdfs.TestHDFSServerPorts

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7644//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7644//console

This message is automatically generated.

 Determine how Datanode layout changes should interact with rolling upgrade
 --

 Key: HDFS-6800
 URL: https://issues.apache.org/jira/browse/HDFS-6800
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: James Thomas
 Attachments: HDFS-6800.2.patch, HDFS-6800.3.patch, HDFS-6800.patch


 We need to handle attempts to rolling-upgrade the DataNode to a new storage 
 directory layout.
 One approach is to disallow such upgrades.  If we choose this approach, we 
 should make sure that the system administrator gets a helpful error message 
 and a clean failure when trying to use rolling upgrade to a version that 
 doesn't support it.  Based on the compatibility guarantees described in 
 HDFS-5535, this would mean that *any* future DataNode layout changes would 
 require a major version upgrade.
 Another approach would be to support rolling upgrade from an old DN storage 
 layout to a new layout.  This approach requires us to change our 
 documentation to explain to users that they should supply the {{\-rollback}} 
 command on the command-line when re-starting the DataNodes during rolling 
 rollback.  Currently the documentation just says to restart the DataNode 
 normally.
 Another issue here is that the DataNode's usage message describes rollback 
 options that no longer exist.  The help text says that the DN supports 
 {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6569) OOB message can't be sent to the client when DataNode shuts down for upgrade

2014-08-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6569:
-

Assignee: Brandon Li  (was: Kihwal Lee)

 OOB message can't be sent to the client when DataNode shuts down for upgrade
 

 Key: HDFS-6569
 URL: https://issues.apache.org/jira/browse/HDFS-6569
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0, 2.4.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-6569.001.patch, HDFS-6569.002.patch, 
 test-hdfs-6569.patch


 The socket is closed too early before the OOB message can be sent to client, 
 which causes the write pipeline failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6569) OOB message can't be sent to the client when DataNode shuts down for upgrade

2014-08-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098582#comment-14098582
 ] 

Kihwal Lee commented on HDFS-6569:
--

The patch looks good in general. There are a couple of things that can be 
improved though.
- Serial transmission of OOB. One bad client may block this and prevent the 
message from being sent to the rest of the good clients. Unless a new thread is 
created (during shutdown!) to send an OOB ack asynchronously, the blocking 
ack.readFields() call needs to be changed in order to delegate the message 
transmission to the responder thread. I believe this is beyond the scope of 
this jira; I suggest filing a new jira for this improvement.
- Shutdown OOB can be sent twice. This does not affect correctness, but the DN 
log can become a bit messy. We can make it skip the OOB sending on interrupt 
if it was already sent (see the sketch below). If you want to address this in a 
separate jira, that is fine, since it is a minor issue.
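
A sketch of the guard I mean for the second point (illustrative only; the sender call and status are hypothetical placeholders for the DN's real pipeline-ack code):

{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: make the shutdown OOB ack fire at most once.
private final AtomicBoolean oobSent = new AtomicBoolean(false);

void sendShutdownOOB() throws IOException {
  if (!oobSent.compareAndSet(false, true)) {
    return;  // already sent before the interrupt; skip to keep the DN log clean
  }
  sendOOBResponse(restartOOBStatus);  // hypothetical sender and status
}
{code}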

 OOB message can't be sent to the client when DataNode shuts down for upgrade
 

 Key: HDFS-6569
 URL: https://issues.apache.org/jira/browse/HDFS-6569
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0, 2.4.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-6569.001.patch, HDFS-6569.002.patch, 
 test-hdfs-6569.patch


 The socket is closed too early before the OOB message can be sent to client, 
 which causes the write pipeline failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098583#comment-14098583
 ] 

Kihwal Lee commented on HDFS-6825:
--

I will take a look at it soon.

 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, 
 HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Found this is what happened:
 - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - client tried to append to this file, but the lease expired, so lease 
 recovery was started and the append failed
 - the file got deleted; however, there were still pending blocks of this file 
 not deleted
 - then the commitBlockSynchronization() method was called (see stack above); an 
 INodeFile was created out of the pending block, unaware that the file had 
 already been deleted
 - a FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction 
 and wrote a CloseOp to the edit log



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098606#comment-14098606
 ] 

Hudson commented on HDFS-6850:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1864 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1864/])
HDFS-6850. Move NFS out of order write unit tests into TestWrites class. 
Contributed by Zhe Zhang. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618091)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Fix For: 2.6.0

 Attachments: HDFS-6850.patch


 Expanding the TestWrites class to include the out-of-order writing scenario. I 
 think it is logical to merge the OOO scenario into the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098657#comment-14098657
 ] 

Kihwal Lee commented on HDFS-6825:
--

 Could we also check that this works with a recursive delete on the containing 
 folder of the open file?
I assume the change in {{isFileDeleted()}} is for this. I believe the recursive 
check is not necessary. When a tree is deleted, everything under it is 
recursively processed while holding the FSNamesystem and FSDirectory write 
locks. If a node does not belong to any snapshot, its parent and block fields 
will be cleared. If it is in a snapshot, it will be marked as deleted. The only 
thing that is not cleared while in the lock, and that causes this issue, is the 
block collection field of BlockInfo. So {{isFileDeleted()}} does not need to 
walk up the tree. The rest of the patch looks good.
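
In other words, ignoring the snapshot case, a direct check along these lines should suffice (a sketch, assuming the accessors shown):

{code}
// Sketch of a non-recursive check: a deleted file (not in any snapshot)
// has already had its parent and blocks cleared under the
// FSNamesystem/FSDirectory write locks.
private boolean isFileDeleted(INodeFile file) {
  return file.getParent() == null || file.getBlocks() == null;
}
{code}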

 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, 
 HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Found this is what happened:
 - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - client tried to append to this file, but the lease expired, so lease 
 recovery was started and the append failed
 - the file got deleted; however, there were still pending blocks of this file 
 not deleted
 - then the commitBlockSynchronization() method was called (see stack above); an 
 INodeFile was created out of the pending block, unaware that the file had 
 already been deleted
 - a FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction 
 and wrote a CloseOp to the edit log



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098640#comment-14098640
 ] 

Hudson commented on HDFS-6850:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1838 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1838/])
HDFS-6850. Move NFS out of order write unit tests into TestWrites class. 
Contributed by Zhe Zhang. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618091)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Fix For: 2.6.0

 Attachments: HDFS-6850.patch


 Expanding the TestWrites class to include the out-of-order writing scenario. I 
 think it is logical to merge the OOO scenario into the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098659#comment-14098659
 ] 

Yongjun Zhang commented on HDFS-6825:
-

Thanks [~kihwal]! and thanks [~andrew.wang] and [~atm] for the earlier review!
 

 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, 
 HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Found this is what happened:
 - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - client tried to append to this file, but the lease expired, so lease 
 recovery is started, thus the append failed
 - the file get deleted, however, there are still pending blocks of this file 
 not deleted
 - then commitBlockSynchronization() method is called (see stack above), an 
 InodeFile is created out of the pending block, not aware of that the file was 
 deleted already
 - FileNotExistException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continue to call finalizeINodeFileUnderConstruction 
 and wrote CloseOp to the edit log



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-15 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated HDFS-6833:
-

Attachment: HDFS-6833-6.patch

Hi [~yzhangal],

Thank you for your review and comments.
I attach a renewed patch that reflects your comments.

 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: HDFS-6833-6.patch, HDFS-6833.patch, HDFS-6833.patch, 
 HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation, DirectoryScanner may run while the 
 DataNode is deleting the block. And the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 Information about the block being deleted is thus registered in the DataNode's 
 memory. And when the DataNode sends a block report, the NameNode receives wrong 
 block information.
 For example, when we execute recommission or change the number of 
 replicas, the NameNode may delete the right block as an excess replica because 
 of this problem, and under-replicated blocks and missing blocks occur.
 When the DataNode runs DirectoryScanner, it should not register a block that is 
 being deleted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098682#comment-14098682
 ] 

Yongjun Zhang commented on HDFS-6825:
-

HI [~kihwal],

Thanks a lot for the review; we were doing the last update at the same time, so 
I only just saw your review comments.

The change in {{isFileDeleted}} is to handle recursive deletion. If we remove 
the change in this method, the test I added fails. Say, for a path 
/a/b/c/file, if we do {{fs.delete(/a/b, true)}}, what I observed is different 
from what you stated: only b is removed from a's children while holding the 
write lock (the other removals are delayed until later), thus 
{{isFileDeleted}} returned false on /a/b/c/file.
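
For reference, a sketch of the walk-up idea (illustrative only; the actual change is in the attached patch):

{code}
// Walk up from the file toward the root: if any node on the way has been
// detached from its parent (e.g. b removed from a's children), the file is
// effectively deleted even though its own fields look intact.
private boolean isFileDeleted(INodeFile file) {
  if (file.getParent() == null || file.getBlocks() == null) {
    return true;  // the file itself was already cleared
  }
  INode node = file;
  INodeDirectory parent = node.getParent();
  while (parent != null) {
    if (parent.getChild(node.getLocalNameBytes(),
        Snapshot.CURRENT_STATE_ID) != node) {
      return true;  // an ancestor link was severed by the delete
    }
    node = parent;
    parent = node.getParent();
  }
  return false;  // reached the root with every link intact
}
{code}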

I just reran the test to collect a log for your reference. This exception 
happens when the test restarts the NN to check whether the edit log is 
corrupted. The fix I introduced in {{isFileDeleted}} solves this problem:
{code}
Running org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.297 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
testDeleteAndCommitBlockSynchronizationRaceHasSnapshot(org.apache.hadoop.hdfs.server.namenode.TestDeleteRace)
  Time elapsed: 7.101 sec  <<< ERROR!
java.io.FileNotFoundException: File does not exist: /testdir/testdir1/test-file
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:412)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:227)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:136)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:820)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:678)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:972)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:715)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:533)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:589)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:756)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:740)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1425)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1696)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNodes(MiniDFSCluster.java:1651)
at 
org.apache.hadoop.hdfs.server.namenode.TestDeleteRace.testDeleteAndCommitBlockSynchronizationRace(TestDeleteRace.java:317)
at 
org.apache.hadoop.hdfs.server.namenode.TestDeleteRace.testDeleteAndCommitBlockSynchronizationRaceHasSnapshot(TestDeleteRace.java:338)
{code}

Thanks.


 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, 
 HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 

[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work

2014-08-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098685#comment-14098685
 ] 

Yongjun Zhang commented on HDFS-6776:
-

I ran the failed tests locally several times and don't see them fail. Uploading 
the same patch to try again.
Thanks.

  

 distcp from insecure cluster (source) to secure cluster (destination) doesn't 
 work
 --

 Key: HDFS-6776
 URL: https://issues.apache.org/jira/browse/HDFS-6776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0, 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
 HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch


 Issuing the distcp command on the secure cluster side, trying to copy stuff 
 from the insecure cluster to the secure cluster, we see the following problem:
 {code}
 hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp 
 hdfs://sure-cluster:8020/tmp/tmptgt
 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
 DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
 ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
 copyStrategy='uniformsize', sourceFileListing=null, 
 sourcePaths=[webhdfs://insecure-cluster:port/tmp], 
 targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true}
 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
 secure-clister:8032
 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
 'ssl.client.truststore.location' has not been set, no TrustStore will be 
 loaded
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)

[jira] [Updated] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work

2014-08-15 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6776:


Attachment: HDFS-6776.004.patch

 distcp from insecure cluster (source) to secure cluster (destination) doesn't 
 work
 --

 Key: HDFS-6776
 URL: https://issues.apache.org/jira/browse/HDFS-6776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0, 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
 HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch


 Issuing the distcp command at the secure cluster side, trying to copy stuff 
 from the insecure cluster to the secure cluster, I see the following problem:
 {code}
 hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insecure-cluster:port/tmp 
 hdfs://secure-cluster:8020/tmp/tmptgt
 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
 DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
 ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
 copyStrategy='uniformsize', sourceFileListing=null, 
 sourcePaths=[webhdfs://insecure-cluster:port/tmp], 
 targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true}
 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
 secure-cluster:8032
 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
 'ssl.client.truststore.location' has not been set, no TrustStore will be 
 loaded
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462)
   at 
 

[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098708#comment-14098708
 ] 

Zhe Zhang commented on HDFS-6850:
-

Thanks [~atm] and [~brandonli] for reviewing the patch!

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Fix For: 2.6.0

 Attachments: HDFS-6850.patch


 Expanding the TestWrites class to include the out-of-order writing scenario. 
 I think it is logical to merge the OOO scenario into the TestWrites class 
 instead of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work

2014-08-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098711#comment-14098711
 ] 

Alejandro Abdelnur commented on HDFS-6776:
--

{{NullTokenMsgHeader}} constant name should be all capitals.

I'm not sure I like looking for a string occurrence in the IOException message 
to detect the issue. I thought WebHdfs was recreating exceptions on the client 
side, but that doesn't seem to be the case for these DT calls.


 distcp from insecure cluster (source) to secure cluster (destination) doesn't 
 work
 --

 Key: HDFS-6776
 URL: https://issues.apache.org/jira/browse/HDFS-6776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0, 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
 HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch


 Issuing the distcp command at the secure cluster side, trying to copy stuff 
 from the insecure cluster to the secure cluster, I see the following problem:
 {code}
 hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insecure-cluster:port/tmp 
 hdfs://secure-cluster:8020/tmp/tmptgt
 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
 DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
 ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
 copyStrategy='uniformsize', sourceFileListing=null, 
 sourcePaths=[webhdfs://insecure-cluster:port/tmp], 
 targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true}
 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
 secure-cluster:8032
 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
 'ssl.client.truststore.location' has not been set, no TrustStore will be 
 loaded
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native 

[jira] [Assigned] (HDFS-6855) Add a different end-to-end non-manual NFS test to replace TestOutOfOrderWrite

2014-08-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reassigned HDFS-6855:
---

Assignee: Zhe Zhang

 Add a different end-to-end non-manual NFS test to replace TestOutOfOrderWrite
 -

 Key: HDFS-6855
 URL: https://issues.apache.org/jira/browse/HDFS-6855
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Reporter: Brandon Li
Assignee: Zhe Zhang

 TestOutOfOrderWrite is an end-to-end test with a TCP client. However, it's a 
 manual test, and out-of-order writes are covered by the newly added test in 
 HDFS-6850.
 This JIRA is to track the effort of adding a new end-to-end test with more 
 test cases to replace TestOutOfOrderWrite.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work

2014-08-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098722#comment-14098722
 ] 

Yongjun Zhang commented on HDFS-6776:
-

Thanks a lot [~tucu00]! I will address your comments in next rev.

I'm currently using message parsing to detect the null token returned from the 
server, due to the lack of a suitable exception type. This fix has one 
advantage: we only need to patch the secure cluster side, and it will work. 
Introducing a new exception would mean a compatibility issue; if we decide to 
do so, we can file a follow-up jira for release 3.0? Thanks.
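
As an illustration of the message-parsing approach under discussion, here is a 
minimal sketch. The detection class and method names are hypothetical; only 
the message text mirrors the server response seen in the log above, and even 
that should be treated as an assumption rather than a stable API.
{code}
import java.io.IOException;

public class NullTokenDetector {
  // Constant in all capitals, per the review comment above. The value
  // mirrors the server-side message seen in the log paste.
  private static final String NULL_TOKEN_MSG_HEADER =
      "Failed to get the token for";

  /**
   * Returns true if the remote IOException looks like the known
   * "null delegation token" response from an insecure cluster, in which
   * case the client can proceed without a delegation token.
   */
  public static boolean isNullTokenException(IOException ioe) {
    String msg = ioe.getMessage();
    return msg != null && msg.contains(NULL_TOKEN_MSG_HEADER);
  }
}
{code}
The fragility Alejandro points out is visible here: any change to the server's 
message text silently breaks the detection, which is why a dedicated exception 
type would be the cleaner long-term fix.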



 distcp from insecure cluster (source) to secure cluster (destination) doesn't 
 work
 --

 Key: HDFS-6776
 URL: https://issues.apache.org/jira/browse/HDFS-6776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0, 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
 HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch


 Issuing the distcp command at the secure cluster side, trying to copy stuff 
 from the insecure cluster to the secure cluster, I see the following problem:
 {code}
 hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insecure-cluster:port/tmp 
 hdfs://secure-cluster:8020/tmp/tmptgt
 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
 DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
 ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
 copyStrategy='uniformsize', sourceFileListing=null, 
 sourcePaths=[webhdfs://insecure-cluster:port/tmp], 
 targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true}
 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
 secure-cluster:8032
 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
 'ssl.client.truststore.location' has not been set, no TrustStore will be 
 loaded
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 

[jira] [Updated] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-15 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-6826:
-

Attachment: HDFS-6826v3.patch

Jitendra, attaching the v3 patch. In this patch I've moved the checkPermission 
logic to the plugin, and the FsPermissionChecker delegates to the plugin. The 
FSDirectory is still exposed in the API.

Between v2 and v3 I prefer v2.

Still, I would argue we shouldn't allow replacing the permission checker 
logic, to ensure consistent check behavior. I don't have a use case for having 
different permission check logic, do you? If nothing is in sight at the 
moment, we can table that until it is needed.

Thoughts?
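
For readers following along, here is a minimal sketch of the kind of 
delegation being discussed, assuming a simplified plugin interface; the names 
and signatures are illustrative and do not reflect the v2 or v3 patch API.
{code}
import java.io.IOException;

/** Hypothetical authorization plugin that permission checks delegate to. */
interface AuthorizationPlugin {
  /** Resolve and enforce permissions for a path; throw on denial. */
  void checkPermission(String user, String[] groups, String path,
      String access) throws IOException;
}

/** Default plugin: plain HDFS owner/group/mode semantics. */
class DefaultAuthorizationPlugin implements AuthorizationPlugin {
  @Override
  public void checkPermission(String user, String[] groups, String path,
      String access) throws IOException {
    // Standard HDFS permission evaluation would run here; an external
    // plugin would instead map the path to a data entity (table, column
    // family, collection) and resolve permissions from that entity.
  }
}
{code}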

 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, 
 HDFS-6826v3.patch, HDFSPluggableAuthorizationProposal-v2.pdf, 
 HDFSPluggableAuthorizationProposal.pdf


 When Hbase data, HiveMetaStore data or Search data is accessed via services 
 (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce 
 permissions on corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users accessing the underlying data files (i.e. from a MapReduce 
 job), that the permission of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2014-08-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098779#comment-14098779
 ] 

Yongjun Zhang commented on HDFS-4257:
-

Hi [~szetszwo], this issue is heating up now. I wonder if you will have time 
to work on it soon? If not, may I pick up from where you are? Thanks a lot.


 The ReplaceDatanodeOnFailure policies could have a forgiving option
 ---

 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
 h4257_20140326.patch


 A similar question has previously come up in HDFS-3091 and friends, but the 
 essential problem is: Why can't I write to my cluster of 3 nodes, when I 
 just have 1 node available at a point in time?
 The policies cover the 4 options, with {{Default}} being default:
 {{Disable}} - Disables the whole replacement concept by throwing out an 
 error (at the server) or acts as {{Never}} at the client.
 {{Never}} - Never replaces a DN upon pipeline failures (not too desirable in 
 many cases).
 {{Default}} - Replace based on a few conditions, but whose minimum never 
 touches 1. We always fail if only one DN remains and no others can be added.
 {{Always}} - Replace no matter what. Fail if can't replace.
 Would it not make sense to have an option similar to Always/Default, where 
 despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
 fail. I think that is what the former write behavior was, and what fit with 
 the minimum replication factor allowed value.
 Why is it grossly wrong to pass a write from a client for a block with just 1 
 remaining replica in the pipeline (the minimum of 1 grows with the 
 replication factor demanded from the write), when replication is taken care 
 of immediately afterwards? How often have we seen missing blocks arise out of 
 allowing this + facing a big rack(s) failure or so?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098781#comment-14098781
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6847:
---

Thanks for clarifying it.  Patch looks good.  Some minor comments:
- In FSDirectory.unprotectedSetStoragePolicy, it should throw an exception for 
non-file, non-directory inodes.
- FSNamesystem.setStoragePolicy supports both files and directories, but the 
javadoc change seems to suggest that src must be a directory.
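
A minimal sketch of both review points, with assumed names (this is not the 
actual patch code): setting a policy rejects inodes that are neither files nor 
directories, and the effective policy of a path resolves from the nearest 
ancestor that has one explicitly set.
{code}
class Inode {
  Inode parent;
  boolean isFile, isDirectory;
  Byte policyId;  // null means "unspecified"
}

class StoragePolicyResolver {
  static final byte DEFAULT_POLICY = 0;

  static void setStoragePolicy(Inode inode, byte policy) {
    if (!inode.isFile && !inode.isDirectory) {
      // The exception for non-file, non-directory inodes requested above.
      throw new IllegalArgumentException(
          "Storage policy can only be set on a file or directory");
    }
    inode.policyId = policy;
  }

  /** With p1 on foo and p2 on bar, /foo/bar/baz resolves to p2. */
  static byte getEffectivePolicy(Inode inode) {
    for (Inode cur = inode; cur != null; cur = cur.parent) {
      if (cur.policyId != null) {
        return cur.policyId;  // nearest explicit policy wins
      }
    }
    return DEFAULT_POLICY;
  }
}
{code}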

 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch, HDFS-6847.001.patch


 This jira plans to add storage policy support on directory, i.e., users can 
 set/get storage policy for not only files but also directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6847:


Attachment: HDFS-6847.002.patch

Thanks for the review, Nicholas! Updated the patch to address your comments.

 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch, HDFS-6847.001.patch, 
 HDFS-6847.002.patch


 This jira plans to add storage policy support on directory, i.e., users can 
 set/get storage policy for not only files but also directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098943#comment-14098943
 ] 

Kihwal Lee commented on HDFS-6825:
--

[~yzhangal] I've verified that you are correct.  The file deletion check in 
the snapshot case was suggested in HDFS-6527, and I thought that was good 
enough. Apparently not. If the full path is re-resolved after this, that can 
detect the deletion, but in {{commitBlockSynchronization()}}, that seems to 
happen too late.  Also, for all other uses of {{isFileClosed()}}, walking up 
the tree is the only sure way to tell whether the file is deleted. So your fix 
is correct.

[~daryn] Watch out for this in your fine-grained directory locking.

+1 for the patch. Good work!
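
For reference, a minimal sketch of the walk-up-the-tree check being endorsed 
here; the Inode shape and root sentinel are assumptions, not the actual patch.
{code}
class Inode {
  Inode parent;
  boolean isRoot;
}

class DeletionCheck {
  /**
   * A file is still attached to the namespace only if every ancestor link
   * is intact all the way up to the root. A broken (null) parent anywhere
   * on the way up means the file has been deleted, even if a stale inode
   * object is still reachable from a pending block.
   */
  static boolean isFileDeleted(Inode file) {
    for (Inode cur = file; cur != null; cur = cur.parent) {
      if (cur.isRoot) {
        return false;  // reached the root: still in the tree
      }
    }
    return true;       // walked off the tree: deleted
  }
}
{code}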

 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, 
 HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Found this is what happened:
 - the client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - the client tried to append to this file, but the lease had expired, so 
 lease recovery was started and the append failed
 - the file got deleted; however, there were still pending blocks of this 
 file not yet deleted
 - then the commitBlockSynchronization() method was called (see stack above), 
 and an InodeFile was created out of the pending block, unaware that the file 
 had already been deleted
 - FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction 
 and wrote a CloseOp to the edit log



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4486) Add log category for long-running DFSClient notices

2014-08-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098950#comment-14098950
 ] 

Zhe Zhang commented on HDFS-4486:
-

My current plan is to follow this page to generate an extended logger class: 
http://logging.apache.org/log4j/2.x/manual/customloglevels.html. I haven't seen 
this kind of extended logger class in the Hadoop project though, so please let 
me know if you think there's a better approach.
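
A minimal sketch of the simpler "separate log category" alternative, assuming 
plain commons-logging rather than a custom log4j level; the category name 
follows the suggestion in the description below, and the class name is 
hypothetical.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public final class PerformanceAdvisory {
  // A dedicated category that long-running clients can enable at DEBUG
  // in log4j.properties without turning on the full DFSClient debug spew.
  public static final Log LOG = LogFactory.getLog(
      "org.apache.hadoop.hdfs.DFSClient.PerformanceAdvisory");

  private PerformanceAdvisory() {}
}

// At the point where a feature is silently disabled, e.g.:
//   PerformanceAdvisory.LOG.debug(
//       "Short-circuit local reads are disabled: ...");
{code}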

 Add log category for long-running DFSClient notices
 ---

 Key: HDFS-4486
 URL: https://issues.apache.org/jira/browse/HDFS-4486
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Todd Lipcon
Assignee: Zhe Zhang
Priority: Minor

 There are a number of features in the DFS client which are transparent but 
 can make a fairly big difference for performance -- two in particular are 
 short circuit reads and native checksumming. Because we don't want log spew 
 for clients like hadoop fs -cat we currently log only at DEBUG level when 
 these features are disabled. This makes it difficult to troubleshoot/verify 
 for long-running perf-sensitive clients like HBase.
 One simple solution is to add a new log category - eg 
 o.a.h.h.DFSClient.PerformanceAdvisory - which long-running clients could 
 enable at DEBUG level without getting the full debug spew.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-5135) Create a test framework to enable NFS end to end unit test

2014-08-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reassigned HDFS-5135:
---

Assignee: Zhe Zhang

 Create a test framework to enable NFS end to end unit test
 --

 Key: HDFS-5135
 URL: https://issues.apache.org/jira/browse/HDFS-5135
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Affects Versions: 2.2.0
Reporter: Brandon Li
Assignee: Zhe Zhang

 Currently, we have to manually start portmap and nfs3 processes to test 
 patches and new functionality. This JIRA is to track the effort to introduce 
 a test framework for NFS unit tests without starting standalone nfs3 
 processes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5135) Create a test framework to enable NFS end to end unit test

2014-08-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098960#comment-14098960
 ] 

Zhe Zhang commented on HDFS-5135:
-

Hi Brandon, I'm a little confused by the description of this Jira. As I 
understand it, many test classes, including TestReaddir and TestWrites, are 
already unit tests that do not require manually starting nfs3 and portmap. Is 
the purpose of this Jira to convert all remaining manual tests (such as 
TestUdpServer) to unit tests?

 Create a test framework to enable NFS end to end unit test
 --

 Key: HDFS-5135
 URL: https://issues.apache.org/jira/browse/HDFS-5135
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Affects Versions: 2.2.0
Reporter: Brandon Li
Assignee: Zhe Zhang

 Currently, we have to manually start portmap and nfs3 processes to test 
 patches and new functionality. This JIRA is to track the effort to introduce 
 a test framework for NFS unit tests without starting standalone nfs3 
 processes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id

2014-08-15 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6663:
--

Attachment: HDFS-6663.patch

 Admin command to track file and locations from block id
 ---

 Key: HDFS-6663
 URL: https://issues.apache.org/jira/browse/HDFS-6663
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6663-WIP.patch, HDFS-6663.patch


 A dfsadmin command that allows finding out the file and the locations given a 
 block number will be very useful in debugging production issues.   It may be 
 possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id

2014-08-15 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6663:
--

Status: Patch Available  (was: Open)

 Admin command to track file and locations from block id
 ---

 Key: HDFS-6663
 URL: https://issues.apache.org/jira/browse/HDFS-6663
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6663-WIP.patch, HDFS-6663.patch


 A dfsadmin command that allows finding out the file and the locations given a 
 block number will be very useful in debugging production issues.   It may be 
 possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id

2014-08-15 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099018#comment-14099018
 ] 

Chen He commented on HDFS-6663:
---

It can also show how many expected, live, corrupted, and decommissioned 
replicas exist for a given blockId. Also, if a replica is corrupted, it will 
show the datanode along with the corruption reason.

 Admin command to track file and locations from block id
 ---

 Key: HDFS-6663
 URL: https://issues.apache.org/jira/browse/HDFS-6663
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6663-WIP.patch, HDFS-6663.patch


 A dfsadmin command that allows finding out the file and the locations given a 
 block number will be very useful in debugging production issues.   It may be 
 possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099052#comment-14099052
 ] 

Hadoop QA commented on HDFS-6833:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662084/HDFS-6833-6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
org.apache.hadoop.hdfs.TestHDFSServerPorts

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7646//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7646//console

This message is automatically generated.

 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: HDFS-6833-6.patch, HDFS-6833.patch, HDFS-6833.patch, 
 HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation DirectoryScanner may run while the 
 DataNode is deleting the block, and the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted 

[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work

2014-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099078#comment-14099078
 ] 

Hadoop QA commented on HDFS-6776:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662086/HDFS-6776.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHDFSServerPorts
org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7647//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7647//console

This message is automatically generated.

 distcp from insecure cluster (source) to secure cluster (destination) doesn't 
 work
 --

 Key: HDFS-6776
 URL: https://issues.apache.org/jira/browse/HDFS-6776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0, 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
 HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch


 Issuing the distcp command at the secure cluster side, trying to copy stuff 
 from the insecure cluster to the secure cluster, I see the following problem:
 {code}
 hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insecure-cluster:port/tmp 
 hdfs://secure-cluster:8020/tmp/tmptgt
 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
 DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
 ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
 copyStrategy='uniformsize', sourceFileListing=null, 
 sourcePaths=[webhdfs://insecure-cluster:port/tmp], 
 targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true}
 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
 secure-cluster:8032
 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
 'ssl.client.truststore.location' has not been set, no TrustStore will be 
 loaded
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 

[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099243#comment-14099243
 ] 

Sanjay Radia commented on HDFS-6134:


We have made very good progress over the last few days. Thanks for taking the 
time for the offline technical discussions. Below is a summary of the concerns 
I have raised previously in this Jira.
# Fix distcp and cp to *automatically* deal with EZ using /r/r internally. 
Initially we need to support only row 1 and row 4 in the table I attached in 
HADOOP-10919.
# Fix Webhdfs to use KMS delegation tokens so that webhdfs can be used with 
transparent encryption without giving user hdfs KMS proxy permission (and, as 
a result, admins). REST is a key protocol for HDFS and for many Hadoop use 
cases, and an admin should not have access to the keys of encrypted files.
# Further work on specifying what HAR should do (I have listed some use cases 
and proposed solutions), and then follow it up with a fix to HAR.
# Some work on understanding the availability and scalability of KMS for 
medium to large clusters. Perhaps we need to explore getting the keys ahead of 
time when a job is submitted.

Let's complete items 1 and 2 promptly. Before we publish transparent 
encryption in a 2.x release for public consumption, let us at least complete 
item 1 (i.e. distcp and cp) and the flag to turn this feature on/off.



 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5135) Create a test framework to enable NFS end to end unit test

2014-08-15 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099246#comment-14099246
 ] 

Brandon Li commented on HDFS-5135:
--

Most of the current NFS unit tests are not end-to-end tests, which means they 
directly invoke internal methods, as in TestReaddir. In this way, some 
functions/features can't be covered. For example, we can't validate the 
response format; this has to be validated by mounting the export and doing 
manual tests on Linux. We found quite a few response format problems in the 
past, in a painful way, because we don't have enough end-to-end tests.

TestOutOfOrderWrite is an end-to-end test, but it doesn't provide a framework 
or class to be used by other tests.

Specifically, we need a few things. We may want to create more JIRAs to split 
the work:
1. utilities to package every NFS/mountd request
2. utilities to parse every NFS/mountd response
3. a test UDP client and TCP client, which can deliver requests to NFS and get 
responses

Once we have these utilities, we can create tests easily. For example, one 
could easily write a test like the sketch below:
1. send a create request
2. assert the response status is OK
3. send the same create request
4. assert the response status is not OK
... ...
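
A sketch of that test, using the proposed utilities. The helper classes 
(Nfs3TestClient, CreateRequestBuilder, Nfs3Response, Nfs3Status) are 
hypothetical placeholders for the utilities described above, not existing 
classes.
{code}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotEquals;
import org.junit.Test;

public class TestNfsCreate {
  @Test
  public void testDuplicateExclusiveCreate() throws Exception {
    // Hypothetical client wrapping the test TCP/UDP transport (item 3).
    Nfs3TestClient client = new Nfs3TestClient("localhost");

    // Hypothetical request packaging utility (item 1).
    byte[] request = CreateRequestBuilder.exclusive("/export", "file1");

    // First create should succeed; response parsing is item 2.
    Nfs3Response first = client.send(request);
    assertEquals(Nfs3Status.NFS3_OK, first.getStatus());

    // Re-sending the same exclusive create should be rejected.
    Nfs3Response second = client.send(request);
    assertNotEquals(Nfs3Status.NFS3_OK, second.getStatus());
  }
}
{code}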



 Create a test framework to enable NFS end to end unit test
 --

 Key: HDFS-5135
 URL: https://issues.apache.org/jira/browse/HDFS-5135
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Affects Versions: 2.2.0
Reporter: Brandon Li
Assignee: Zhe Zhang

 Currently, we have to manually start portmap and nfs3 processes to test 
 patches and new functionality. This JIRA is to track the effort to introduce 
 a test framework for NFS unit tests without starting standalone nfs3 
 processes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099311#comment-14099311
 ] 

Sanjay Radia commented on HDFS-6134:


Alejandro, wrt the subtle difference between webhdfs and httpfs: can an admin 
grab the EDEKs and raw files, then log into the httpfs machine, become user 
httpfs, and trick the KMS into decrypting the keys, since httpfs has the proxy 
setting?

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099318#comment-14099318
 ] 

Yongjun Zhang commented on HDFS-6825:
-

Hi [~kihwal],

Thanks a lot for verifying and confirming; I really appreciate it!

Thanks [~andrew.wang] again for the comment about checking recursive deletion; 
the process of addressing it led to this more complete solution than the 
previous revisions.



 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, 
 HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Found this is what happened:
 - the client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - the client tried to append to this file, but the lease had expired, so 
 lease recovery was started and the append failed
 - the file got deleted; however, there were still pending blocks of this 
 file not yet deleted
 - then the commitBlockSynchronization() method was called (see stack above), 
 and an InodeFile was created out of the pending block, unaware that the file 
 had already been deleted
 - FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction 
 and wrote a CloseOp to the edit log



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6856) Send an OOB ack asynchronously

2014-08-15 Thread Brandon Li (JIRA)
Brandon Li created HDFS-6856:


 Summary: Send an OOB ack asynchronously
 Key: HDFS-6856
 URL: https://issues.apache.org/jira/browse/HDFS-6856
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Brandon Li
Assignee: Brandon Li


As [~kihwal] pointed out in HDFS-6569:
One bad client may block this and prevent the message from being sent to the 
rest of good clients. Unless a new thread is created (during shutdown!) to 
send an OOB ack asynchronously, the blocking ack.readFields() call needs to be 
changed in order to delegate the message transmission to the responder thread. 

This JIRA is to track the effort of sending OOB ack asynchronously.
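
A minimal sketch of the asynchronous approach, assuming the OOB transmission 
is handed to this helper as a Runnable; the names and the timeout handling are 
illustrative, not the actual patch.
{code}
class OobAckSender {
  /**
   * Send the OOB ack from a short-lived daemon thread so that one bad
   * client blocked in I/O cannot delay shutdown or starve the remaining
   * good clients. The wait is bounded; a stuck client is abandoned.
   */
  static void sendAsync(Runnable sendOobAck, long timeoutMs)
      throws InterruptedException {
    Thread t = new Thread(sendOobAck, "oob-ack-sender");
    t.setDaemon(true);   // must not keep the shutting-down JVM alive
    t.start();
    t.join(timeoutMs);   // give up on clients that don't drain the ack
  }
}
{code}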



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6569) OOB message can't be sent to the client when DataNode shuts down for upgrade

2014-08-15 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099355#comment-14099355
 ] 

Brandon Li commented on HDFS-6569:
--

Thank you, Kihwal, for the review. I've created HDFS-6856 to track the effort 
of sending OOB ack asynchronously.
Uploaded a new patch to not send OOB twice.

 OOB message can't be sent to the client when DataNode shuts down for upgrade
 

 Key: HDFS-6569
 URL: https://issues.apache.org/jira/browse/HDFS-6569
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0, 2.4.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-6569.001.patch, HDFS-6569.002.patch, 
 HDFS-6569.003.patch, test-hdfs-6569.patch


 The socket is closed too early, before the OOB message can be sent to the 
 client, which causes the write pipeline failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6569) OOB message can't be sent to the client when DataNode shuts down for upgrade

2014-08-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6569:
-

Attachment: HDFS-6569.003.patch

 OOB message can't be sent to the client when DataNode shuts down for upgrade
 

 Key: HDFS-6569
 URL: https://issues.apache.org/jira/browse/HDFS-6569
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0, 2.4.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-6569.001.patch, HDFS-6569.002.patch, 
 HDFS-6569.003.patch, test-hdfs-6569.patch


 The socket is closed too early, before the OOB message can be sent to the 
 client, which causes the write pipeline failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-15 Thread Selvamohan Neethiraj (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099357#comment-14099357
 ] 

Selvamohan Neethiraj commented on HDFS-6826:


Alejandro,

The use case for externalizing the authorization: if an enterprise keeps 
metadata details, such as what is confidential, in a separate system and 
provides access control based on that metadata, it is important to have a 
pluggable authorization module that can use the metadata from the external 
system and authorize users/groups based on its own logic. I do not expect 
every organization to have a custom/pluggable authorization module, but this 
would allow security vendors and system integrators to expand the security 
scope for HDFS.
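
As a rough illustration of the kind of plugin surface being discussed here 
(all names and signatures below are assumptions for the sketch; the attached 
proposal documents define the actual API):

{code}
// Hedged sketch of a pluggable HDFS authorization hook. The NameNode would
// consult the plugin instead of, or before, its built-in permission checks,
// letting an external system map paths to entities (Hive tables, Solr
// collections, ...) and evaluate those entities' permissions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.UserGroupInformation;

public interface AuthorizationPlugin {
  /** Called once when the NameNode loads the plugin. */
  void initialize(Configuration conf);

  /** Decide whether the user may perform the requested access on the path. */
  void checkPermission(String path, UserGroupInformation user, FsAction access)
      throws AccessControlException;

  /** Called on NameNode shutdown. */
  void close();
}
{code}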

 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, 
 HDFS-6826v3.patch, HDFSPluggableAuthorizationProposal-v2.pdf, 
 HDFSPluggableAuthorizationProposal.pdf


 When Hbase data, HiveMetaStore data or Search data is accessed via services 
 (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce 
 permissions on corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users accessing the underlying data files (i.e. from a MapReduce 
 job), that the permission of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id

2014-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099415#comment-14099415
 ] 

Hadoop QA commented on HDFS-6663:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662134/HDFS-6663.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHDFSServerPorts
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7648//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7648//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7648//console

This message is automatically generated.

 Admin command to track file and locations from block id
 ---

 Key: HDFS-6663
 URL: https://issues.apache.org/jira/browse/HDFS-6663
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6663-WIP.patch, HDFS-6663.patch


 A dfsadmin command that allows finding out the file and the locations given a 
 block number will be very useful in debugging production issues.   It may be 
 possible to add this feature to Fsck, instead of creating a new command.
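
If the feature does end up in fsck as suggested, usage could look like the 
following (the -blockId option name is an assumption at this point in the 
thread, pending the final patch):

{code}
# Hedged sketch: print the owning file, and the DataNode locations, for a block ID.
hdfs fsck -blockId blk_1073741825
{code}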



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-15 Thread Shinichi Yamashita (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099441#comment-14099441
 ] 

Shinichi Yamashita commented on HDFS-6833:
--

The following tests succeeded in my environment.

{quote}
org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer
{quote}

 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: HDFS-6833-6.patch, HDFS-6833.patch, HDFS-6833.patch, 
 HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch


 When a block is deleted on a DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation DirectoryScanner may run while the 
 DataNode is deleting the block, and the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 The information for the block being deleted is re-registered in the 
 DataNode's memory.
 When the DataNode then sends a block report, the NameNode receives wrong 
 block information.
 For example, when we recommission a node or change the replication factor, 
 the NameNode may delete a valid block as an ExcessReplicate because of this 
 problem, and Under-Replicated Blocks and Missing Blocks occur.
 When the DataNode runs DirectoryScanner, it should not register a block that 
 is being deleted.
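
 A minimal sketch of the guard being proposed, assuming the dataset tracks 
 blocks that are scheduled for async deletion (isDeletingBlock() and the 
 reconcile shape below are illustrative assumptions):
 {code}
 // Hedged sketch, assumed to run inside DirectoryScanner's reconcile step:
 // before re-registering an on-disk block that is missing from memory, skip
 // it if the dataset has already scheduled it for asynchronous deletion.
 void addMissingBlockToMemory(String bpid, ScanInfo diskBlock) {
   long blockId = diskBlock.getBlockId();
   if (dataset.isDeletingBlock(bpid, blockId)) {
     LOG.info("Block " + blockId + " is scheduled for deletion; "
         + "not adding it back to memory");
     return;
   }
   // The block really is missing from memory: register it as before.
   dataset.addFinalizedBlockToMemory(bpid, diskBlock);
 }
 {code}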



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id

2014-08-15 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6663:
--

Attachment: HDFS-6663-2.patch

 Admin command to track file and locations from block id
 ---

 Key: HDFS-6663
 URL: https://issues.apache.org/jira/browse/HDFS-6663
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6663-2.patch, HDFS-6663-WIP.patch, HDFS-6663.patch


 A dfsadmin command that allows finding out the file and the locations given a 
 block number will be very useful in debugging production issues.   It may be 
 possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id

2014-08-15 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099453#comment-14099453
 ] 

Chen He commented on HDFS-6663:
---

Updated the patch to resolve the findbugs problem. The core-test failure is 
caused by HDFS-6694.

 Admin command to track file and locations from block id
 ---

 Key: HDFS-6663
 URL: https://issues.apache.org/jira/browse/HDFS-6663
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6663-2.patch, HDFS-6663-WIP.patch, HDFS-6663.patch


 A dfsadmin command that allows finding out the file and the locations given a 
 block number will be very useful in debugging production issues.   It may be 
 possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id

2014-08-15 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6663:
--

Affects Version/s: 2.5.0

 Admin command to track file and locations from block id
 ---

 Key: HDFS-6663
 URL: https://issues.apache.org/jira/browse/HDFS-6663
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 2.5.0
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6663-2.patch, HDFS-6663-WIP.patch, HDFS-6663.patch


 A dfsadmin command that allows finding out the file and the locations given a 
 block number will be very useful in debugging production issues.   It may be 
 possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099469#comment-14099469
 ] 

Alejandro Abdelnur commented on HDFS-6826:
--

[~sneethiraj], got it, it makes sense.

Next week I will start working on the v2 patch to get it into proper shape. As 
a first cut I would prefer to make things pluggable without altering APIs; we 
can work on refining the APIs once we have the desired functionality. Also, 
something to keep in mind: this plugin API is meant to be used by somebody 
with a very good understanding of the NameNode's guts and expected behavior.

 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, 
 HDFS-6826v3.patch, HDFSPluggableAuthorizationProposal-v2.pdf, 
 HDFSPluggableAuthorizationProposal.pdf


 When Hbase data, HiveMetaStore data or Search data is accessed via services 
 (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce 
 permissions on corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users accessing the underlying data files (i.e. from a MapReduce 
 job), that the permission of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6774) Make FsDataset and DataStore support removing volumes.

2014-08-15 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099483#comment-14099483
 ] 

Aaron T. Myers commented on HDFS-6774:
--

Hey Eddy, patch looks pretty good to me. A few questions:

# The change in {{BlockPoolSlice}} - was that just a separate bug? Or why was 
that necessary?
# I see the code where we remove the replica info from the replica map, but do 
we not also need to do something similar in the event that the replica is 
currently referenced in the BlockScanner or DirectoryScanner data structures? 
It could be that we don't, but I wanted to check with you to see if you've 
considered this case.

 Make FsDataset and DataStore support removing volumes.
 --

 Key: HDFS-6774
 URL: https://issues.apache.org/jira/browse/HDFS-6774
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6774.000.patch, HDFS-6774.001.patch


 Managing volumes on a DataNode includes decommissioning an active volume 
 without restarting the DataNode. 
 This task adds support for removing volumes from {{DataStorage}} and 
 {{BlockPoolSliceStorage}} dynamically.
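
 A rough sketch of one part of this, assuming an in-memory replica map keyed 
 by block pool (ReplicaMap/replicas() mirror existing FsDataset structures; 
 the removal loop itself is an assumption, not the patch):
 {code}
 // Hedged sketch: when a volume is removed, drop every in-memory replica that
 // lives on that volume so subsequent block reports no longer advertise it.
 // (Assumes java.util.Iterator is imported in the enclosing class.)
 void removeVolumeReplicas(FsVolumeImpl volumeToRemove, String bpid) {
   synchronized (volumeMap) {
     Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator();
     while (it.hasNext()) {
       if (it.next().getVolume() == volumeToRemove) {
         it.remove(); // the NameNode learns of the removal via block reports
       }
     }
   }
 }
 {code}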



--
This message was sent by Atlassian JIRA
(v6.2#6252)