[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095204#comment-14095204
 ] 

Sanjay Radia commented on HDFS-6134:


Larry, I don't completely get the difference between webhdfs and httpfs, but I 
think the cause of the difference is that user hdfs is the superuser (note the DN 
runs as hdfs, and the webhdfs code is executed on behalf of the end-user inside 
the DN after checking the permissions). Hence I think this would potentially open 
up access to all encrypted files that are readable. However, that should NOT 
happen if doAs is used (correct?). 

I agree it would be unacceptable to say that if one enables transparent 
encryption then one should disable webhdfs because it would become insecure. 
Andrew says that "Regarding webhdfs, it's not a recommended deployment", but 
Alejandro says "Both httpfs and webhdfs will work just fine" and then in the 
same paragraph says this could fail some security audits.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6846) NetworkTopology#sortByDistance should give higher priority to nodes that cache the block.

2014-08-13 Thread Yi Liu (JIRA)
Yi Liu created HDFS-6846:


 Summary: NetworkTopology#sortByDistance should give higher priority to 
nodes that cache the block.
 Key: HDFS-6846
 URL: https://issues.apache.org/jira/browse/HDFS-6846
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Yi Liu


Currently there are 3 weights:
* local
* same rack
* off rack

But if some nodes cache the block, it is faster for the client to read the 
block from those nodes. So we should have some more weights, as follows (see 
the sketch after this list):
* local
* cached & same rack
* same rack
* cached & off rack
* off rack
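
For illustration, a minimal sketch of such a weight function (all names 
hypothetical, not from a patch; lower weight = higher priority):
{code}
class ReadWeights {
  // isLocal/isSameRack/isCached are assumed to be computed elsewhere.
  static int getWeight(boolean isLocal, boolean isSameRack, boolean isCached) {
    if (isLocal) {
      return 0;                  // local
    } else if (isSameRack) {
      return isCached ? 1 : 2;   // cached & same rack / same rack
    } else {
      return isCached ? 3 : 4;   // cached & off rack / off rack
    }
  }
}
{code}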




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6567) Clean up HdfsFileStatus

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095220#comment-14095220
 ] 

Hadoop QA commented on HDFS-6567:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661387/HDFS-6567.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7622//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7622//console

This message is automatically generated.

 Clean up HdfsFileStatus
 ---

 Key: HDFS-6567
 URL: https://issues.apache.org/jira/browse/HDFS-6567
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Attachments: HDFS-6567.000.patch


 As suggested in HDFS-6200, the order of the {{public final}} modifiers in 
 {{HdfsFileStatus}} is reversed. This jira proposes to fix the order and to 
 make the code more consistent.
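
 For context, an illustrative snippet (not the actual HdfsFileStatus code) of 
 the modifier order in question; Java convention places the access modifier 
 first:
 {code}
 class ModifierOrderExample {
   // Reversed order, as reported for HdfsFileStatus:
   final public long length = 0;
   // Conventional order:
   public final long blocksize = 0;
 }
 {code}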



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6847) Archival Storage: Support storage policy on a directory

2014-08-13 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-6847:
---

 Summary: Archival Storage: Support storage policy on a directory
 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6847:


Summary: Archival Storage: Support storage policy on directories  (was: 
Archival Storage: Support storage policy on a directory)

 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao

 This jira plans to add storage policy support on directories, i.e., users can 
 set/get the storage policy not only for files but also for directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.
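
 For illustration, a minimal sketch of that nearest-ancestor resolution (all 
 names hypothetical, not from the patch):
 {code}
 interface Inode {
   Inode getParent();      // null at the root
   String getOwnPolicy();  // null if no policy is set on this inode
 }

 class PolicyResolver {
   // Effective policy = own policy if set, else nearest ancestor's policy.
   static String getEffectivePolicy(Inode inode) {
     for (Inode i = inode; i != null; i = i.getParent()) {
       if (i.getOwnPolicy() != null) {
         // e.g. /foo/bar/baz with p1 on foo and p2 on bar resolves to p2
         return i.getOwnPolicy();
       }
     }
     return null; // no policy set anywhere on the path
   }
 }
 {code}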



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on a directory

2014-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6847:


Description: 
This jira plans to add storage policy support on directories, i.e., users can 
set/get the storage policy not only for files but also for directories.

We allow users to set storage policies for nested directories/files. For a 
specific file/directory, its storage policy then should be its own storage 
policy, if it is specified, or the storage policy specified on its nearest 
ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
baz, bar, and foo should be p2, p2, and p1, respectively.

 Archival Storage: Support storage policy on a directory
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao

 This jira plans to add storage policy support on directories, i.e., users can 
 set/get the storage policy not only for files but also for directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6844) Archival Storage: Extend HdfsFileStatus to get storage policy

2014-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-6844.
-

Resolution: Duplicate

HDFS-6847 will cover the same functionality. Closing this as a duplicate.

 Archival Storage: Extend HdfsFileStatus to get storage policy
 -

 Key: HDFS-6844
 URL: https://issues.apache.org/jira/browse/HDFS-6844
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6844.000.patch


 We need a way to get the current storage policy id of existing files. This 
 can be achieved by extending HdfsFileStatus.
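
 For illustration, the minimal shape of such an extension (a sketch only; 
 field and accessor names hypothetical):
 {code}
 class HdfsFileStatusSketch {
   private final byte storagePolicy; // id of the file's storage policy

   HdfsFileStatusSketch(byte storagePolicy) {
     this.storagePolicy = storagePolicy;
   }

   public byte getStoragePolicy() {  // hypothetical accessor
     return storagePolicy;
   }
 }
 {code}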



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6321) Add a message on the old web UI that indicates the old UI is deprecated

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095245#comment-14095245
 ] 

Hadoop QA commented on HDFS-6321:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661392/HDFS-6321.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7623//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7623//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7623//console

This message is automatically generated.

 Add a message on the old web UI that indicates the old UI is deprecated
 ---

 Key: HDFS-6321
 URL: https://issues.apache.org/jira/browse/HDFS-6321
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Attachments: HDFS-6321.000.patch


 HDFS-6252 has removed the JSP UI from trunk. We should add a message in the 
 old web UI to indicate that the UI has been deprecated and ask the user to 
 move to the new web UI. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6847:


Attachment: HDFS-6847.000.patch

Initial patch. It uses an XAttr to set the storage policy id for directories.

 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch


 This jira plans to add storage policy support on directories, i.e., users can 
 set/get the storage policy not only for files but also for directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6843) Create FileStatus.isEncrypted() method

2014-08-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095299#comment-14095299
 ] 

Steve Loughran commented on HDFS-6843:
--

what about making it an enum in case encryption policies change in future?

 Create FileStatus.isEncrypted() method
 --

 Key: HDFS-6843
 URL: https://issues.apache.org/jira/browse/HDFS-6843
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Charles Lamb
Assignee: Charles Lamb

 FileStatus should have a 'boolean isEncrypted()' method. (It came up in the 
 context of discussing with Andrew about FileStatus being a Writable.)
 Having this method would allow the MR JobSubmitter to do the following:
 -
 BOOLEAN intermediateEncryption = false
 IF jobConf.contains(mr.intermediate.encryption) THEN
   intermediateEncryption = jobConf.getBoolean(mr.intermediate.encryption)
 ELSE
   IF (I/O)Format INSTANCEOF File(I/O)Format THEN
     intermediateEncryption = ANY File(I/O)Format HAS a Path with status 
     isEncrypted()==TRUE
   FI
   jobConf.setBoolean(mr.intermediate.encryption, intermediateEncryption)
 FI
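
 A rough Java rendering of that pseudocode (a sketch only: the config key is 
 the author's placeholder, isEncrypted() is exactly the method this jira 
 proposes, and jobConf/inputFormat/inputStatuses are assumed to be in scope):
 {code}
 boolean intermediateEncryption = false;
 if (jobConf.get("mr.intermediate.encryption") != null) {
   // an explicit setting wins
   intermediateEncryption =
       jobConf.getBoolean("mr.intermediate.encryption", false);
 } else {
   if (inputFormat instanceof FileInputFormat) {
     // true if any input path's FileStatus reports isEncrypted() == true
     for (FileStatus st : inputStatuses) {
       if (st.isEncrypted()) {
         intermediateEncryption = true;
         break;
       }
     }
   }
   jobConf.setBoolean("mr.intermediate.encryption", intermediateEncryption);
 }
 {code}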



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-13 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated HDFS-6833:
-

Attachment: HDFS-6833.patch

I attach a patch file to which I added a test case.

 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation DirectoryScanner may be executed while 
 DataNode is deleting the block. And the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 The deleting block's information is registered in DataNode's memory.
 And when DataNode sends a block report, NameNode receives wrong block 
 information.
 For example, when we execute recommissioning or change the replication 
 factor, NameNode may delete the right block as ExcessReplicate because of 
 this problem.
 And Under-Replicated Blocks and Missing Blocks occur.
 When DataNode runs DirectoryScanner, DataNode should not register a deleting 
 block.
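
 For illustration, one possible shape of such a guard (all names hypothetical 
 and simplified, not the real DirectoryScanner/FsDataset signatures):
 {code}
 interface ScanInfo { long getBlockId(); }
 interface FsDataset {
   boolean isDeletingBlock(String bpid, long blockId);
   void checkAndUpdate(String bpid, ScanInfo info);
 }

 class ScanReconciler {
   // Skip blocks already scheduled for async deletion so they are not
   // re-registered in the DataNode's in-memory block map.
   void reconcile(FsDataset dataset, String bpid, Iterable<ScanInfo> diff) {
     for (ScanInfo info : diff) {
       if (dataset.isDeletingBlock(bpid, info.getBlockId())) {
         continue; // being deleted; do not add it back to memory
       }
       dataset.checkAndUpdate(bpid, info); // normal reconciliation path
     }
   }
 }
 {code}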



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()

2014-08-13 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-6841:


Attachment: HDFS-6841-001.patch

Changed it in the HDFS project.
 Didn't change all tests; changed only the required tests.


 Use Time.monotonicNow() wherever applicable instead of Time.now()
 -

 Key: HDFS-6841
 URL: https://issues.apache.org/jira/browse/HDFS-6841
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-6841-001.patch


 {{Time.now()}} is used in many places to calculate elapsed time.
 This should be replaced with {{Time.monotonicNow()}} to avoid the effect of 
 system time changes on elapsed-time calculations.
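
 For example, the typical before/after shape of such a change (Time.now() and 
 Time.monotonicNow() are real org.apache.hadoop.util.Time methods):
 {code}
 import org.apache.hadoop.util.Time;

 public class ElapsedTimeExample {
   public static void main(String[] args) throws InterruptedException {
     long startWall = Time.now();          // wall clock; skewed by clock changes
     long startMono = Time.monotonicNow(); // monotonic; immune to clock changes

     Thread.sleep(100);

     System.out.println("wall elapsed ms:      " + (Time.now() - startWall));
     System.out.println("monotonic elapsed ms: "
         + (Time.monotonicNow() - startMono));
   }
 }
 {code}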



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()

2014-08-13 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-6841:


Status: Patch Available  (was: Open)

 Use Time.monotonicNow() wherever applicable instead of Time.now()
 -

 Key: HDFS-6841
 URL: https://issues.apache.org/jira/browse/HDFS-6841
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-6841-001.patch


 {{Time.now()}} is used in many places to calculate elapsed time.
 This should be replaced with {{Time.monotonicNow()}} to avoid the effect of 
 system time changes on elapsed-time calculations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer

2014-08-13 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095363#comment-14095363
 ] 

Uma Maheswara Rao G commented on HDFS-6247:
---

+1 Patch looks good to me. Thanks Vinay!

 Avoid timeouts for replaceBlock() call by sending intermediate responses to 
 Balancer
 

 Key: HDFS-6247
 URL: https://issues.apache.org/jira/browse/HDFS-6247
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, datanode
Affects Versions: 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch


 Currently there is no response sent from the target Datanode to the Balancer 
 for replaceBlock() calls.
 Since the block movement for balancing is throttled, a complete block movement 
 will take time, and this could result in a timeout at the Balancer, which will 
 be trying to read the status message.
  
 To avoid this, while a replaceBlock() call is in progress the Datanode can 
 send IN_PROGRESS status messages to the Balancer, so that the Balancer neither 
 times out nor treats the block movement as failed.
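
 For illustration, a sketch of what the DN-side keepalive could look like (all 
 helper names hypothetical; the real patch wires this into the replaceBlock() 
 response handling):
 {code}
 abstract class ThrottledMoveSketch {
   static final long RESPONSE_INTERVAL_MS = 30000; // assumed interval

   abstract boolean copyDone();
   abstract void copyNextChunkThrottled();
   abstract void sendStatus(String status) throws java.io.IOException;

   void runMove() throws java.io.IOException {
     long lastResponse = org.apache.hadoop.util.Time.monotonicNow();
     while (!copyDone()) {
       copyNextChunkThrottled(); // throttled movement takes a long time
       long now = org.apache.hadoop.util.Time.monotonicNow();
       if (now - lastResponse > RESPONSE_INTERVAL_MS) {
         sendStatus("IN_PROGRESS"); // keepalive so the Balancer's read of
         lastResponse = now;        // the status message does not time out
       }
     }
     sendStatus("SUCCESS"); // final status once the move completes
   }
 }
 {code}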



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6830) BlockInfo.addStorage fails when DN changes the storage for a block replica

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095380#comment-14095380
 ] 

Hudson commented on HDFS-6830:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/644/])
HDFS-6830. BlockInfo.addStorage fails when DN changes the storage for a block 
replica. (Arpit Agarwal) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617598)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java


 BlockInfo.addStorage fails when DN changes the storage for a block replica
 --

 Key: HDFS-6830
 URL: https://issues.apache.org/jira/browse/HDFS-6830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, 
 HDFS-6830.03.patch, HDFS-6830.04.patch


 The call to {{removeStorageInfo}} is wrong because the block is still in the 
 DatanodeStorage's list of blocks and the callee does not expect it to be.
 {code}
   } else {
 // The block is on the DN but belongs to a different storage.
 // Update our state.
 removeStorage(getStorageInfo(idx));
 added = false;  // Just updating storage. Return false.
   }
 {code}
 It is a very unlikely code path to hit since storage updates usually occur 
 via incremental block reports.
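
 For illustration, one shape the fix could take: update the storage reference 
 in place instead of calling removeStorage() (a sketch, not necessarily the 
 committed patch):
 {code}
   } else {
     // The block is on the DN but belongs to a different storage.
     // Update the storage reference in place rather than calling
     // removeStorage(), which assumes the block is no longer in the
     // storage's block list.
     setStorageInfo(idx, storage); // sketch of an in-place update
     added = false;                // just updating storage; return false
   }
 {code}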



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6836) HDFS INFO logging is verbose & uses file appenders

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095385#comment-14095385
 ] 

Hudson commented on HDFS-6836:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/644/])
HDFS-6836. HDFS INFO logging is verbose & uses file appenders. (Contributed by 
Xiaoyu Yao) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617603)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java


 HDFS INFO logging is verbose & uses file appenders
 --

 Key: HDFS-6836
 URL: https://issues.apache.org/jira/browse/HDFS-6836
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.4.1
Reporter: Gopal V
Assignee: Xiaoyu Yao
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6836.0.patch, HDFS-6836.1.patch, HDFS-6836.2.patch


 Reported by: [~gopalv].
 These HDFS INFO logs are present within the inner loops of HDFS, logging 
 information like
 {code}
 2014-07-24 19:43:34,459 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43666, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86616576, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 6827335
 2014-07-24 19:43:34,465 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.117:41731, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33,
  offset: 72691200, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 
 7178626
 2014-07-24 19:43:34,467 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43669, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86813696, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 8540703
 2014-07-24 19:43:34,474 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.117:41733, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33,
  offset: 72822272, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 
 8220422
 2014-07-24 19:43:34,477 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43672, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86944768, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 8327499
 {code}
 Looks like future releases of log4j will fix this to be faster - 
 https://issues.apache.org/jira/browse/LOG4J2-163
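
 Independent of the log4j fix, a common mitigation for hot-path logging is to 
 guard the expensive message construction (illustrative only, not the attached 
 patch; clientTraceLog and the format arguments are assumed to be in scope):
 {code}
 // Only build the clienttrace message when it will actually be recorded.
 if (clientTraceLog.isInfoEnabled()) {
   clientTraceLog.info(String.format(
       "src: %s, dest: %s, bytes: %d, op: %s, duration: %d",
       srcAddr, dstAddr, bytes, op, durationNanos));
 }
 {code}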



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095433#comment-14095433
 ] 

Hadoop QA commented on HDFS-6833:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661423/HDFS-6833.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7624//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7624//console

This message is automatically generated.

 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation DirectoryScanner may be executed while 
 DataNode is deleting the block. And the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 The deleting block's information is registered in DataNode's memory.
 And when DataNode sends a block report, NameNode receives wrong block 
 information.
 For example, when we execute recommissioning or change the replication 
 factor, NameNode may delete the right block as ExcessReplicate because of 
 this problem.
 And Under-Replicated Blocks and Missing Blocks occur.
 When DataNode runs DirectoryScanner, DataNode should not register a deleting 
 block.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095470#comment-14095470
 ] 

Larry McCay commented on HDFS-6134:
---

I guess if webhdfs, running as the 'hdfs' user, is allowed to doAs the end 
user, then that can be a problem.
But again, I don't see what keeps an admin from doing that with httpfs as well.

It seems as though KMS needs the ability to not allow the 'hdfs' user to gain 
keys through any trusted proxy, while still allowing a trusted proxy that is 
running as a superuser to doAs other users.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack

2014-08-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095499#comment-14095499
 ] 

Daryn Sharp commented on HDFS-6840:
---

I'd like complete removal of the random seed.  Why allow users the option of 
shooting themselves in the foot?  As far as I can tell, the seed was added 
due to a misunderstanding that the former behavior was deterministic?  I cannot 
envision a use case where all off-rack clients bombarding a single node is a 
good idea.

 Clients are always sent to the same datanode when read is off rack
 --

 Key: HDFS-6840
 URL: https://issues.apache.org/jira/browse/HDFS-6840
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Jason Lowe
Priority: Critical

 After HDFS-6268 the sorting order of block locations is deterministic for a 
 given block and locality level (e.g. local, rack-local, off-rack), so off-rack 
 clients all see the same datanode for the same block.  This leads to very 
 poor behavior in distributed cache localization and other scenarios where 
 many clients all want the same block data at approximately the same time.  
 The one datanode is crushed by the load while the other replicas only handle 
 local and rack-local requests.
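
 For illustration, the kind of non-deterministic tie-breaking being argued for 
 (a sketch, not the actual NetworkTopology code):
 {code}
 import java.util.Collections;
 import java.util.List;
 import java.util.Random;

 class ReplicaOrdering {
   private static final Random RAND = new Random(); // note: no fixed seed

   // Shuffle replicas within the same distance class so off-rack clients
   // spread their reads across all replicas instead of hammering one DN.
   static <T> void breakTies(List<T> sameDistanceReplicas) {
     Collections.shuffle(sameDistanceReplicas, RAND);
   }
 }
 {code}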



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6836) HDFS INFO logging is verbose & uses file appenders

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095513#comment-14095513
 ] 

Hudson commented on HDFS-6836:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/])
HDFS-6836. HDFS INFO logging is verbose & uses file appenders. (Contributed by 
Xiaoyu Yao) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617603)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java


 HDFS INFO logging is verbose & uses file appenders
 --

 Key: HDFS-6836
 URL: https://issues.apache.org/jira/browse/HDFS-6836
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.4.1
Reporter: Gopal V
Assignee: Xiaoyu Yao
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6836.0.patch, HDFS-6836.1.patch, HDFS-6836.2.patch


 Reported by: [~gopalv].
 These HDFS INFO logs are present within the inner loops of HDFS, logging 
 information like
 {code}
 2014-07-24 19:43:34,459 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43666, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86616576, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 6827335
 2014-07-24 19:43:34,465 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.117:41731, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33,
  offset: 72691200, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 
 7178626
 2014-07-24 19:43:34,467 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43669, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86813696, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 8540703
 2014-07-24 19:43:34,474 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.117:41733, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33,
  offset: 72822272, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 
 8220422
 2014-07-24 19:43:34,477 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43672, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86944768, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 8327499
 {code}
 Looks like future releases of log4j will fix this to be faster - 
 https://issues.apache.org/jira/browse/LOG4J2-163



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6830) BlockInfo.addStorage fails when DN changes the storage for a block replica

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095508#comment-14095508
 ] 

Hudson commented on HDFS-6830:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/])
HDFS-6830. BlockInfo.addStorage fails when DN changes the storage for a block 
replica. (Arpit Agarwal) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617598)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java


 BlockInfo.addStorage fails when DN changes the storage for a block replica
 --

 Key: HDFS-6830
 URL: https://issues.apache.org/jira/browse/HDFS-6830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, 
 HDFS-6830.03.patch, HDFS-6830.04.patch


 The call to {{removeStorageInfo}} is wrong because the block is still in the 
 DatanodeStorage's list of blocks and the callee does not expect it to be.
 {code}
   } else {
 // The block is on the DN but belongs to a different storage.
 // Update our state.
 removeStorage(getStorageInfo(idx));
 added = false;  // Just updating storage. Return false.
   }
 {code}
 It is a very unlikely code path to hit since storage updates usually occur 
 via incremental block reports.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-13 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated HDFS-6833:
-

Attachment: HDFS-6833.patch

 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, 
 HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation DirectoryScanner may be executed while 
 DataNode is deleting the block. And the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 The deleting block's information is registered in DataNode's memory.
 And when DataNode sends a block report, NameNode receives wrong block 
 information.
 For example, when we execute recommissioning or change the replication 
 factor, NameNode may delete the right block as ExcessReplicate because of 
 this problem.
 And Under-Replicated Blocks and Missing Blocks occur.
 When DataNode runs DirectoryScanner, DataNode should not register a deleting 
 block.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095576#comment-14095576
 ] 

Hadoop QA commented on HDFS-6841:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661432/HDFS-6841-001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestGetBlocks
  org.apache.hadoop.hdfs.TestLease

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7625//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7625//console

This message is automatically generated.

 Use Time.monotonicNow() wherever applicable instead of Time.now()
 -

 Key: HDFS-6841
 URL: https://issues.apache.org/jira/browse/HDFS-6841
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-6841-001.patch


 {{Time.now()}} is used in many places to calculate elapsed time.
 This should be replaced with {{Time.monotonicNow()}} to avoid the effect of 
 system time changes on elapsed-time calculations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6836) HDFS INFO logging is verbose & uses file appenders

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095603#comment-14095603
 ] 

Hudson commented on HDFS-6836:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/])
HDFS-6836. HDFS INFO logging is verbose & uses file appenders. (Contributed by 
Xiaoyu Yao) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617603)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java


 HDFS INFO logging is verbose & uses file appenders
 --

 Key: HDFS-6836
 URL: https://issues.apache.org/jira/browse/HDFS-6836
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.4.1
Reporter: Gopal V
Assignee: Xiaoyu Yao
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6836.0.patch, HDFS-6836.1.patch, HDFS-6836.2.patch


 Reported by: [~gopalv].
 These HDFS INFO logs are present within the inner loops of HDFS, logging 
 information like
 {code}
 2014-07-24 19:43:34,459 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43666, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86616576, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 6827335
 2014-07-24 19:43:34,465 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.117:41731, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33,
  offset: 72691200, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 
 7178626
 2014-07-24 19:43:34,467 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43669, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86813696, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 8540703
 2014-07-24 19:43:34,474 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.117:41733, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn117-10.l42scl.hortonworks.com,53868,1406227155459_1689003704_33,
  offset: 72822272, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074926372_1186916, duration: 
 8220422
 2014-07-24 19:43:34,477 INFO  DataNode.clienttrace 
 (BlockSender.java:sendBlock(738)) - src: /172.21.128.105:50010, dest: 
 /172.21.128.113:43672, bytes: 786432, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_cn113-10.l42scl.hortonworks.com,50700,1406227155474_1075922312_33,
  offset: 86944768, srvID: 3f80f56f-a6ea-4951-8db6-86b51938d144, blockid: 
 BP-971413386-172.21.128.105-1398117368124:blk_1074925960_1186504, duration: 
 8327499
 {code}
 Looks like future releases of log4j will fix this to be faster - 
 https://issues.apache.org/jira/browse/LOG4J2-163



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6830) BlockInfo.addStorage fails when DN changes the storage for a block replica

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095598#comment-14095598
 ] 

Hudson commented on HDFS-6830:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/])
HDFS-6830. BlockInfo.addStorage fails when DN changes the storage for a block 
replica. (Arpit Agarwal) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617598)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java


 BlockInfo.addStorage fails when DN changes the storage for a block replica
 --

 Key: HDFS-6830
 URL: https://issues.apache.org/jira/browse/HDFS-6830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, 
 HDFS-6830.03.patch, HDFS-6830.04.patch


 The call to {{removeStorageInfo}} is wrong because the block is still in the 
 DatanodeStorage's list of blocks and the callee does not expect it to be.
 {code}
   } else {
 // The block is on the DN but belongs to a different storage.
 // Update our state.
 removeStorage(getStorageInfo(idx));
 added = false;  // Just updating storage. Return false.
   }
 {code}
 It is a very unlikely code path to hit since storage updates usually occur 
 via incremental block reports.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6016) Update datanode replacement policy to make writes more robust

2014-08-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6016:
-

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Per Nicholas' comment, I won't fix it.

 Update datanode replacement policy to make writes more robust
 -

 Key: HDFS-6016
 URL: https://issues.apache.org/jira/browse/HDFS-6016
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-6016.patch, HDFS-6016.patch


 As discussed in HDFS-5924, writers that are down to only one node due to node 
 failures can suffer if a DN does not restart in time. We do not worry about 
 writes that began with a single replica. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6801) Archival Storage: Add a new data migration tool

2014-08-13 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6801:
--

Attachment: h6801_20140813.patch

h6801_20140813.patch: 1st patch.

 Archival Storage: Add a new data migration tool 
 

 Key: HDFS-6801
 URL: https://issues.apache.org/jira/browse/HDFS-6801
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6801_20140813.patch


 The tool is similar to Balancer.  It periodically scans the blocks in HDFS and 
 uses path and/or other metadata (e.g. mtime) to determine if a block should 
 be cooled down (i.e. hot => warm, or warm => cold) or warmed up (i.e. cold => 
 warm, or warm => hot).  In contrast to Balancer, the migration tool always 
 moves replicas to a different storage type.  Similar to Balancer, the replicas 
 are moved in a way that the number of racks hosting the block does not decrease.
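
 For illustration, a toy version of an mtime-based temperature rule (entirely 
 hypothetical names and thresholds; the real tool derives its policy from 
 paths and metadata):
 {code}
 class TempRule {
   enum Temp { HOT, WARM, COLD }

   static Temp desiredTemp(long mtimeMillis, long nowMillis) {
     long ageDays = (nowMillis - mtimeMillis) / (24L * 3600 * 1000);
     if (ageDays > 90) return Temp.COLD; // e.g. cold after 90 days
     if (ageDays > 7)  return Temp.WARM; // e.g. warm after a week
     return Temp.HOT;
   }
 }
 {code}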



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id

2014-08-13 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6663:
--

Attachment: HDFS-6663-WIP.patch

Work-in-progress patch; will add more test cases covering decommission and 
block corruption in the next version.

 Admin command to track file and locations from block id
 ---

 Key: HDFS-6663
 URL: https://issues.apache.org/jira/browse/HDFS-6663
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6663-WIP.patch


 A dfsadmin command that allows finding out the file and the locations given a 
 block number will be very useful in debugging production issues.   It may be 
 possible to add this feature to Fsck, instead of creating a new command.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095668#comment-14095668
 ] 

Alejandro Abdelnur commented on HDFS-6134:
--

Let me try to explain things a different way.

When setting up filesystem encryption in HDFS (forget about webhdfs and httpfs 
for now), things will be configured so the HDFS superuser cannot retrieve 
decrypted 'file encryption keys'. Because the HDFS superuser has access to the 
encrypted versions of the files, having access to the decrypted 'file 
encryption keys' would allow the HDFS superuser to get access to the decrypted 
file. One of the goals of HDFS encryption is to prevent that.

This is achieved by blacklisting the HDFS superuser from retrieving decrypted 
'file encryption keys' from the KMS. This blacklist must be enforced on the 
real UGI hitting the KMS (regardless of whether it is doing a doAs or not).

If you set up httpfs, it runs as the 'httpfs' user, a regular HDFS user 
configured as a proxyuser to interact with HDFS and the KMS via doAs calls. 

If you set up webhdfs, it runs as the 'hdfs' user, the HDFS superuser, and 
this user will have to be configured as a proxyuser in the KMS to work with 
doAs calls. Also, the 'hdfs' user will have to be removed from the KMS 
decrypt-keys blacklist (*and this is the problem*).

Even if you audit the webhdfs code running in the DNs to ensure things are 
always done using doAs, and that there is no foul play in the DN code, there is 
still an issue:

* An HDFS admin logs in to a DN in the cluster as 'hdfs'
* Then he kinits as 'hdfs/HOST'
* Then he curls the KMS asking it to decrypt keys as user X via doAs
* Because he has access to the encrypted file, and now has the decrypted key, 
he gets access to the file in the clear

Hope this clarifies.
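
To make the setup concrete, a sketch of the configuration split described above; the property names follow the KMS proxyuser/blacklist settings of this era, but treat the exact names and values as assumptions:

{code}
<!-- kms-site.xml sketch (names/values assumed for illustration):
     'httpfs' may proxy for end users, while the HDFS superuser stays
     blacklisted from retrieving decrypted file encryption keys. -->
<property>
  <name>hadoop.kms.proxyuser.httpfs.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.httpfs.hosts</name>
  <value>httpfs-host.example.com</value>
</property>
<property>
  <name>hadoop.kms.blacklist.DECRYPT_EEK</name>
  <value>hdfs</value>
</property>
{code}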





 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095672#comment-14095672
 ] 

Alejandro Abdelnur commented on HDFS-6826:
--

[~clamb], that seems correct. We make sure  this is not an issue under normal 
circumstances by implementing caching. The same would hold for any plugin 
implementation meant for production usage.

 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, 
 HDFSPluggableAuthorizationProposal.pdf


 When Hbase data, HiveMetaStore data or Search data is accessed via services 
 (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce 
 permissions on corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users accessing the underlying data files (i.e. from a MapReduce 
 job), that the permission of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095719#comment-14095719
 ] 

Larry McCay commented on HDFS-6134:
---

Thanks [~tucu00], that is pretty clear.

The question that remains for me is why this same scenario isn't achievable by 
the admin kinit'ing as httpfs/HOST or Oozie or some other trusted proxy and 
then issuing a request with a doAs user X.

We have to somehow fix this for webhdfs - it is an expected and valuable API 
and should remain so with encrypted files without introducing a vulnerability.

Even if we have to do something like use another proxy (like Knox) and a shared 
secret, so that there is additional verification of the origin of a KMS request 
from webhdfs, it may be worth it. This would enable proxies to access webhdfs 
resources with a signed/encrypted token: if the KMS gets a signed request from 
webhdfs that it can verify, then it can proceed. The shared secret could be 
made available through the credential provider API, and webhdfs itself would 
just see it as an opaque token that needs to be passed in the KMS request. 
Requiring an extra hop for this access would be unfortunate too, but if it buys 
additional security for the data it may be acceptable.

Anyway, that's just a thought for keeping webhdfs as a first class citizen. We 
have to do something.
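
For what it's worth, a toy sketch of the shared-secret verification idea (names and flow are hypothetical; it only illustrates the HMAC mechanics, not a proposed API):

{code}
// Toy sketch: webhdfs signs the doAs user with a shared secret, and the KMS
// recomputes the tag before honoring the request. All names are hypothetical.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SignedDoAsSketch {
  static byte[] sign(byte[] sharedSecret, String doAsUser) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(sharedSecret, "HmacSHA256"));
    return mac.doFinal(doAsUser.getBytes(StandardCharsets.UTF_8));
  }

  // KMS side: recompute the tag and compare in constant time.
  static boolean verify(byte[] sharedSecret, String doAsUser, byte[] tag)
      throws Exception {
    return MessageDigest.isEqual(sign(sharedSecret, doAsUser), tag);
  }
}
{code}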

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095730#comment-14095730
 ] 

Alejandro Abdelnur commented on HDFS-6134:
--

Larry, if the httpfs admin is a different person than the hdfs admin, you don't 
have the problem. 

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6845) XSS and or content injection in hdfs

2014-08-13 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095764#comment-14095764
 ] 

Haohui Mai commented on HDFS-6845:
--

The XSS is only in the old web UI, which has been deprecated and removed in 
trunk. The new web UI is based on dust.js, which defends against XSS attacks 
much more systematically.

 XSS and or content injection in hdfs
 

 Key: HDFS-6845
 URL: https://issues.apache.org/jira/browse/HDFS-6845
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: clouds
  Labels: security

 Following up from email
 ...
 While auditing the latest stable version of hdfs - 2.4.1 (as made
 available from
 http://mirror.nexcess.net/apache/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
 ), I noticed an interesting XSS filter.  Ok, sure.  But what intrigued me
 was where I didn't find any attempt to validate or sanitize.
 Within DatanodeJSPHelper.java - line 108, nnAddr is assigned the value
 from the raw parameter NAMENODE_ADDRESS. On line 120, printGotoForm is
 called with the raw value.  That in turn calls JspHelper.java's
 printGotoForm method - line 452.  Then on line 468, the unvalidated,
 unsanitized value is printed to the HTML page.  Worst case, reflected XSS. 
 Better case, content injection.
 Similarly, DatanodeJSPHelper.java's line 102 tokenString variable looks
 plausible, but I am not certain whether an incorrect token will cause the
 business logic to fail before the malicious input is displayed
 (JspHelper.java - line 465.)
 ...
 These are not the only XSS / content injection points, but they should make it 
 easy to find the others.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack

2014-08-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095763#comment-14095763
 ] 

Daryn Sharp commented on HDFS-6840:
---

We believe, but haven't proven, that this deterministic behavior is causing even 
more problems.  Block replication and invalidation appear to be impacted.  As 
in, changing the replication factor sometimes takes up to an hour to start, and 
there's a slow but steady increase in blocks pending deletion on clusters 
running 2.5.  We believe the NN is repeatedly picking the same faulty DN to 
issue the copy-block and invalidate-block commands.

 Clients are always sent to the same datanode when read is off rack
 --

 Key: HDFS-6840
 URL: https://issues.apache.org/jira/browse/HDFS-6840
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Jason Lowe
Priority: Critical

 After HDFS-6268 the sorting order of block locations is deterministic for a 
 given block and locality level (e.g.: local, rack, off-rack), so off-rack 
 clients all see the same datanode for the same block.  This leads to very 
 poor behavior in distributed cache localization and other scenarios where 
 many clients all want the same block data at approximately the same time.  
 The one datanode is crushed by the load while the other replicas only handle 
 local and rack-local requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6321) Add a message on the old web UI that indicates the old UI is deprecated

2014-08-13 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095767#comment-14095767
 ] 

Haohui Mai commented on HDFS-6321:
--

Can you please minimize your patch? It is sufficient to only display the 
message on index.jsp on the NN.

 Add a message on the old web UI that indicates the old UI is deprecated
 ---

 Key: HDFS-6321
 URL: https://issues.apache.org/jira/browse/HDFS-6321
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Attachments: HDFS-6321.000.patch


 HDFS-6252 has removed the jsp ui from trunk. We should add a message in the 
 old web ui to indicate that the ui has been deprecated and ask the user to 
 move towards the new web ui. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5887) Add suffix to generated protobuf class

2014-08-13 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095769#comment-14095769
 ] 

Haohui Mai commented on HDFS-5887:
--

The jira is still valid. Please check fsimage.proto and the corresponding 
comments in HDFS-5698.

 Add suffix to generated protobuf class
 --

 Key: HDFS-5887
 URL: https://issues.apache.org/jira/browse/HDFS-5887
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Priority: Minor

 As suggested by [~tlipcon], the code is more readable if we give each class 
 generated by the protobuf the suffix Proto.
 This jira proposes to rename the classes and to introduce no functionality 
 changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5887) Add suffix to generated protobuf class

2014-08-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5887:
-

Assignee: (was: Haohui Mai)

 Add suffix to generated protobuf class
 --

 Key: HDFS-5887
 URL: https://issues.apache.org/jira/browse/HDFS-5887
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Priority: Minor

 As suggested by [~tlipcon], the code is more readable if we give each class 
 generated by the protobuf the suffix Proto.
 This jira proposes to rename the classes and to introduce no functionality 
 changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6567) Clean up HdfsFileStatus

2014-08-13 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095770#comment-14095770
 ] 

Haohui Mai commented on HDFS-6567:
--

Looks good to me. +1. I'll commit it shortly.

 Clean up HdfsFileStatus
 ---

 Key: HDFS-6567
 URL: https://issues.apache.org/jira/browse/HDFS-6567
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Attachments: HDFS-6567.000.patch


 As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} 
 is reversed. This jira proposes to fix the order and to make the code more 
 consistent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6321) Add a message on the old web UI that indicates the old UI is deprecated

2014-08-13 Thread Tassapol Athiapinya (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tassapol Athiapinya updated HDFS-6321:
--

Attachment: HDFS-6321.001.patch

[~wheat9] Thanks for the review. Can you please look at the new patch 
(HDFS-6321.001.patch)? I now only put the deprecation message on index.jsp in 
the NN.

 Add a message on the old web UI that indicates the old UI is deprecated
 ---

 Key: HDFS-6321
 URL: https://issues.apache.org/jira/browse/HDFS-6321
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Attachments: HDFS-6321.000.patch, HDFS-6321.001.patch


 HDFS-6252 has removed the jsp ui from trunk. We should add a message in the 
 old web ui to indicate that the ui has been deprecated and ask the user to 
 move towards the new web ui. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6567) Normalize the order of public final in HdfsFileStatus

2014-08-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6567:
-

Summary: Normalize the order of public final in HdfsFileStatus  (was: Clean 
up HdfsFileStatus)

 Normalize the order of public final in HdfsFileStatus
 -

 Key: HDFS-6567
 URL: https://issues.apache.org/jira/browse/HDFS-6567
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Attachments: HDFS-6567.000.patch


 As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} 
 is reversed. This jira proposes to fix the order and to make the code more 
 consistent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6843) Create FileStatus.isEncrypted() method

2014-08-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095797#comment-14095797
 ] 

Andrew Wang commented on HDFS-6843:
---

I looked into how we'd do this, and one major issue is Writable compatibility. 
We can't add a new field to FileStatus without breaking compat. ACLs took the 
approach of re-using an unused bit in the permissions short, and we'd have to 
do something similar.

An enum would involve reserving more of our precious unused bits for this 
purpose. Steve, do you mind laying out your use case in a little more detail? An 
enum by itself isn't very expressive. I figured if users want more information, 
we could add a new API that returns an EncryptedFileStatus with all the gory 
details.
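
To make the bit-packing idea concrete, a minimal sketch (the bit position and helper names below are hypothetical, not what the eventual patch necessarily uses):

{code}
// Hypothetical sketch: stash an "encrypted" flag in an unused bit of the
// 16-bit permission short. Permission bits occupy the low 12 bits; ACLs
// already claimed one spare bit, so assume the next one (1 << 13) is free.
public final class PermissionFlagSketch {
  private static final short ENCRYPTED_BIT = (short) (1 << 13);

  static short setEncrypted(short perm) {
    return (short) (perm | ENCRYPTED_BIT);
  }

  static boolean isEncrypted(short perm) {
    return (perm & ENCRYPTED_BIT) != 0;
  }

  public static void main(String[] args) {
    short perm = 0644;                        // plain rw-r--r--
    short tagged = setEncrypted(perm);
    System.out.println(isEncrypted(perm));    // false
    System.out.println(isEncrypted(tagged));  // true
    // The low permission bits are untouched:
    System.out.println(Integer.toOctalString(tagged & 07777)); // 644
  }
}
{code}

This keeps the Writable wire format unchanged, which is the compatibility constraint the comment describes.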

 Create FileStatus.isEncrypted() method
 --

 Key: HDFS-6843
 URL: https://issues.apache.org/jira/browse/HDFS-6843
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Charles Lamb
Assignee: Charles Lamb

 FileStatus should have a 'boolean isEncrypted()' method. (It came up in the 
 context of discussing with Andrew about FileStatus being a Writable.)
 Having this method would allow MR JobSubmitter to do the following:
 {code}
 BOOLEAN intermediateEncryption = false
 IF jobconf.contains("mr.intermidate.encryption") THEN
   intermediateEncryption = jobConf.getBoolean("mr.intermidate.encryption")
 ELSE
   IF (I/O)Format INSTANCEOF File(I/O)Format THEN
     intermediateEncryption = ANY File(I/O)Format HAS a Path with status isEncrypted()==TRUE
   FI
   jobConf.setBoolean("mr.intermidate.encryption", intermediateEncryption)
 FI
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6846) NetworkTopology#sortByDistance should give nodes higher priority, which cache the block.

2014-08-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095807#comment-14095807
 ] 

Jason Lowe commented on HDFS-6846:
--

This could be very undesirable if a single node is the only one that has a 
cached block and suddenly the block becomes very popular (e.g.: during 
localization across many nodes in a large cluster).  Unless the block is highly 
replicated, most requests will be off-rack and the one node that has it cached 
will be hammered.  Having the block in memory doesn't help if the NIC saturates 
from the traffic.  I just want to make sure we don't end up with another form 
of HDFS-6840.

 NetworkTopology#sortByDistance should give nodes higher priority, which cache 
 the block.
 

 Key: HDFS-6846
 URL: https://issues.apache.org/jira/browse/HDFS-6846
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Yi Liu

 Currently there are 3 weights:
 * local
 * same rack
 * off rack
 But if some nodes cache the block, then it's faster if clients read the block 
 from these nodes. So we should have some more weights, as follows:
 * local
 * cached & same rack
 * same rack
 * cached & off rack
 * off rack
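
For reference, a minimal sketch of that five-level ordering (illustrative only, not the actual NetworkTopology#sortByDistance API; a lower weight sorts first):

{code}
// Illustrative only: lower weight = read from this replica first.
static int readWeight(boolean local, boolean sameRack, boolean cached) {
  if (local) {
    return 0;                 // local
  } else if (sameRack) {
    return cached ? 1 : 2;    // cached & same rack, then same rack
  } else {
    return cached ? 3 : 4;    // cached & off rack, then off rack
  }
}
{code}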



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6567) Normalize the order of public final in HdfsFileStatus

2014-08-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6567:
-

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks [~tassapola] for the 
contribution.

 Normalize the order of public final in HdfsFileStatus
 -

 Key: HDFS-6567
 URL: https://issues.apache.org/jira/browse/HDFS-6567
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Fix For: 2.6.0

 Attachments: HDFS-6567.000.patch


 As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} 
 is reversed. This jira proposes to fix the order and to make the code more 
 consistent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer

2014-08-13 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095830#comment-14095830
 ] 

Vinayakumar B commented on HDFS-6247:
-

Thanks [~umamaheswararao] and [~clamb] for the reviews.
Committed to trunk and branch-2

 Avoid timeouts for replaceBlock() call by sending intermediate responses to 
 Balancer
 

 Key: HDFS-6247
 URL: https://issues.apache.org/jira/browse/HDFS-6247
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, datanode
Affects Versions: 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Fix For: 2.6.0

 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch


 Currently there is no response sent from the target Datanode to the Balancer 
 for replaceBlock() calls.
 Since block movement for balancing is throttled, a complete block movement 
 will take time, and this could result in a timeout at the Balancer, which will 
 be trying to read the status message.
  
 To avoid this, the Datanode can send IN_PROGRESS status messages to the 
 Balancer while the replaceBlock() call is in progress, so that the Balancer 
 does not time out and treat the block movement as failed.
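
For illustration, a sketch of the shape of that keepalive exchange (all names below are made up, not the committed DataXceiver/Dispatcher code):

{code}
// Illustrative pseudocode of the keepalive idea: while the throttled copy
// runs, periodically emit IN_PROGRESS so the Balancer's status read does
// not time out; only the final status decides success or failure.
while (!moveComplete()) {
  copyNextThrottledChunk();
  if (millisSinceLastResponse() > RESPONSE_INTERVAL_MS) {
    sendStatus(Status.IN_PROGRESS);  // keepalive, not a terminal status
  }
}
sendStatus(Status.SUCCESS);          // terminal response ends the exchange
{code}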



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer

2014-08-13 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-6247:


   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Avoid timeouts for replaceBlock() call by sending intermediate responses to 
 Balancer
 

 Key: HDFS-6247
 URL: https://issues.apache.org/jira/browse/HDFS-6247
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, datanode
Affects Versions: 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Fix For: 2.6.0

 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch


 Currently there is no response sent from the target Datanode to the Balancer 
 for replaceBlock() calls.
 Since block movement for balancing is throttled, a complete block movement 
 will take time, and this could result in a timeout at the Balancer, which will 
 be trying to read the status message.
  
 To avoid this, the Datanode can send IN_PROGRESS status messages to the 
 Balancer while the replaceBlock() call is in progress, so that the Balancer 
 does not time out and treat the block movement as failed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095840#comment-14095840
 ] 

Hudson commented on HDFS-6847:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6056 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6056/])
HDFS-6847. Avoid timeouts for replaceBlock() call by sending intermediate 
responses to Balancer (Contributed by Vinayakumar B.) (vinayakumarb: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617784)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java


 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch


 This jira plans to add storage policy support on directories, i.e., users can 
 set/get the storage policy for not only files but also directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy should be its own storage policy 
 if one is specified, or else the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.
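
For illustration, a minimal sketch of that nearest-ancestor resolution rule over an assumed flat map of explicitly set policies (not the NameNode's actual data structures):

{code}
import java.util.HashMap;
import java.util.Map;

public class PolicyResolutionSketch {
  // Nearest explicitly-set ancestor (or the path itself) wins.
  static String resolvePolicy(Map<String, String> explicit, String path) {
    for (String p = path; p != null; p = parentOf(p)) {
      String policy = explicit.get(p);
      if (policy != null) {
        return policy;
      }
    }
    return "DEFAULT";
  }

  static String parentOf(String path) {
    if (path.equals("/")) {
      return null;            // root has no parent; stop the walk
    }
    int i = path.lastIndexOf('/');
    return i <= 0 ? "/" : path.substring(0, i);
  }

  public static void main(String[] args) {
    Map<String, String> explicit = new HashMap<>();
    explicit.put("/foo", "p1");
    explicit.put("/foo/bar", "p2");
    System.out.println(resolvePolicy(explicit, "/foo/bar/baz")); // p2
    System.out.println(resolvePolicy(explicit, "/foo/bar"));     // p2
    System.out.println(resolvePolicy(explicit, "/foo"));         // p1
  }
}
{code}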



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6567) Normalize the order of public final in HdfsFileStatus

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095839#comment-14095839
 ] 

Hudson commented on HDFS-6567:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6056 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6056/])
HDFS-6567. Normalize the order of public final in HdfsFileStatus. Contributed 
by Tassapol Athiapinya. (wheat9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617779)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsFileStatus.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java


 Normalize the order of public final in HdfsFileStatus
 -

 Key: HDFS-6567
 URL: https://issues.apache.org/jira/browse/HDFS-6567
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Fix For: 2.6.0

 Attachments: HDFS-6567.000.patch


 As suggested in HDFS-6200, the order of public final in {{HdfsFileStatus}} 
 is reversed. This jira proposes to fix the order and to make the code more 
 consistent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095844#comment-14095844
 ] 

Hadoop QA commented on HDFS-6833:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661465/HDFS-6833.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7626//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7626//console

This message is automatically generated.

 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, 
 HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation, DirectoryScanner may run while 
 DataNode is deleting the block. In that case, the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 The block being deleted is re-registered in DataNode's memory.
 So when DataNode sends a block report, NameNode receives wrong block 
 information.
 For example, when we run recommission or change the replication factor, 
 NameNode may delete a valid block as an excess replica because of this 
 problem, and under-replicated and missing blocks occur.
 When DataNode runs DirectoryScanner, it should not register a block that is 
 being deleted.
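
A sketch of the guard the last sentence asks for; every name below is made up for illustration:

{code}
// Illustrative sketch: before re-adding an on-disk block that memory does
// not know about, skip it if its deletion is already scheduled/in flight.
if (blockOnDiskButMissingFromMemory(block)) {
  if (isDeletionScheduled(block)) {
    return;                        // async delete in progress; do not re-add
  }
  addMissingBlockToMemory(block);  // genuine inconsistency; reconcile it
}
{code}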



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer

2014-08-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095877#comment-14095877
 ] 

Jing Zhao commented on HDFS-6247:
-

Hi Vinay, looks like when you committed the patch you wrote the jira number as 
HDFS-6847 :) We may need to update the CHANGES.txt.

 Avoid timeouts for replaceBlock() call by sending intermediate responses to 
 Balancer
 

 Key: HDFS-6247
 URL: https://issues.apache.org/jira/browse/HDFS-6247
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, datanode
Affects Versions: 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Fix For: 2.6.0

 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch


 Currently there is no response sent from the target Datanode to the Balancer 
 for replaceBlock() calls.
 Since block movement for balancing is throttled, a complete block movement 
 will take time, and this could result in a timeout at the Balancer, which will 
 be trying to read the status message.
  
 To avoid this, the Datanode can send IN_PROGRESS status messages to the 
 Balancer while the replaceBlock() call is in progress, so that the Balancer 
 does not time out and treat the block movement as failed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

2014-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095874#comment-14095874
 ] 

Colin Patrick McCabe commented on HDFS-6803:


bq. Should we require the implementations of FSInputStream should be 
thread-safe (or at least for PositionedReadable)? And modify some 
implementations such as WebHDFS to make pread concurrently?

I think stack's points 2.1 and 2.2 imply that pread can safely be called from 
multiple threads concurrently.  I guess we should document this too so that 
there's no confusion.
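
For what it's worth, a small sketch of the safe pattern this implies (illustrative; it uses the public FSDataInputStream positioned read, which carries its own offset instead of the stateful seek()+read() pair):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcurrentPreadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = fs.open(new Path("/tmp/example"))) {
      // Two threads pread different regions of the same stream concurrently.
      Thread t1 = new Thread(() -> preadQuietly(in, 0));
      Thread t2 = new Thread(() -> preadQuietly(in, 4096));
      t1.start(); t2.start();
      t1.join(); t2.join();
    }
  }

  static void preadQuietly(FSDataInputStream in, long pos) {
    byte[] buf = new byte[1024];
    try {
      // pread: reads at an explicit position without moving the stream offset.
      in.read(pos, buf, 0, buf.length);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
{code}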

 Documenting DFSClient#DFSInputStream expectations reading and preading in 
 concurrent context
 

 Key: HDFS-6803
 URL: https://issues.apache.org/jira/browse/HDFS-6803
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.4.1
Reporter: stack
 Attachments: DocumentingDFSClientDFSInputStream (1).pdf


 Reviews of the patch posted on the parent task suggest that we be more explicit 
 about how DFSIS is expected to behave when being read by contending threads. 
 It is also suggested that presumptions made internally be made explicit by 
 documenting expectations.
 Before we put up a patch, we've made a document of assertions we'd like to 
 make into tenets of DFSInputStream.  If there is agreement, we'll attach to 
 this issue a patch that weaves the assumptions into DFSIS as javadoc and class 
 comments. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095880#comment-14095880
 ] 

Daryn Sharp commented on HDFS-6826:
---

Arg, yesterday's jira issues apparently caused my comment to be lost.

The group mapping authz is a bit different.  It's not context sensitive, as in 
a user uniformly belongs to groups across the whole namesystem.  Path-based 
context sensitivity is adding hidden magic to a filesystem.  How will the 
special magic be represented to the user confused by why the perms/ACLs aren't 
being honored?  How will permission apis and FsShell interact with the magic?

Instead of trying to hack special behavior for a specific use case into the NN, 
how about leveraging what's there?  A cleaner way may be for a custom group 
mapping to fabricate groups, something like hive:table or hive:table:column. 
No code changes in the NN.  Everything is contained in the custom groups 
mapping.

I still think leveraging ACLs is the best way to go...
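
A rough sketch of what that custom mapping could look like (the Hive lookup is a stub; only the GroupMappingServiceProvider interface is real, wired in via hadoop.security.group.mapping):

{code}
// Sketch of Daryn's suggestion: fabricate entity-scoped groups in a custom
// group mapping, with no NameNode changes. The metastore lookup is a stub.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.security.GroupMappingServiceProvider;

public class HiveEntityGroupMapping implements GroupMappingServiceProvider {
  @Override
  public List<String> getGroups(String user) throws IOException {
    List<String> groups = new ArrayList<>();
    // Fabricated groups of the form "hive:<table>" for tables the user
    // may read, according to the metastore.
    for (String table : tablesReadableBy(user)) {
      groups.add("hive:" + table);
    }
    return groups;
  }

  @Override public void cacheGroupsRefresh() throws IOException {}
  @Override public void cacheGroupsAdd(List<String> groups) throws IOException {}

  private List<String> tablesReadableBy(String user) {
    return new ArrayList<>(); // stub: would query the HiveMetaStore
  }
}
{code}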

 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, 
 HDFSPluggableAuthorizationProposal.pdf


 When Hbase data, HiveMetaStore data or Search data is accessed via services 
 (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce 
 permissions on corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users accessing the underlying data files (i.e. from a MapReduce 
 job), that the permission of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer

2014-08-13 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095884#comment-14095884
 ] 

Vinayakumar B commented on HDFS-6247:
-

Oops, my bad. Thanks Jing for pointing it out. I will correct it right away.

 Avoid timeouts for replaceBlock() call by sending intermediate responses to 
 Balancer
 

 Key: HDFS-6247
 URL: https://issues.apache.org/jira/browse/HDFS-6247
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, datanode
Affects Versions: 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Fix For: 2.6.0

 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch


 Currently there is no response sent from the target Datanode to the Balancer 
 for replaceBlock() calls.
 Since block movement for balancing is throttled, a complete block movement 
 will take time, and this could result in a timeout at the Balancer, which will 
 be trying to read the status message.
  
 To avoid this, the Datanode can send IN_PROGRESS status messages to the 
 Balancer while the replaceBlock() call is in progress, so that the Balancer 
 does not time out and treat the block movement as failed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5135) Create a test framework to enable NFS end to end unit test

2014-08-13 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095903#comment-14095903
 ] 

Zhe Zhang commented on HDFS-5135:
-

[~brandonli] I wonder why TestOutOfOrderWrite directly calls 
Nfs3Utils.writeChannel() instead of going through OpenFileCtx. Is it *Not* 
supposed to test the reordering capability of the NFS gateway?

 Create a test framework to enable NFS end to end unit test
 --

 Key: HDFS-5135
 URL: https://issues.apache.org/jira/browse/HDFS-5135
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li

 Currently, we have to manually start portmap and nfs3 processes to test patch 
 and new functionalities. This JIRA is to track the effort to introduce a test 
 framework to NFS unit test without starting standalone nfs3 processes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer

2014-08-13 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095917#comment-14095917
 ] 

Vinayakumar B commented on HDFS-6247:
-

Jira number updated by reverting and recommitting with the correct jira number. 
Thanks again Jing.

 Avoid timeouts for replaceBlock() call by sending intermediate responses to 
 Balancer
 

 Key: HDFS-6247
 URL: https://issues.apache.org/jira/browse/HDFS-6247
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, datanode
Affects Versions: 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Fix For: 2.6.0

 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch


 Currently there is no response sent from the target Datanode to the Balancer 
 for replaceBlock() calls.
 Since block movement for balancing is throttled, a complete block movement 
 will take time, and this could result in a timeout at the Balancer, which will 
 be trying to read the status message.
  
 To avoid this, the Datanode can send IN_PROGRESS status messages to the 
 Balancer while the replaceBlock() call is in progress, so that the Balancer 
 does not time out and treat the block movement as failed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095930#comment-14095930
 ] 

Alejandro Abdelnur commented on HDFS-6826:
--

[~daryn],

bq. The group mapping authz is a bit different.  It's not context sensitive, as 
in a user uniformly belongs to groups across the whole namesystem.  

Mmmhh, I'd argue that it is context sensitive too; 'user' context is just a 
different context.

bq. Path-based context sensitivity is adding hidden magic to a filesystem.  How 
will the special magic be represented to the user confused by why the 
perms/ACLs aren't being honored? 

The authorization enforcement semantics do not change at all. The plugin 
cannot change the permission-check logic.

The plugin is responsible for providing user/group/permissions/ACLs information 
to the NN who enforces the permissions consistently regardless of the plugin in 
use.


bq. How will permission apis and FsShell interact with the magic?

They work as usual. Check the attached patch: the current HDFS 
user/group/permission/ACLs handling is done by a plugin implementation.

Said that, a plugin implementation may decide to disable changes of 
user/group/permissions/ACLs. This can be done either silently or failing.


bq. Instead of trying to hack special behavior for a specific use case into the 
NN, how about leveraging what's there.

The proposal doc describes in detail 3 different usecases: HiveMetaStore 
tables, Hbase tables, Solr search collections.

bq. A cleaner way may be for a custom group mapping to fabricate groups 
something like hive:table or hive:table:column.   No code changes in the 
NN.  Everything is contained in the custom groups mapping.

This does not solve the problem. When adding a directory as a HiveMetaStore 
table partition, unless you set those special groups explicitly, they would not 
be on the files being added to the table.

It requires client-side group manipulation, and this is what breaks things.

bq. I still think leveraging ACLs is the best way to go...

Actually, we are. In the case of HiveMetaStore, the plugin would expose GRANT 
permissions as ACLs.

Daryn, I'm happy to jump on the phone if you want to have a synchronous discussion.


 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, 
 HDFSPluggableAuthorizationProposal.pdf


 When Hbase data, HiveMetaStore data or Search data is accessed via services 
 (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce 
 permissions on corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users accessing the underlying data files (i.e. from a MapReduce 
 job), that the permission of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()

2014-08-13 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-6841:


Attachment: HDFS-6841-002.patch

Fixed Tests

 Use Time.monotonicNow() wherever applicable instead of Time.now()
 -

 Key: HDFS-6841
 URL: https://issues.apache.org/jira/browse/HDFS-6841
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-6841-001.patch, HDFS-6841-002.patch


 {{Time.now()}} is used in many places to calculate elapsed time.
 This should be replaced with {{Time.monotonicNow()}} to avoid the effect of 
 system time changes on elapsed-time calculations.
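
For illustration, the before/after pattern this jira targets (a minimal, self-contained sketch using org.apache.hadoop.util.Time):

{code}
import org.apache.hadoop.util.Time;

public class ElapsedTimeExample {
  public static void main(String[] args) throws InterruptedException {
    // Before: wall clock. A system time change mid-measurement skews this.
    long start = Time.now();
    Thread.sleep(10);
    long elapsedWallClock = Time.now() - start;

    // After: monotonic clock. Immune to clock steps; only valid for deltas.
    long mStart = Time.monotonicNow();
    Thread.sleep(10);
    long elapsedMonotonic = Time.monotonicNow() - mStart;

    System.out.println(elapsedWallClock + " vs " + elapsedMonotonic);
  }
}
{code}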



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6848) Lack of synchronization on access to datanodeUuid in DataStorage#format()

2014-08-13 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6848:


 Summary: Lack of synchronization on access to datanodeUuid in 
DataStorage#format() 
 Key: HDFS-6848
 URL: https://issues.apache.org/jira/browse/HDFS-6848
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
this.datanodeUuid = datanodeUuid;
{code}
The above assignment should be done while holding the lock on DataStorage.this, 
as is done in two other places.
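
A sketch of the implied fix (assuming, per the lock name above, that the assignment happens inside an inner class of DataStorage):

{code}
// Sketch only: take the same monitor the other two writers of datanodeUuid
// already hold, so no reader can observe the field mid-update.
synchronized (DataStorage.this) {
  this.datanodeUuid = datanodeUuid;
}
{code}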



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6832) Fix the usage of 'hdfs namenode' command

2014-08-13 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095938#comment-14095938
 ] 

Akira AJISAKA commented on HDFS-6832:
-

[~Pooja.Gupta], feel free to create a patch and attach it to this jira.
Unfortunately, I don't have permission to assign the issue to you. A committer 
will assign it to you when the patch is committed.

 Fix the usage of 'hdfs namenode' command
 

 Key: HDFS-6832
 URL: https://issues.apache.org/jira/browse/HDFS-6832
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Akira AJISAKA
Priority: Minor
  Labels: newbie

 {code}
 [root@trunk ~]# hdfs namenode -help
 Usage: java NameNode [-backup] | 
   [-checkpoint] | 
   [-format [-clusterid cid ] [-force] [-nonInteractive] ] | 
   [-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] | 
   [-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] | 
   [-rollback] | 
   [-rollingUpgrade <downgrade|rollback> ] | 
   [-finalize] | 
   [-importCheckpoint] | 
   [-initializeSharedEdits] | 
   [-bootstrapStandby] | 
   [-recover [ -force] ] | 
   [-metadataVersion ]  ]
 {code}
 There're some issues in the usage to be fixed.
 # "Usage: java NameNode" should be "Usage: hdfs namenode"
 # The -rollingUpgrade "started" option should be added
 # The last ']' should be removed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6247) Avoid timeouts for replaceBlock() call by sending intermediate responses to Balancer

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095949#comment-14095949
 ] 

Hudson commented on HDFS-6247:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6057/])
HDFS-6247. Avoid timeouts for replaceBlock() call by sending intermediate 
responses to Balancer (vinayakumarb) (vinayakumarb: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617799)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java


 Avoid timeouts for replaceBlock() call by sending intermediate responses to 
 Balancer
 

 Key: HDFS-6247
 URL: https://issues.apache.org/jira/browse/HDFS-6247
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, datanode
Affects Versions: 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Fix For: 2.6.0

 Attachments: HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, HDFS-6247.patch, 
 HDFS-6247.patch


 Currently there is no response sent from the target Datanode to the Balancer 
 for replaceBlock() calls.
 Since block movement for balancing is throttled, a complete block movement 
 will take time, and this could result in a timeout at the Balancer, which will 
 be trying to read the status message.
  
 To avoid this, the Datanode can send IN_PROGRESS status messages to the 
 Balancer while the replaceBlock() call is in progress, so that the Balancer 
 does not time out and treat the block movement as failed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095948#comment-14095948
 ] 

Hudson commented on HDFS-6847:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6057/])
Reverted
Merged revision(s) 1617784 from hadoop/common/trunk:
HDFS-6847. Avoid timeouts for replaceBlock() call by sending intermediate 
responses to Balancer (Contributed by Vinayakumar B.)
 (vinayakumarb: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617794)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java


 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch


 This jira plans to add storage policy support on directories, i.e., users can 
 set/get the storage policy for not only files but also directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy should be its own storage policy 
 if one is specified, or else the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6546) Add non-superuser capability to get the encryption zone for a specific path

2014-08-13 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6546:
---

Attachment: HDFS-6546.001.patch

The attached patch adds a new method to HdfsAdmin, getEncryptionZoneRootForPath, 
which accepts a path and returns the path of the EZ root (if the argument is in 
an EZ) or null (if it is not).
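
A hypothetical usage sketch, assuming the signature implied by the description (the final API may differ):

{code}
// Assumed-from-the-description usage; not a committed API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class EzRootLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    HdfsAdmin admin = new HdfsAdmin(FileSystem.getDefaultUri(conf), conf);
    Path ezRoot = admin.getEncryptionZoneRootForPath(new Path("/user/alice/secret"));
    System.out.println(ezRoot != null
        ? "EZ root: " + ezRoot          // path is inside an encryption zone
        : "not in an encryption zone"); // null per the patch description
  }
}
{code}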


 Add non-superuser capability to get the encryption zone for a specific path
 ---

 Key: HDFS-6546
 URL: https://issues.apache.org/jira/browse/HDFS-6546
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-6546.001.patch


 Need to add a protocol, API, and CLI that allow a non-superuser to ask 
 whether a path is part of an EZ, and if so, which one.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096027#comment-14096027
 ] 

Sanjay Radia commented on HDFS-6134:


Alejandro, if we treat user 'hdfs' as a special user such that the HDFS system 
will not accept any client connections from 'hdfs', then does this solve the 
problem? An Admin will not be able to connect as user 'hdfs' but can connect 
as user 'ClarkKent', where 'ClarkKent' is in the superuser group of hdfs, so 
that the admin can do his job as superuser.  It does mean that we are trusting 
the HDFS code to be correct in not abusing its access to keys, since it has 
proxy authority with the KMS (this was not required so far).

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096027#comment-14096027
 ] 

Sanjay Radia edited comment on HDFS-6134 at 8/13/14 8:19 PM:
-

Alejandro, a potential solution: treat user 'hdfs' as a special user such that 
the HDFS system will NOT accept any client connections from 'hdfs'. An Admin 
will not be able to connect as user 'hdfs' but can connect as user, say, 
'ClarkKent', where 'ClarkKent' is in the superuser group of hdfs, so that the 
admin can do his job as superuser.  It does mean that we are trusting the HDFS 
code to be correct in not abusing its access to keys, since it has proxy 
authority with the KMS (this was not required so far).


was (Author: sanjay.radia):
Alejandro, if we treat user "hdfs" as a special user such that the HDFS system 
will not accept any client connections from "hdfs", then does this solve the 
problem? An Admin will not be able to connect as user "hdfs" but can connect 
as user "ClarkKent", where "ClarkKent" is in the superuser group of hdfs, so 
that the admin can do his job as superuser. It does mean that we are trusting 
the HDFS code to be correct in not abusing its access to keys, since it has 
proxy authority with KMS (this was not required so far).

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health-care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (HDFS-6849) Replace HttpFS custom proxyuser handling with common implementation

2014-08-13 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur moved HADOOP-10836 to HDFS-6849:
---

  Component/s: (was: security)
   security
 Target Version/s:   (was: 2.6.0)
Affects Version/s: (was: 2.4.1)
   2.4.1
  Key: HDFS-6849  (was: HADOOP-10836)
  Project: Hadoop HDFS  (was: Hadoop Common)

 Replace HttpFS custom proxyuser handling with common implementation
 ---

 Key: HDFS-6849
 URL: https://issues.apache.org/jira/browse/HDFS-6849
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: COMBO.patch, HADOOP-10836.patch, HADOOP-10836.patch, 
 HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch


 Use HADOOP-10835 to implement proxyuser logic in HttpFS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6849) Replace HttpFS custom proxyuser handling with common implementation

2014-08-13 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-6849:
-

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

committed to trunk and branch-2.

 Replace HttpFS custom proxyuser handling with common implementation
 ---

 Key: HDFS-6849
 URL: https://issues.apache.org/jira/browse/HDFS-6849
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.6.0

 Attachments: COMBO.patch, HADOOP-10836.patch, HADOOP-10836.patch, 
 HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch


 Use HADOOP-10835 to implement proxyuser logic in HttpFS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6849) Replace HttpFS custom proxyuser handling with common implementation

2014-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096116#comment-14096116
 ] 

Hudson commented on HDFS-6849:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6060 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6060/])
HDFS-6849. Replace HttpFS custom proxyuser handling with common implementation. 
(tucu) (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617831)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSAuthenticationFilter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSParametersProvider.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServerWebApp.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/ProxyUser.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/security/ProxyUserService.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/wsrs/UserProvider.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/security/TestProxyUserService.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/wsrs/TestUserProvider.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Replace HttpFS custom proxyuser handling with common implementation
 ---

 Key: HDFS-6849
 URL: https://issues.apache.org/jira/browse/HDFS-6849
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.6.0

 Attachments: COMBO.patch, HADOOP-10836.patch, HADOOP-10836.patch, 
 HADOOP-10836.patch, HADOOP-10836.patch, HADOOP-10836.patch


 Use HADOOP-10835 to implement proxyuser logic in HttpFS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6321) Add a message on the old web UI that indicates the old UI is deprecated

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096124#comment-14096124
 ] 

Hadoop QA commented on HDFS-6321:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661489/HDFS-6321.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7627//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7627//console

This message is automatically generated.

 Add a message on the old web UI that indicates the old UI is deprecated
 ---

 Key: HDFS-6321
 URL: https://issues.apache.org/jira/browse/HDFS-6321
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Tassapol Athiapinya
 Attachments: HDFS-6321.000.patch, HDFS-6321.001.patch


 HDFS-6252 has removed the jsp ui from trunk. We should add a message in the 
 old web ui to indicate that the ui has been deprecated and ask the user to 
 move towards the new web ui. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Issue Comment Deleted] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6847:


Comment: was deleted

(was: FAILURE: Integrated in Hadoop-trunk-Commit #6056 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6056/])
HDFS-6847. Avoid timeouts for replaceBlock() call by sending intermediate 
responses to Balancer (Contributed by Vinayakumar B.) (vinayakumarb: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617784)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java
)

 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch


 This jira plans to add storage policy support on directory, i.e., users can 
 set/get storage policy for not only files but also directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Issue Comment Deleted] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6847:


Comment: was deleted

(was: FAILURE: Integrated in Hadoop-trunk-Commit #6057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6057/])
Reverted
Merged revision(s) 1617784 from hadoop/common/trunk:
HDFS-6847. Avoid timeouts for replaceBlock() call by sending intermediate 
responses to Balancer (Contributed by Vinayakumar B.)
 (vinayakumarb: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617794)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java
)

 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch


 This jira plans to add storage policy support on directory, i.e., users can 
 set/get storage policy for not only files but also directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096170#comment-14096170
 ] 

Jing Zhao commented on HDFS-6847:
-

We still need to handle snapshots correctly. Also, it may be better to integrate 
the set/getStoragePolicy methods into INode.java. Will update the patch later.

 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch


 This jira plans to add storage policy support on directory, i.e., users can 
 set/get storage policy for not only files but also directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.
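
The nearest-ancestor rule above can be sketched as a walk up the tree (a 
self-contained illustration only; Node stands in for the real INode and the 
field names are hypothetical):

{code}
class Node {
  static final byte UNSPECIFIED = 0;
  final Node parent;
  final byte policyId;  // UNSPECIFIED when no policy is set on this node

  Node(Node parent, byte policyId) {
    this.parent = parent;
    this.policyId = policyId;
  }

  // Effective policy: this node's own policy if set, otherwise the
  // policy of the nearest ancestor that has one.
  byte effectivePolicyId() {
    for (Node n = this; n != null; n = n.parent) {
      if (n.policyId != UNSPECIFIED) {
        return n.policyId;
      }
    }
    return UNSPECIFIED;
  }

  public static void main(String[] args) {
    Node root = new Node(null, UNSPECIFIED);
    Node foo = new Node(root, (byte) 1);    // p1 set on /foo
    Node bar = new Node(foo, (byte) 2);     // p2 set on /foo/bar
    Node baz = new Node(bar, UNSPECIFIED);  // nothing set on baz
    // Prints "2 2 1", matching the /foo/bar/baz example in the description.
    System.out.println(baz.effectivePolicyId() + " "
        + bar.effectivePolicyId() + " " + foo.effectivePolicyId());
  }
}
{code}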



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096177#comment-14096177
 ] 

Larry McCay commented on HDFS-6134:
---

And that is ensured by file permissions on the keytab?


On Wed, Aug 13, 2014 at 1:14 PM, Alejandro Abdelnur (JIRA) j...@apache.org



 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health-care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6841) Use Time.monotonicNow() wherever applicable instead of Time.now()

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096203#comment-14096203
 ] 

Hadoop QA commented on HDFS-6841:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661506/HDFS-6841-002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7628//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7628//console

This message is automatically generated.

 Use Time.monotonicNow() wherever applicable instead of Time.now()
 -

 Key: HDFS-6841
 URL: https://issues.apache.org/jira/browse/HDFS-6841
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-6841-001.patch, HDFS-6841-002.patch


 {{Time.now()}} is used in many places to calculate elapsed time.
 This should be replaced with {{Time.monotonicNow()}} to avoid effect of 
 System time changes on elapsed time calculations.
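
A minimal sketch of the intended pattern (the surrounding code is illustrative; 
only the Time calls are the point):

{code}
import org.apache.hadoop.util.Time;

public class ElapsedTimeSketch {
  public static void main(String[] args) throws InterruptedException {
    // Time.now() follows wall-clock adjustments (NTP, manual changes),
    // so deltas computed from it can jump or even go negative.
    // Time.monotonicNow() only ever moves forward, which makes it
    // safe for elapsed-time measurement.
    long start = Time.monotonicNow();
    Thread.sleep(100);
    long elapsedMs = Time.monotonicNow() - start;
    System.out.println("elapsed: " + elapsedMs + "ms");
  }
}
{code}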



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096214#comment-14096214
 ] 

Alejandro Abdelnur commented on HDFS-6134:
--

If httpfs and the NN or DNs run on the same box, yes. However, in a prod 
environment that would not commonly be the case.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health-care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Unit testing for out of order writes

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6850:


Status: Open  (was: Patch Available)

 Unit testing for out of order writes
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Priority: Minor

 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6850) Unit testing for out of order writes

2014-08-13 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-6850:
---

 Summary: Unit testing for out of order writes
 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Priority: Minor


Expanding TestWrites class to include the out of order writing scenario. I 
think it is logical to merge the OOO scenario in the TestWrites class instead 
of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Unit testing for out of order writes

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6850:


Status: Patch Available  (was: Open)

 Unit testing for out of order writes
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Priority: Minor

 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Unit testing for out of order writes

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6850:


Attachment: HDFS-6850.patch

 Unit testing for out of order writes
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Unit testing for out of order writes

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6850:


Status: Patch Available  (was: Open)

 Unit testing for out of order writes
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6848) Lack of synchronization on access to datanodeUuid in DataStorage#format()

2014-08-13 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096254#comment-14096254
 ] 

Xiaoyu Yao commented on HDFS-6848:
--

Note: the only caller of DataStorage#format() is the synchronized method 
addStorageLocations, so we should be fine without any changes.

If this is going to be called by other non-synchronized methods in the future, 
it would be better to call the synchronized method setDatanodeUuid() instead of 
the direct assignment that Ted reported above in DataStorage#format().

{code}
private synchronized void addStorageLocations(DataNode datanode,...)
{

 format(sd, nsInfo, datanode.getDatanodeUuid());

}  
{code}
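
A minimal sketch of that suggestion (the field and method shapes only 
approximate DataStorage; the point is that every write goes through one 
synchronized setter):

{code}
class StorageSketch {
  private String datanodeUuid;

  // Single, locked write path, analogous to setDatanodeUuid().
  synchronized void setDatanodeUuid(String uuid) {
    this.datanodeUuid = uuid;
  }

  void format(String uuid) {
    // Instead of the direct assignment "this.datanodeUuid = uuid;",
    // delegate to the synchronized setter so the write stays safe
    // even if format() is later called from unsynchronized code.
    setDatanodeUuid(uuid);
  }
}
{code}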

 Lack of synchronization on access to datanodeUuid in DataStorage#format() 
 --

 Key: HDFS-6848
 URL: https://issues.apache.org/jira/browse/HDFS-6848
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor

 {code}
 this.datanodeUuid = datanodeUuid;
 {code}
 The above assignment should be done holding lock DataStorage.this - as is 
 done in two other places.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6847) Archival Storage: Support storage policy on directories

2014-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6847:


Attachment: HDFS-6847.001.patch

Update the patch. Main changes:
# Update the patch to support snapshot path
# Move getStoragePolicyID into INode.java.
# Fix bugs and add unit tests

 Archival Storage: Support storage policy on directories
 ---

 Key: HDFS-6847
 URL: https://issues.apache.org/jira/browse/HDFS-6847
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6847.000.patch, HDFS-6847.001.patch


 This jira plans to add storage policy support on directory, i.e., users can 
 set/get storage policy for not only files but also directories.
 We allow users to set storage policies for nested directories/files. For a 
 specific file/directory, its storage policy then should be its own storage 
 policy, if it is specified, or the storage policy specified on its nearest 
 ancestral directory. E.g., for a path /foo/bar/baz, if two different policies 
 are set on foo and bar (p1 for foo and p2 for bar), the storage policies for 
 baz, bar, and foo should be p2, p2, and p1, respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6561) Byte array native checksumming on client side

2014-08-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096269#comment-14096269
 ] 

Todd Lipcon commented on HDFS-6561:
---

Hey James. I looked over this patch and wrote/ran some performance tests.

At first, I was a little concerned that the way you consolidated the code for 
bulk_crc32 would slow things down. In particular, there's now a new branch per 
chunk to determine whether to store or verify the CRC. I was worried that, if 
this branch were mispredicted, we'd pay an extra 15-20 cycles for every 
512-byte chunk (which at 0.13 cycles/byte only takes ~66 cycles). That would 
represent a close to 20% performance regression.

So, I wrote a simple test which approximates exactly the HDFS usage of these 
APIs -- i.e. 512-byte chunks and a reasonable amount of data. In this test, I 
found that the above concern was unwarranted - probably because the branch 
prediction unit does a very good job with the simple branch pattern here. I'll 
attach a version of your patch which includes the benchmark that I wrote in 
case anyone else wants to run it.
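
In the meantime, a rough shape for such a harness (a sketch under my 
assumptions, not the attached benchmark; it uses the public DataChecksum API 
with CRC32C and 512-byte chunks, and the buffer sizes are illustrative):

{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.util.DataChecksum;

public class CrcBenchSketch {
  public static void main(String[] args) throws Exception {
    final int chunk = 512;                     // HDFS-style chunk size
    final int dataLen = 64 * 1024 * 1024;      // 64MB per round
    DataChecksum sum =
        DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, chunk);
    ByteBuffer data = ByteBuffer.allocateDirect(dataLen);
    ByteBuffer sums = ByteBuffer.allocateDirect(
        (dataLen / chunk) * sum.getChecksumSize());

    long t0 = System.nanoTime();
    sum.calculateChunkedSums(data, sums);           // the new store path
    long t1 = System.nanoTime();
    sum.verifyChunkedSums(data, sums, "bench", 0);  // the existing verify path
    long t2 = System.nanoTime();
    System.out.printf("calculate: %dus, verify: %dus%n",
        (t1 - t0) / 1000, (t2 - t1) / 1000);
  }
}
{code}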

Here are my average timings for 512MB of 512-byte-chunked checksums (on my 
Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz)

*Before*:
Calculate: 401275us (1.28GB/sec)
Verify: 41184us (12.43GB/sec)

*After*:
Calculate: 41808us (12.25GB/sec)
Verify: 41604us (12.31GB/sec)

These seem to match earlier results you've posted elsewhere - just wanted to 
confirm on my machine and make sure that the existing verify code path didn't 
regress due to the new functionality.

For ease of review, I also think it makes sense to split this patch up a little 
further, and make this JIRA only do the changes to the checksumming code to 
allow for native calculation. The changes to FSOutputSummer, DFSOutputStream, 
etc, are a bit more complex and probably should be reviewed separately. I took 
the liberty of removing those chunks from the patch as I was testing it, so 
I'll upload that here and you can take a look.

Given the above, I only reviewed the portion related to checksumming and didn't 
yet look in detail at the outputsummer, etc, changes.

 Byte array native checksumming on client side
 -

 Key: HDFS-6561
 URL: https://issues.apache.org/jira/browse/HDFS-6561
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6561.2.patch, HDFS-6561.3.patch, HDFS-6561.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6850) Unit testing for out of order writes

2014-08-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-6850:


Assignee: Zhe Zhang

 Unit testing for out of order writes
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6850:
-

 Target Version/s: 2.6.0
Affects Version/s: (was: 3.0.0)
   2.6.0
  Summary: Move NFS out of order write unit tests into TestWrites 
class  (was: Unit testing for out of order writes)

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-4486) Add log category for long-running DFSClient notices

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reassigned HDFS-4486:
---

Assignee: Zhe Zhang

 Add log category for long-running DFSClient notices
 ---

 Key: HDFS-4486
 URL: https://issues.apache.org/jira/browse/HDFS-4486
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Todd Lipcon
Assignee: Zhe Zhang
Priority: Minor

 There are a number of features in the DFS client which are transparent but 
 can make a fairly big difference for performance -- two in particular are 
 short circuit reads and native checksumming. Because we don't want log spew 
 for clients like hadoop fs -cat we currently log only at DEBUG level when 
 these features are disabled. This makes it difficult to troubleshoot/verify 
 for long-running perf-sensitive clients like HBase.
  One simple solution is to add a new log category - e.g. 
 o.a.h.h.DFSClient.PerformanceAdvisory - which long-running clients could 
 enable at DEBUG level without getting the full debug spew.
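
 A minimal sketch of that idea (hypothetical names; commons-logging as used 
 elsewhere in Hadoop):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class PerformanceAdvisory {
  // Hypothetical dedicated category, per the suggestion above; a client sets
  // log4j.logger.org.apache.hadoop.hdfs.DFSClient.PerformanceAdvisory=DEBUG
  // without turning on full DFSClient debug spew.
  public static final Log LOG =
      LogFactory.getLog("org.apache.hadoop.hdfs.DFSClient.PerformanceAdvisory");

  public static void main(String[] args) {
    LOG.debug("Short-circuit reads are disabled for this client");
  }
}
{code}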



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096309#comment-14096309
 ] 

Hadoop QA commented on HDFS-6850:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661565/HDFS-6850.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7629//console

This message is automatically generated.

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6850:


Attachment: HDFS-6850.patch

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6850:


Attachment: (was: HDFS-6850.patch)

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6801) Archival Storage: Add a new data migration tool

2014-08-13 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6801:
--

Attachment: h6801_20140814.patch

h6801_20140814.patch: fixes some bugs.  Still need to add code to start the 
dispatcher.

 Archival Storage: Add a new data migration tool 
 

 Key: HDFS-6801
 URL: https://issues.apache.org/jira/browse/HDFS-6801
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer, namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6801_20140813.patch, h6801_20140814.patch


 The tool is similar to Balancer.  It periodically scans the blocks in HDFS and 
 uses path and/or other metadata (e.g. mtime) to determine if a block should 
 be cooled down (i.e. hot => warm, or warm => cold) or warmed up (i.e. cold => 
 warm, or warm => hot).  In contrast to Balancer, the migration tool always 
 moves replicas to a different storage type.  Similar to Balancer, the replicas 
 are moved in a way that the number of racks the block spans does not decrease.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6850:


Status: Open  (was: Patch Available)

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-13 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096317#comment-14096317
 ] 

Zhe Zhang commented on HDFS-6850:
-

The first patch file wasn't correctly generated. Resubmitting now.

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-6850:


 Target Version/s: 3.0.0  (was: 2.6.0)
Affects Version/s: (was: 2.6.0)
   3.0.0
   Status: Patch Available  (was: Open)

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096346#comment-14096346
 ] 

Daryn Sharp commented on HDFS-6826:
---

Haha, synchronous discussion - that made my day. Yes, I'll contact you offline.

 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFS-6826-idea.patch, 
 HDFSPluggableAuthorizationProposal.pdf


 When Hbase data, HiveMetaStore data or Search data is accessed via services 
 (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce 
 permissions on corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users accessing the underlying data files (i.e. from a MapReduce 
  job), that the permissions of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6546) Add non-superuser capability to get the encryption zone for a specific path

2014-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096350#comment-14096350
 ] 

Colin Patrick McCabe commented on HDFS-6546:


Nice idea.  

Returning just a path seems a bit inflexible.  Can we also return an encryption 
zone id of sorts?  I think the inode ID of the EZ would work pretty nicely 
(based on some offline discussion with Andrew).  That way we can also add more 
stuff if we want later... we're not locked into just what fields Path has.

Also, I noticed a few places in the test where you inverted expected and 
provided. The expected value should come first in Assert.assertEquals, so if the 
test fails, you don't get confusing error messages...

One last thing... I modified the test slightly to call this API on something in 
a snapshot, and it failed with this exception:
{code}
Running org.apache.hadoop.hdfs.TestEncryptionZones
Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 18.539 sec  
FAILURE! - in org.apache.hadoop.hdfs.TestEncryptionZones
testGetEZRootAsNonSuperUser(org.apache.hadoop.hdfs.TestEncryptionZones)  Time 
elapsed: 3.876 sec   ERROR!
org.apache.hadoop.ipc.RemoteException: Modification on a read-only snapshot is 
disallowed
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath4Write(FSDirectory.java:3071)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath4Write(FSDirectory.java:1490)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEZRootForPath(FSNamesystem.java:8598)
{code}

This should work on snapshotted files... probably a good idea to add a unit 
test for that.  Similarly, we should test what happens when both the file and 
the EZ have been deleted, but are still in a snapshot.  Thanks
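
For the snapshot case, a rough shape of such a test (a sketch only: 
getEncryptionZoneRootForPath is the method name from the current patch, and 
fs, dfsAdmin, TEST_KEY, and the DFSTestUtil setup are stand-ins for the 
TestEncryptionZones fixtures):

{code}
@Test
public void testGetEZRootForSnapshotPath() throws Exception {
  final Path zone = new Path("/zone");
  fs.mkdirs(zone);
  dfsAdmin.createEncryptionZone(zone, TEST_KEY);
  fs.allowSnapshot(zone);
  DFSTestUtil.createFile(fs, new Path(zone, "file"), 1024, (short) 1, 0xFEED);
  fs.createSnapshot(zone, "s1");
  Path snapPath = new Path("/zone/.snapshot/s1/file");
  // Expected value first (per the assert-ordering note above); the call
  // should resolve the EZ root instead of failing with the read-only
  // snapshot modification error.
  assertEquals(zone, dfsAdmin.getEncryptionZoneRootForPath(snapPath));
}
{code}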

 Add non-superuser capability to get the encryption zone for a specific path
 ---

 Key: HDFS-6546
 URL: https://issues.apache.org/jira/browse/HDFS-6546
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-6546.001.patch


 Need to add protocol, API, and CLI that allow a non-superuser to ask 
 whether a path is part of an EZ, and if so, which one.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6561) Byte array native checksumming on client side

2014-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096361#comment-14096361
 ] 

Colin Patrick McCabe commented on HDFS-6561:


Great idea.

The {{hdfs-6561-just-hadoop-changes.txt}} patch needs to be rebased... it 
didn't apply cleanly for me against trunk.

{code}
JNIEXPORT void JNICALL 
Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSums
  (JNIEnv *env, jclass clazz,
jint bytes_per_checksum, jint j_crc_type,
jobject j_sums, jint sums_offset,
jobject j_data, jint data_offset, jint data_len,
jstring j_filename, jlong base_pos, jboolean verify)
{code}

Later, you use an if(likely) on the verify boolean.  Rather than do this, why 
not just have a utility function that both nativeComputeChunkedSumsByteArray 
and nativeVerifyChunkedSums call?

{code}
-#include <stdint.h>
+#include <stdbool.h>
{code}

Please, no.  There are a lot of older C compilers floating around out there 
that will choke on this.  Plus we still need {{stdint.h}}, since we're using 
{{uint32_t}}, etc. etc.  I don't think the C99 _Bool stuff adds a lot of type 
safety anyway, since any non-struct type can implicitly be converted to a bool, 
and a bool can be used as an int in many contexts.

 Byte array native checksumming on client side
 -

 Key: HDFS-6561
 URL: https://issues.apache.org/jira/browse/HDFS-6561
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6561.2.patch, HDFS-6561.3.patch, HDFS-6561.patch, 
 hdfs-6561-just-hadoop-changes.txt






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6850) Move NFS out of order write unit tests into TestWrites class

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096379#comment-14096379
 ] 

Hadoop QA commented on HDFS-6850:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661589/HDFS-6850.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7630//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7630//console

This message is automatically generated.

 Move NFS out of order write unit tests into TestWrites class
 

 Key: HDFS-6850
 URL: https://issues.apache.org/jira/browse/HDFS-6850
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nfs
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-6850.patch


 Expanding TestWrites class to include the out of order writing scenario. I 
 think it is logical to merge the OOO scenario in the TestWrites class instead 
 of having a separate TestOutOfOrderWrite class. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6561) Byte array native checksumming on client side

2014-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096378#comment-14096378
 ] 

Hadoop QA commented on HDFS-6561:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12661574/hdfs-6561-just-hadoop-changes.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7631//console

This message is automatically generated.

 Byte array native checksumming on client side
 -

 Key: HDFS-6561
 URL: https://issues.apache.org/jira/browse/HDFS-6561
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6561.2.patch, HDFS-6561.3.patch, HDFS-6561.patch, 
 hdfs-6561-just-hadoop-changes.txt






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic

2014-08-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096405#comment-14096405
 ] 

Andrew Wang commented on HDFS-6783:
---

+1 from me too, thanks for working on this Yi and Colin.

 Fix HDFS CacheReplicationMonitor rescan logic
 -

 Key: HDFS-6783
 URL: https://issues.apache.org/jira/browse/HDFS-6783
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Affects Versions: 3.0.0
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, 
 HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, 
 HDFS-6783.006.patch


 In the monitor thread, needsRescan is set to false before the real scan starts, 
 so {{waitForRescanIfNeeded}} will return at the first condition:
 {code}
 if (!needsRescan) {
   return;
 }
 {code}
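
One way to avoid that early return, sketched with hypothetical names (the 
actual fix in the attached patches may differ): waiters record how many 
completed scans they need and block until the monitor thread finishes a scan 
that started after their request.

{code}
class RescanSketch {
  private long completedScans = 0;  // bumped by the monitor thread
  private long neededScans = 0;     // highest scan count any waiter needs

  synchronized void requestRescan() {
    neededScans = completedScans + 1;  // require one full *future* scan
    notifyAll();
  }

  synchronized void waitForRescanIfNeeded() throws InterruptedException {
    long target = neededScans;
    while (completedScans < target) {
      wait();  // woken by scanCompleted()
    }
  }

  // The monitor thread calls this after finishing a full scan.
  synchronized void scanCompleted() {
    completedScans++;
    notifyAll();
  }
}
{code}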



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6851) Flush EncryptionZoneWithId and add an id field to EncryptionZone

2014-08-13 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-6851:
--

 Summary: Flush EncryptionZoneWithId and add an id field to 
EncryptionZone
 Key: HDFS-6851
 URL: https://issues.apache.org/jira/browse/HDFS-6851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Charles Lamb
Assignee: Charles Lamb


EncryptionZoneWithId can be flushed by moving the id field up to EncryptionZone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6634) inotify in HDFS

2014-08-13 Thread James Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Thomas updated HDFS-6634:
---

Attachment: HDFS-6634.4.patch

Thanks for the comments, Andrew. Updated patch.

 inotify in HDFS
 ---

 Key: HDFS-6634
 URL: https://issues.apache.org/jira/browse/HDFS-6634
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client, namenode, qjm
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6634.2.patch, HDFS-6634.3.patch, HDFS-6634.4.patch, 
 HDFS-6634.patch, inotify-design.2.pdf, inotify-design.pdf, 
 inotify-intro.2.pdf, inotify-intro.pdf


 Design a mechanism for applications like search engines to access the HDFS 
 edit stream.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6848) Lack of synchronization on access to datanodeUuid in DataStorage#format()

2014-08-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096431#comment-14096431
 ] 

Ted Yu commented on HDFS-6848:
--

bq. it is better to call the synchronized method setDatanodeUuid() 
That should be good.

 Lack of synchronization on access to datanodeUuid in DataStorage#format() 
 --

 Key: HDFS-6848
 URL: https://issues.apache.org/jira/browse/HDFS-6848
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor

 {code}
 this.datanodeUuid = datanodeUuid;
 {code}
 The above assignment should be done holding lock DataStorage.this - as is 
 done in two other places.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

