date:20140707


[ 
https://issues.apache.org/jira/browse/HADOOP-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053556#comment-14053556
 ] 

Hudson commented on HADOOP-10769:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #606 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/606/])
HADOOP-10769. Create KeyProvider extension to handle delegation tokens. 
Contributed by Arun Suresh. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608286)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProviderDelegationTokenExtension.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/TestKeyProviderDelegationTokenExtension.java


 Create KeyProvider extension to handle delegation tokens
 

 Key: HADOOP-10769
 URL: https://issues.apache.org/jira/browse/HADOOP-10769
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
 Fix For: 3.0.0

 Attachments: HADOOP-10769.1.patch, HADOOP-10769.2.patch, 
 HADOOP-10769.3.patch


 The KeyProvider API needs to return delegation tokens to enable access to the 
 KeyProvider from processes without Kerberos credentials (ie Yarn containers).
 This is required for HDFS encryption and KMS integration.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10769) Create KeyProvider extension to handle delegation tokens


[ 
https://issues.apache.org/jira/browse/HADOOP-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053675#comment-14053675
 ] 

Hudson commented on HADOOP-10769:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #1797 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1797/])
HADOOP-10769. Create KeyProvider extension to handle delegation tokens. 
Contributed by Arun Suresh. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608286)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProviderDelegationTokenExtension.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/TestKeyProviderDelegationTokenExtension.java


 Create KeyProvider extension to handle delegation tokens
 

 Key: HADOOP-10769
 URL: https://issues.apache.org/jira/browse/HADOOP-10769
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
 Fix For: 3.0.0

 Attachments: HADOOP-10769.1.patch, HADOOP-10769.2.patch, 
 HADOOP-10769.3.patch


 The KeyProvider API needs to return delegation tokens to enable access to the 
 KeyProvider from processes without Kerberos credentials (ie Yarn containers).
 This is required for HDFS encryption and KMS integration.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HADOOP-9902) Shell script rewrite

2014-07-07 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-9902:
-

Status: Patch Available  (was: Open)

 Shell script rewrite
 

 Key: HADOOP-9902
 URL: https://issues.apache.org/jira/browse/HADOOP-9902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
  Labels: releasenotes
 Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, 
 HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt


 Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10769) Create KeyProvider extension to handle delegation tokens


[ 
https://issues.apache.org/jira/browse/HADOOP-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053748#comment-14053748
 ] 

Hudson commented on HADOOP-10769:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1824 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1824/])
HADOOP-10769. Create KeyProvider extension to handle delegation tokens. 
Contributed by Arun Suresh. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608286)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProviderDelegationTokenExtension.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/TestKeyProviderDelegationTokenExtension.java


 Create KeyProvider extension to handle delegation tokens
 

 Key: HADOOP-10769
 URL: https://issues.apache.org/jira/browse/HADOOP-10769
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
 Fix For: 3.0.0

 Attachments: HADOOP-10769.1.patch, HADOOP-10769.2.patch, 
 HADOOP-10769.3.patch


 The KeyProvider API needs to return delegation tokens to enable access to the 
 KeyProvider from processes without Kerberos credentials (ie Yarn containers).
 This is required for HDFS encryption and KMS integration.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10784) Need add more in KMS document

2014-07-07 Thread Larry McCay (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053827#comment-14053827
 ] 

Larry McCay commented on HADOOP-10784:
--

Hi [~kellyzly] - Thanks for the feedback. You will want to try the --negotiate 
flag with curl - it isn't up to you to provide the token. The document could 
use a simple example. Feel free to contribute a patch back once you get 
--negotiate working to your liking.

 Need add more in KMS document
 -

 Key: HADOOP-10784
 URL: https://issues.apache.org/jira/browse/HADOOP-10784
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: liyunzhang
Priority: Minor

 Now i can only find the kms document in 
 http://aajisaka.github.io/hadoop-project/hadoop-kms/index.html, but it is 
 very simple. for example, i don't know how to enabling Kerberos HTTP SPNEGO 
 Authentication although i configure the kms-site.xml according to the 
 reference page.
 How to test it ?
 I send following request to KMS server:
  curl -g --header  Authorization:Negotiate123455 
 http://localhost:16000/kms/v1/key/k1
 I read the KMS code and found that  i need add parameters in request header 
 and  the format is Authorization:Negotiate $token. But how the token is 
 generated?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10786) Patch that fixes UGI#reloginFromKeytab on java 8

2014-07-07 Thread Tobi Vollebregt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053856#comment-14053856
 ] 

Tobi Vollebregt commented on HADOOP-10786:
--

Not sure why build is failing, there are no compile errors in the log. Can't 
reproduce build failure locally - build is failing (flaky?) even on trunk.

 Patch that fixes UGI#reloginFromKeytab on java 8
 

 Key: HADOOP-10786
 URL: https://issues.apache.org/jira/browse/HADOOP-10786
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Tobi Vollebregt
Priority: Minor
 Attachments: HADOOP-10786.patch


 Krb5LoginModule changed subtly in java 8: in particular, if useKeyTab and 
 storeKey are specified, then only a KeyTab object is added to the Subject's 
 private credentials, whereas in java = 7 both a KeyTab and some number of 
 KerberosKey objects was added.
 The UGI constructor checks whether or not a keytab was used to login by 
 looking if there are any KerberosKey objects in the Subject's private 
 credentials. If there are, the isKeyTab is set to true, and otherwise it's 
 false.
 Thus, in java 8 isKeyTab is always false given the current UGI 
 implementation, which makes UGI#reloginFromKeytab fail silently.
 Attached patch will check for a KeyTab object on the Subject, instead of a 
 KerberosKey object. This fixes relogins from kerberos keytabs on Oracle java 
 8, and works on Oracle java 7 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10778) Use NativeCrc32 only if it is faster

2014-07-07 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053897#comment-14053897
 ] 

Todd Lipcon commented on HADOOP-10778:
--

It also would depend a lot on the version of zlib that you've got on your 
system. The CRC32 implementation in java.util.zip is just a wrapper around 
zlib's crc32 function, even in the latest Java: 
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/native/java/util/zip/CRC32.c

So, we should probably run this benchmark on a system which is representative 
of customer servers, not just an OSX laptop. Have you had a chance to try it on 
eg a Sandy Bridge server running RHEL 6?

The other factor which isn't captured by the benchmark is the cost of the JVM 
critical section (GetPrimitiveArrayCritical). While in such a critical 
section, GCs are blocked, and any request to start a GC will end up blocking 
all threads until the thread within the critical section exits. This can affect 
GC pause time pretty greatly if you are making CRC calls of large buffers. This 
was one of the major reasons to switch to the pure Java CRC32 if I recall 
correctly -- not just pure throughput.

The new work that [~james.thomas] is doing with native CRC avoids the above 
problem by chunking the CRC calculation into smaller chunks -- the same trick 
that the JVM uses when memcpying large byte[] arrays. This avoids long critical 
sections and the above-mentioned problem where all threads block while entering 
a minor GC.

 Use NativeCrc32 only if it is faster
 

 Key: HADOOP-10778
 URL: https://issues.apache.org/jira/browse/HADOOP-10778
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: c10778_20140702.patch


 From the benchmark post in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060],
  NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when 
 bytesPerChecksum  512.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support

Alejandro Abdelnur created HADOOP-10791:
---

 Summary: AuthenticationFilter should support externalizing the 
secret for signing and provide rotation support
 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur


It should be possible to externalize the secret used to sign the hadoop-auth 
cookies.

In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
the case of Oozie HA, the secret could be stored in Oozie HA control data in 
ZooKeeper.

In addition, it is desirable for the secret to change periodically, this means 
that the AuthenticationService should remember a previous secret for the max 
duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10778) Use NativeCrc32 only if it is faster


[ 
https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053908#comment-14053908
 ] 

Colin Patrick McCabe commented on HADOOP-10778:
---

I like the idea of having a microbenchmark which can compare the different 
implementations.  I'm not comfortable selecting the implementation to use at 
runtime with some microbenchmark, because it means that behavior may be 
nondeterministic.  For example, if a GC hits when we're doing the benchmark, or 
another process uses a bunch of CPu, we might choose the wrong implementation.  
As you guys know, most of our users on x86 just use CRC32C (not CRC32) with 
hardware acceleration, and we're not going to beat that from Java or C.  Let's 
either get rid of the fallback mechanism or add a configuration option to make 
it optional.

 Use NativeCrc32 only if it is faster
 

 Key: HADOOP-10778
 URL: https://issues.apache.org/jira/browse/HADOOP-10778
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: c10778_20140702.patch


 From the benchmark post in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060],
  NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when 
 bytesPerChecksum  512.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Assigned] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support

2014-07-07 Thread Robert Kanter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned HADOOP-10791:
--

Assignee: Robert Kanter

 AuthenticationFilter should support externalizing the secret for signing and 
 provide rotation support
 -

 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter

 It should be possible to externalize the secret used to sign the hadoop-auth 
 cookies.
 In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
 the case of Oozie HA, the secret could be stored in Oozie HA control data in 
 ZooKeeper.
 In addition, it is desirable for the secret to change periodically, this 
 means that the AuthenticationService should remember a previous secret for 
 the max duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-9902) Shell script rewrite

2014-07-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053928#comment-14053928
 ] 

Hadoop QA commented on HADOOP-9902:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654218/HADOOP-9902-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-assemblies hadoop-common-project/hadoop-common 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.ha.TestZKFailoverControllerStress

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4221//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4221//console

This message is automatically generated.

 Shell script rewrite
 

 Key: HADOOP-9902
 URL: https://issues.apache.org/jira/browse/HADOOP-9902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
  Labels: releasenotes
 Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, 
 HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt


 Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10780) namenode throws java.lang.OutOfMemoryError upon DatanodeProtocol.versionRequest from datanode


[ 
https://issues.apache.org/jira/browse/HADOOP-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053934#comment-14053934
 ] 

Colin Patrick McCabe commented on HADOOP-10780:
---

Thanks for that analysis, Dmitry.  I think you're right.  Can you post a patch 
for us to review?

 namenode throws java.lang.OutOfMemoryError upon 
 DatanodeProtocol.versionRequest from datanode
 -

 Key: HADOOP-10780
 URL: https://issues.apache.org/jira/browse/HADOOP-10780
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.4.1
 Environment: FreeBSD-10/stable
 openjdk version 1.7.0_60
 OpenJDK Runtime Environment (build 1.7.0_60-b19)
 OpenJDK 64-Bit Server VM (build 24.60-b09, mixed mode)
Reporter: Dmitry Sivachenko

 I am trying hadoop-2.4.1 on FreeBSD-10/stable.
 namenode starts up, but after first datanode contacts it, it throws an 
 exception.
 All limits seem to be high enough:
 % limits -a
 Resource limits (current):
   cputime  infinity secs
   filesize infinity kB
   datasize 33554432 kB
   stacksize  524288 kB
   coredumpsize infinity kB
   memoryuseinfinity kB
   memorylocked infinity kB
   maxprocesses   122778
   openfiles  14
   sbsize   infinity bytes
   vmemoryuse   infinity kB
   pseudo-terminals infinity
   swapuse  infinity kB
 14944  1  S0:06.59 /usr/local/openjdk7/bin/java -Dproc_namenode 
 -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop 
 -Dhadoop.log.file=hadoop-hdfs-namenode-nezabudka3-00.log 
 -Dhadoop.home.dir=/usr/local -Dhadoop.id.str=hdfs 
 -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml 
 -Djava.net.preferIPv4Stack=true -Xmx32768m -Xms32768m 
 -Djava.library.path=/usr/local/lib -Xmx32768m -Xms32768m 
 -Djava.library.path=/usr/local/lib -Xmx32768m -Xms32768m 
 -Djava.library.path=/usr/local/lib -Dhadoop.security.logger=INFO,RFAS 
 org.apache.hadoop.hdfs.server.namenode.NameNode
 From the namenode's log:
 2014-07-03 23:28:15,070 WARN  [IPC Server handler 5 on 8020] ipc.Server 
 (Server.java:run(2032)) - IPC Server handler 5 on 8020, call 
 org.apache.hadoop.hdfs.server.protocol.Datano
 deProtocol.versionRequest from 5.255.231.209:57749 Call#842 Retry#0
 java.lang.OutOfMemoryError
 at 
 org.apache.hadoop.security.JniBasedUnixGroupsMapping.getGroupsForUser(Native 
 Method)
 at 
 org.apache.hadoop.security.JniBasedUnixGroupsMapping.getGroups(JniBasedUnixGroupsMapping.java:80)
 at 
 org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
 at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
 at 
 org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1417)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.init(FSPermissionChecker.java:81)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3331)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:5491)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.versionRequest(NameNodeRpcServer.java:1082)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.versionRequest(DatanodeProtocolServerSideTranslatorPB.java:234)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28069)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 I did not have such an issue with hadoop-1.2.1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support

2014-07-07 Thread Larry McCay (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053933#comment-14053933
 ] 

Larry McCay commented on HADOOP-10791:
--

Hi [~tucu00] - I was planning on adding support for the credential provider API 
for this.
What do you have in mind? Am I correct in assuming you mean the secret stored 
in hadoop.http.authentication.signature.secret.file?

 AuthenticationFilter should support externalizing the secret for signing and 
 provide rotation support
 -

 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter

 It should be possible to externalize the secret used to sign the hadoop-auth 
 cookies.
 In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
 the case of Oozie HA, the secret could be stored in Oozie HA control data in 
 ZooKeeper.
 In addition, it is desirable for the secret to change periodically, this 
 means that the AuthenticationService should remember a previous secret for 
 the max duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10781) Unportable getgrouplist() usage breaks FreeBSD


[ 
https://issues.apache.org/jira/browse/HADOOP-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053941#comment-14053941
 ] 

Colin Patrick McCabe commented on HADOOP-10781:
---

Thanks for looking at this, Dmitry.

bq. Because according to manpage this is impossible.

I'm afraid we need to be paranoid when checking the result of this call.  We've 
seen some odd behavior in the past.  man pages may not always be correct, even 
when referring to their own operating system, let alone all possible operating 
systems we support.

Can you post your comments as a patch for us to review?  We can't review JIRA 
comments (or at least we can't commit them to the codebase)

 Unportable getgrouplist() usage breaks FreeBSD
 --

 Key: HADOOP-10781
 URL: https://issues.apache.org/jira/browse/HADOOP-10781
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Dmitry Sivachenko

 getgrouplist() has different return values on Linux and FreeBSD:
 Linux: either the number of groups (positive) or -1 on error
 FreeBSD: 0 on success or -1 on error
 The return value of getgrouplist() is analyzed in Linux-specific way in 
 hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c,
  in function hadoop_user_info_getgroups() which breaks FreeBSD.
 In this function you have 3 choices for the return value 
 ret = getgrouplist(uinfo-pwd.pw_name, uinfo-pwd.pw_gid,
  uinfo-gids, ngroups);
 1) ret  0 : OK for Linux, it will be zero on FreeBSD.  I propose to change 
 this to ret = 0
 2) First condition is false and ret != -1:  impossible according to manpage
 3) ret == 1 -- OK for both Linux and FreeBSD
 So I propose to change ret  0 to ret = 0 and (optionally) return 2nd 
 case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10507) FsShell setfacl can throw ArrayIndexOutOfBoundsException when no perm is specified

2014-07-07 Thread Stephen Chu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053944#comment-14053944
 ] 

Stephen Chu commented on HADOOP-10507:
--

Hi, [~sathish.gurram], do you have time to fix the indentation? If not, I can 
update your patch for you.

 FsShell setfacl can throw ArrayIndexOutOfBoundsException when no perm is 
 specified
 --

 Key: HADOOP-10507
 URL: https://issues.apache.org/jira/browse/HADOOP-10507
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: sathish
Priority: Minor
 Attachments: HDFS-6205-0001.patch, HDFS-6205.patch


 If users don't specify the perm of an acl when using the FsShell's setfacl 
 command, a fatal internal error ArrayIndexOutOfBoundsException will be thrown.
 {code}
 [root@hdfs-nfs ~]# hdfs dfs -setfacl -m user:bob: /user/hdfs/td1
 -setfacl: Fatal internal error
 java.lang.ArrayIndexOutOfBoundsException: 2
   at 
 org.apache.hadoop.fs.permission.AclEntry.parseAclEntry(AclEntry.java:285)
   at 
 org.apache.hadoop.fs.permission.AclEntry.parseAclSpec(AclEntry.java:221)
   at 
 org.apache.hadoop.fs.shell.AclCommands$SetfaclCommand.processOptions(AclCommands.java:260)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:308)
 [root@hdfs-nfs ~]# 
 {code}
 An improvement would be if it returned something like this:
 {code}
 [root@hdfs-nfs ~]# hdfs dfs -setfacl -m user:bob:rww /user/hdfs/td1
 -setfacl: Invalid permission in aclSpec : user:bob:rww
 Usage: hadoop fs [generic options] -setfacl [-R] [{-b|-k} {-m|-x acl_spec} 
 path]|[--set acl_spec path]
 [root@hdfs-nfs ~]# 
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10778) Use NativeCrc32 only if it is faster

2014-07-07 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053937#comment-14053937
 ] 

Todd Lipcon commented on HADOOP-10778:
--

BTW, I had suggested dynamically switching between java and native CRC32 about 
5 years back, but Owen said he'd veto that: 
https://issues.apache.org/jira/browse/HADOOP-6148?focusedCommentId=12721804page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721804

Perhaps he's changed his mind since then :)

The original JIRA also noted that OpenJDK was faster than Oracle's JDK at the 
time. So perhaps that's the difference you're seeing between java 7 and java 6? 
Are you testing OpenJDK in both cases or is it Oracle JDK 6?

 Use NativeCrc32 only if it is faster
 

 Key: HADOOP-10778
 URL: https://issues.apache.org/jira/browse/HADOOP-10778
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: c10778_20140702.patch


 From the benchmark post in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060],
  NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when 
 bytesPerChecksum  512.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10779) Generalize DFS_PERMISSIONS_SUPERUSERGROUP_KEY for any HCFS


[ 
https://issues.apache.org/jira/browse/HADOOP-10779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053927#comment-14053927
 ] 

Colin Patrick McCabe commented on HADOOP-10779:
---

Supergroup is an HDFS-specific concept that doesn't appear in most other 
filesystems.  It doesn't make sense to add a generic key, because the concept 
itself is not generic.

UNIX has something vaguely similar to supergroup, in the wheel group or admin 
group.  All members of the wheel group can su to root.  But this is not exactly 
the same as the supergroup concept in HDFS, since no su or sudo is required in 
HDFS.

I would say the superuser / supergroup concept in HDFS is an artifact of the 
desire to develop and run HDFS in userspace without being root.  For 
filesystems such as Lustre that run in kernel-space, this is a non-issue (you 
have to be root to fool with the kernel anyway).

 Generalize DFS_PERMISSIONS_SUPERUSERGROUP_KEY for any HCFS
 --

 Key: HADOOP-10779
 URL: https://issues.apache.org/jira/browse/HADOOP-10779
 Project: Hadoop Common
  Issue Type: Wish
  Components: fs
Reporter: Martin Bukatovic
Priority: Minor

 HDFS has configuration option {{dfs.permissions.superusergroup}} stored in
 {{hdfs-site.xml}} configuration file:
 {noformat}
 property
   namedfs.permissions.superusergroup/name
   valuesupergroup/value
   descriptionThe name of the group of super-users./description
 /property
 {noformat}
 Since we have an option to use alternative Hadoop filesystems (HCFS), there is
 a question how to specify a supergroup in such case.
 Eg. would introducing HCFS option in say {{core-site.xml}} for this as shown
 below make sense?
 {noformat}
 property
   namehcfs.permissions.superusergroup/name
   value${dfs.permissions.superusergroup}/value
   descriptionThe name of the group of super-users./description
 /property
 {noformat}
 Or would you solve it in different way? I would like to at least declare 
 a recommended approach for alternative Hadoop filesystems to follow.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support


[ 
https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053953#comment-14053953
 ] 

Alejandro Abdelnur commented on HADOOP-10791:
-

[~lmccay], my idea was to break the hadoop-auth {{Signer}} into an 
interface/impl and provide 2 impls in hadoop-auth, random/secret-file. WebHDFS 
would have its own impl that uses the same secret use for block tokens. in 
common we could have one that goes to the credentials provider, sure. And, the 
rotation of secret, if supported, is taken care by the impl itself.

  

 AuthenticationFilter should support externalizing the secret for signing and 
 provide rotation support
 -

 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter

 It should be possible to externalize the secret used to sign the hadoop-auth 
 cookies.
 In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
 the case of Oozie HA, the secret could be stored in Oozie HA control data in 
 ZooKeeper.
 In addition, it is desirable for the secret to change periodically, this 
 means that the AuthenticationService should remember a previous secret for 
 the max duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10785) UnsatisfiedLinkError in cryptocodec tests with OpensslCipher#initContext


[ 
https://issues.apache.org/jira/browse/HADOOP-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053956#comment-14053956
 ] 

Colin Patrick McCabe commented on HADOOP-10785:
---

I'm going to mark this as a duplicate of HADOOP-10735.  Let's continue the 
discussion there.

 UnsatisfiedLinkError in cryptocodec tests with OpensslCipher#initContext
 

 Key: HADOOP-10785
 URL: https://issues.apache.org/jira/browse/HADOOP-10785
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 3.0.0


 {noformat}
 java.lang.UnsatisfiedLinkError: 
 org.apache.hadoop.crypto.OpensslCipher.initContext(II)J
 at org.apache.hadoop.crypto.OpensslCipher.initContext(Native Method)
 at 
 org.apache.hadoop.crypto.OpensslCipher.getInstance(OpensslCipher.java:90)
 at 
 org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec$OpensslAesCtrCipher.init(OpensslAesCtrCryptoCodec.java:73)
 at 
 org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.createEncryptor(OpensslAesCtrCryptoCodec.java:53)
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:95)
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:79)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Resolved] (HADOOP-10785) UnsatisfiedLinkError in cryptocodec tests with OpensslCipher#initContext


 [ 
https://issues.apache.org/jira/browse/HADOOP-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HADOOP-10785.
---

Resolution: Duplicate

 UnsatisfiedLinkError in cryptocodec tests with OpensslCipher#initContext
 

 Key: HADOOP-10785
 URL: https://issues.apache.org/jira/browse/HADOOP-10785
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 3.0.0


 {noformat}
 java.lang.UnsatisfiedLinkError: 
 org.apache.hadoop.crypto.OpensslCipher.initContext(II)J
 at org.apache.hadoop.crypto.OpensslCipher.initContext(Native Method)
 at 
 org.apache.hadoop.crypto.OpensslCipher.getInstance(OpensslCipher.java:90)
 at 
 org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec$OpensslAesCtrCipher.init(OpensslAesCtrCryptoCodec.java:73)
 at 
 org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.createEncryptor(OpensslAesCtrCryptoCodec.java:53)
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:95)
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:79)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HADOOP-10782) Typo in DataChecksum classs

2014-07-07 Thread Suresh Srinivas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HADOOP-10782:
-

   Resolution: Fixed
Fix Version/s: 2.5.0
   Status: Resolved  (was: Patch Available)

I committed the change. Thank you [~jingguo] for the patch.

 Typo in DataChecksum classs
 ---

 Key: HADOOP-10782
 URL: https://issues.apache.org/jira/browse/HADOOP-10782
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jingguo Yao
Assignee: Jingguo Yao
Priority: Trivial
 Fix For: 2.5.0

 Attachments: HADOOP-10782.patch

   Original Estimate: 5m
  Remaining Estimate: 5m





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10735) Fall back AESCTRCryptoCodec implementation from OpenSSL to JCE if non native support.


[ 
https://issues.apache.org/jira/browse/HADOOP-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053965#comment-14053965
 ] 

Colin Patrick McCabe commented on HADOOP-10735:
---

So as we discussed on HADOOP-10693, we need some way of providing a fallback to 
the JCE implementation when the openssl one is not available.  It's also very 
important that we have a way of ensuring that the unit tests on Jenkins fail 
when openssl is not available.

 Fall back AESCTRCryptoCodec implementation from OpenSSL to JCE if non native 
 support.
 -

 Key: HADOOP-10735
 URL: https://issues.apache.org/jira/browse/HADOOP-10735
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)


 If there is no native support or OpenSSL version is too low not supporting 
 AES-CTR, but {{OpenSSLAESCTRCryptoCodec}} is configured, we need to fall back 
 it to JCE implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10782) Typo in DataChecksum classs


[ 
https://issues.apache.org/jira/browse/HADOOP-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053974#comment-14053974
 ] 

Hudson commented on HADOOP-10782:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5833 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5833/])
HADOOP-10782. Fix typo in DataChecksum class. Contributed by Jingguo Yao. 
(suresh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608539)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java


 Typo in DataChecksum classs
 ---

 Key: HADOOP-10782
 URL: https://issues.apache.org/jira/browse/HADOOP-10782
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jingguo Yao
Assignee: Jingguo Yao
Priority: Trivial
 Fix For: 2.5.0

 Attachments: HADOOP-10782.patch

   Original Estimate: 5m
  Remaining Estimate: 5m





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support

2014-07-07 Thread Larry McCay (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053983#comment-14053983
 ] 

Larry McCay commented on HADOOP-10791:
--

So, how does the signature get validated if it is a randomized secret? It has 
to be stored somewhere, no?
If the random impl eliminates storing clear text secrets for this then we may 
not need the credential api impl after all.

 AuthenticationFilter should support externalizing the secret for signing and 
 provide rotation support
 -

 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter

 It should be possible to externalize the secret used to sign the hadoop-auth 
 cookies.
 In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
 the case of Oozie HA, the secret could be stored in Oozie HA control data in 
 ZooKeeper.
 In addition, it is desirable for the secret to change periodically, this 
 means that the AuthenticationService should remember a previous secret for 
 the max duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support


[ 
https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054009#comment-14054009
 ] 

Alejandro Abdelnur commented on HADOOP-10791:
-

the signer implementation would keep it.

 AuthenticationFilter should support externalizing the secret for signing and 
 provide rotation support
 -

 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter

 It should be possible to externalize the secret used to sign the hadoop-auth 
 cookies.
 In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
 the case of Oozie HA, the secret could be stored in Oozie HA control data in 
 ZooKeeper.
 In addition, it is desirable for the secret to change periodically, this 
 means that the AuthenticationService should remember a previous secret for 
 the max duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HADOOP-10786) Patch that fixes UGI#reloginFromKeytab on java 8

2014-07-07 Thread Tobi Vollebregt (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tobi Vollebregt updated HADOOP-10786:
-

Description: 
Krb5LoginModule changed subtly in java 8: in particular, if useKeyTab and 
storeKey are specified, then only a KeyTab object is added to the Subject's 
private credentials, whereas in java = 7 both a KeyTab and some number of 
KerberosKey objects were added.

The UGI constructor checks whether or not a keytab was used to login by looking 
if there are any KerberosKey objects in the Subject's private credentials. If 
there are, then isKeyTab is set to true, and otherwise it's set to false.

Thus, in java 8 isKeyTab is always false given the current UGI implementation, 
which makes UGI#reloginFromKeytab fail silently.

Attached patch will check for a KeyTab object on the Subject, instead of a 
KerberosKey object. This fixes relogins from kerberos keytabs on Oracle java 8, 
and works on Oracle java 7 as well.

  was:
Krb5LoginModule changed subtly in java 8: in particular, if useKeyTab and 
storeKey are specified, then only a KeyTab object is added to the Subject's 
private credentials, whereas in java = 7 both a KeyTab and some number of 
KerberosKey objects was added.

The UGI constructor checks whether or not a keytab was used to login by looking 
if there are any KerberosKey objects in the Subject's private credentials. If 
there are, the isKeyTab is set to true, and otherwise it's false.

Thus, in java 8 isKeyTab is always false given the current UGI implementation, 
which makes UGI#reloginFromKeytab fail silently.

Attached patch will check for a KeyTab object on the Subject, instead of a 
KerberosKey object. This fixes relogins from kerberos keytabs on Oracle java 8, 
and works on Oracle java 7 as well.


 Patch that fixes UGI#reloginFromKeytab on java 8
 

 Key: HADOOP-10786
 URL: https://issues.apache.org/jira/browse/HADOOP-10786
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Tobi Vollebregt
Priority: Minor
 Attachments: HADOOP-10786.patch


 Krb5LoginModule changed subtly in java 8: in particular, if useKeyTab and 
 storeKey are specified, then only a KeyTab object is added to the Subject's 
 private credentials, whereas in java = 7 both a KeyTab and some number of 
 KerberosKey objects were added.
 The UGI constructor checks whether or not a keytab was used to login by 
 looking if there are any KerberosKey objects in the Subject's private 
 credentials. If there are, then isKeyTab is set to true, and otherwise it's 
 set to false.
 Thus, in java 8 isKeyTab is always false given the current UGI 
 implementation, which makes UGI#reloginFromKeytab fail silently.
 Attached patch will check for a KeyTab object on the Subject, instead of a 
 KerberosKey object. This fixes relogins from kerberos keytabs on Oracle java 
 8, and works on Oracle java 7 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10778) Use NativeCrc32 only if it is faster


[ 
https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054053#comment-14054053
 ] 

Tsz Wo Nicholas Sze commented on HADOOP-10778:
--

 ... Have you had a chance to try it on eg a Sandy Bridge server running RHEL 
 6?

No.  Could you help running it?

 The new work that James Thomas is doing with native CRC avoids the above 
 problem by chunking the CRC calculation into smaller chunks – the same trick 
 that the JVM uses when memcpying large byte[] arrays. ...

Sound like a good idea.  Do you think the same trick could be used with 
java.util.zip.CRC32?

 Use NativeCrc32 only if it is faster
 

 Key: HADOOP-10778
 URL: https://issues.apache.org/jira/browse/HADOOP-10778
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: c10778_20140702.patch


 From the benchmark post in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060],
  NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when 
 bytesPerChecksum  512.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HADOOP-10792) Add FileSystem#closeIfNotReferred method

2014-07-07 Thread Kousuke Saruta (JIRA)

Kousuke Saruta created HADOOP-10792:
---

 Summary: Add FileSystem#closeIfNotReferred method
 Key: HADOOP-10792
 URL: https://issues.apache.org/jira/browse/HADOOP-10792
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Kousuke Saruta


FileSystem#close closes FileSystem even if the same instance of FileSystem is 
referred by someone.

For instance, a library using FileSystem calls FileSystem.get, and a program 
using the library calls FileSystem.get, both of instances of FileSystem is 
same. 

When the library and the program is implemented as different threads and one 
calls FileSystem.close, another fails most of operations of FileSystem.

So, we need the method like cloesIfNotReferred, which closes FileSystem only if 
a instance of FileSystem is not referred by anyone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10734) Implementation of true secure random with high performance using hardware random number generator.


[ 
https://issues.apache.org/jira/browse/HADOOP-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054056#comment-14054056
 ] 

Colin Patrick McCabe commented on HADOOP-10734:
---

[~tucu00]: I agree with Yi's idea of making this a separate class; coupling it 
with CryptoCodec would be confusing.  Although they both use the openssl 
library, they use different C functions.  For example, these functions are not 
needed for the crypto codec stuff, only for the random stuff:

{code}
+  LOAD_DYNAMIC_SYMBOL(dlsym_ENGINE_finish, env, openssl, ENGINE_finish);
+  LOAD_DYNAMIC_SYMBOL(dlsym_ENGINE_free, env, openssl, ENGINE_free);
+  LOAD_DYNAMIC_SYMBOL(dlsym_ENGINE_cleanup, env, openssl, ENGINE_cleanup);
+  LOAD_DYNAMIC_SYMBOL(dlsym_RAND_bytes, env, openssl, RAND_bytes);
+  LOAD_DYNAMIC_SYMBOL(dlsym_ERR_get_error, env, openssl, ERR_get_error);
{code}

bq. \[Yi wrote\]: I agree it’s not good to test true random numbers in this 
way. I try to loop until rand2 is not equal to rand1, but then we need to 
Assert something, your suggestion is?

I was just suggesting looping until they're not equal.  This catches the case 
where it's always returning a constant value (it will timeout).  So I don't see 
why we need to assert something.

There definitely are more sophisticated tests for randomness out there, but 
that would require a bit of research and might be best to do in another JIRA, 
if we do it.

{code}
+static unsigned long pthreads_thread_id(void)
+{
+  return (unsigned long)pthread_self();
+}
{code}

This is still wrong.  If you don't want to use gettid, you can use some code 
like this:

{code}
pthread_key_t key;
unsigned long highest_thread_id;

static unsigned long pthreads_thread_id(void)
{
  void *v;
  unsigned long id;

  v = pthread_getspecific(key);
  if (v) {
return (unsigned long)(uintptr_t)v;
  }
  id = __add_and_fetch(highest_thread_id, 1);
  pthread_setspecific(key, (void*)id);
  return id;
}
{code}

You'll need to manage setting up and tearing down the {{pthread_key_t}} as well.

 Implementation of true secure random with high performance using hardware 
 random number generator.
 --

 Key: HADOOP-10734
 URL: https://issues.apache.org/jira/browse/HADOOP-10734
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)

 Attachments: HADOOP-10734.1.patch, HADOOP-10734.2.patch, 
 HADOOP-10734.patch


 This JIRA is to implement Secure random using JNI to OpenSSL, and 
 implementation should be thread-safe.
 Utilize RdRand to return random numbers from hardware random number 
 generator. It's TRNG(True Random Number generators) having much higher 
 performance than {{java.security.SecureRandom}}. 
 https://wiki.openssl.org/index.php/Random_Numbers
 http://en.wikipedia.org/wiki/RdRand
 https://software.intel.com/en-us/articles/performance-impact-of-intel-secure-key-on-openssl



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10778) Use NativeCrc32 only if it is faster


[ 
https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054057#comment-14054057
 ] 

Tsz Wo Nicholas Sze commented on HADOOP-10778:
--

  ... For example, if a GC hits when we're doing the benchmark, or another 
 process uses a bunch of CPu, we might choose the wrong implementation. ...

Benchmark is reliable only if it is run for many times and the results are 
consistent.  It is not like the we run it one time and use the result to change 
the code.

  As you guys know, most of our users on x86 just use CRC32C (not CRC32) ...

This JIRA has nothing to do with CRC32C.

 Use NativeCrc32 only if it is faster
 

 Key: HADOOP-10778
 URL: https://issues.apache.org/jira/browse/HADOOP-10778
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: c10778_20140702.patch


 From the benchmark post in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060],
  NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when 
 bytesPerChecksum  512.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10778) Use NativeCrc32 only if it is faster


[ 
https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054092#comment-14054092
 ] 

Tsz Wo Nicholas Sze commented on HADOOP-10778:
--

 The original JIRA also noted that OpenJDK was faster than Oracle's JDK at the 
 time. So perhaps that's the difference you're seeing between java 7 and java 
 6? Are you testing OpenJDK in both cases or is it Oracle JDK 6?

From the java.vm.vendor property, the Java 6 and Java 7 I used are 
respectively Apple and Oracle.  Please help running the benchmark with 
different hardware and JDKs.

 Use NativeCrc32 only if it is faster
 

 Key: HADOOP-10778
 URL: https://issues.apache.org/jira/browse/HADOOP-10778
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: c10778_20140702.patch


 From the benchmark post in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060],
  NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when 
 bytesPerChecksum  512.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10741) A lightweight WebHDFS client library

[
https://issues.apache.org/jira/browse/HADOOP-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054155#comment-14054155
]

Tsz Wo Nicholas Sze commented on HADOOP-10741:
--

[~andrew.wang], this JIRA has nothing to do with standardizing the FileSystem
API. We are not suggesting changing FileSystem API.

At this point, if we use FileSystem API, it won't be lightweight. So we
suggest adding another lightweight API. This API is for the users who do not
want to use FileSystem API.

A lightweight WebHDFS client library

Key: HADOOP-10741
URL: https://issues.apache.org/jira/browse/HADOOP-10741
Project: Hadoop Common
Issue Type: New Feature
Components: tools
Reporter: Tsz Wo Nicholas Sze
Assignee: Mohammad Kamrul Islam

One of the motivations for creating WebHDFS is for applications connecting to
HDFS from outside the cluster. In order to do so, users have to either
# install Hadoop and use WebHdfsFileSsytem, or
# develop their own client using the WebHDFS REST API.
For #1, it is very difficult to manage and unnecessarily complicated for
other applications since Hadoop is not a lightweight library. For #2, it is
not easy to deal with security and handle transient errors.
Therefore, we propose adding a lightweight WebHDFS client as a separated
library which does not depend on Common and HDFS. The client can be packaged
as a standalone jar. Other applications simply add the jar to their
classpath for using it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HADOOP-10750) KMSKeyProviderCache should be in hadoop-common

2014-07-07 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-10750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated HADOOP-10750:
-

Attachment: HADOOP-10750.1.patch

Uploaded Patch :
* Moved {{KMSCacheKeyProvider}} to hadoop-common {{CachingKeyProvider}}. This 
class implements the {{KeyProviderExtension}} abstract class which, by default, 
delegates all methods to the underlying {{KeyProvider}} and over-rides the get 
KeyVersion/CurrentVersion/Metadata methods to delegate to a inner static 
{{CacheExtension}}

 KMSKeyProviderCache should be in hadoop-common
 --

 Key: HADOOP-10750
 URL: https://issues.apache.org/jira/browse/HADOOP-10750
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
 Attachments: HADOOP-10750.1.patch


 KMS has {{KMSCacheKeyProvider}}, this class should be available in 
 hadoop-common for users of  {{KeyProvider}} instances to wrap them and avoid 
 several, potentially expensive, key retrievals.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10400) Incorporate new S3A FileSystem implementation

2014-07-07 Thread Jordan Mendelson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054309#comment-14054309
 ] 

Jordan Mendelson commented on HADOOP-10400:
---

Sorry all, was off for the last month. The upstream version of this code has a 
few small changes for preliminary retry code to deal with connection closed 
exceptions during read() (which are sadly being exposed as raw Apache httpcore 
exceptions derived from IOException). I wanted to use the retry logic that aws 
itself provides, but it doesn't appear there is a particularly clean way of 
doing it. As soon as it is done in a semi-sane way, I'll put up a new patch.

I'm currently integrating all the patches that appear here. Thanks so much for 
all the contributions! Most should already been in the upstream project 
[https://github.com/Aloisius/hadoop-s3a]. The server side encryption one 
probably needs a better key name, but I can't think of one that better conforms 
to whatever the current style Hadoop is using right now.

I'm currently integrating all the other changes there are patches are here for. 
It is a bit unwieldy to keep track of all these patches to this patch, 
reintegrate it into my upstream and then recreate a new patch for hadoop trunk 
each time. If anyone has any suggestions on making this easier, please let me 
know. The only reason I keep an upstream version is because I use it in 
production with CDH.

 Incorporate new S3A FileSystem implementation
 -

 Key: HADOOP-10400
 URL: https://issues.apache.org/jira/browse/HADOOP-10400
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs, fs/s3
Affects Versions: 2.4.0
Reporter: Jordan Mendelson
Assignee: Jordan Mendelson
 Attachments: HADOOP-10400-1.patch, HADOOP-10400-2.patch, 
 HADOOP-10400-3.patch, HADOOP-10400-4.patch, HADOOP-10400-5.patch, 
 HADOOP-10400-6.patch


 The s3native filesystem has a number of limitations (some of which were 
 recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses 
 the aws-sdk instead of the jets3t library. There are a number of improvements 
 over s3native including:
 - Parallel copy (rename) support (dramatically speeds up commits on large 
 files)
 - AWS S3 explorer compatible empty directories files xyz/ instead of 
 xyz_$folder$ (reduces littering)
 - Ignores s3native created _$folder$ files created by s3native and other S3 
 browsing utilities
 - Supports multiple output buffer dirs to even out IO when uploading files
 - Supports IAM role-based authentication
 - Allows setting a default canned ACL for uploads (public, private, etc.)
 - Better error recovery handling
 - Should handle input seeks without having to download the whole file (used 
 for splits a lot)
 This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to 
 various pom files to get it to build against trunk. I've been using 0.0.1 in 
 production with CDH 4 for several months and CDH 5 for a few days. The 
 version here is 0.0.2 which changes around some keys to hopefully bring the 
 key name style more inline with the rest of hadoop 2.x.
 *Tunable parameters:*
 fs.s3a.access.key - Your AWS access key ID (omit for role authentication)
 fs.s3a.secret.key - Your AWS secret key (omit for role authentication)
 fs.s3a.connection.maximum - Controls how many parallel connections 
 HttpClient spawns (default: 15)
 fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 
 (default: true)
 fs.s3a.attempts.maximum - How many times we should retry commands on 
 transient errors (default: 10)
 fs.s3a.connection.timeout - Socket connect timeout (default: 5000)
 fs.s3a.paging.maximum - How many keys to request from S3 when doing 
 directory listings at a time (default: 5000)
 fs.s3a.multipart.size - How big (in bytes) to split a upload or copy 
 operation up into (default: 104857600)
 fs.s3a.multipart.threshold - Until a file is this large (in bytes), use 
 non-parallel upload (default: 2147483647)
 fs.s3a.acl.default - Set a canned ACL on newly created/copied objects 
 (private | public-read | public-read-write | authenticated-read | 
 log-delivery-write | bucket-owner-read | bucket-owner-full-control)
 fs.s3a.multipart.purge - True if you want to purge existing multipart 
 uploads that may not have been completed/aborted correctly (default: false)
 fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads 
 to purge (default: 86400)
 fs.s3a.buffer.dir - Comma separated list of directories that will be used 
 to buffer file writes out of (default: uses ${hadoop.tmp.dir}/s3a )
 *Caveats*:
 Hadoop uses a standard output committer which uploads files as 
 filename.COPYING before renaming them. This can cause unnecessary

[jira] [Commented] (HADOOP-10776) Open up Delegation token fetching and renewal to STORM (Possibly others)

2014-07-07 Thread Chris Nauroth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054310#comment-14054310
 ] 

Chris Nauroth commented on HADOOP-10776:


Hi, [~revans2].  I am +1 for the proposal to make the necessary APIs public.  I 
think it's the practical choice at this point.  If we consider the example of 
{{FileSystem#addDelegationTokens}}, the method was added 2 years ago for 
2.0.2-alpha, and the signature has not changed since then.  That indicates 
stability.  I also know that other projects have called this method despite the 
limited-private risk, so that's another sign that there is a general need for a 
public interface for using delegation tokens.

 Open up Delegation token fetching and renewal to STORM (Possibly others)
 

 Key: HADOOP-10776
 URL: https://issues.apache.org/jira/browse/HADOOP-10776
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Robert Joseph Evans

 Storm would like to be able to fetch delegation tokens and forward them on to 
 running topologies so that they can access HDFS (STORM-346).  But to do so we 
 need to open up access to some of APIs. 
 Most notably FileSystem.addDelegationTokens(), Token.renew, 
 Credentials.getAllTokens, and UserGroupInformation but there may be others.
 At a minimum adding in storm to the list of allowed API users. But ideally 
 making them public. Restricting access to such important functionality to 
 just MR really makes secure HDFS inaccessible to anything except MR, or tools 
 that reuse MR input formats.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10400) Incorporate new S3A FileSystem implementation

2014-07-07 Thread Jordan Mendelson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054321#comment-14054321
 ] 

Jordan Mendelson commented on HADOOP-10400:
---

Also [~ste...@apache.org], should I create my next patch on top of your 
hadoop-amazon?

 Incorporate new S3A FileSystem implementation
 -

 Key: HADOOP-10400
 URL: https://issues.apache.org/jira/browse/HADOOP-10400
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs, fs/s3
Affects Versions: 2.4.0
Reporter: Jordan Mendelson
Assignee: Jordan Mendelson
 Attachments: HADOOP-10400-1.patch, HADOOP-10400-2.patch, 
 HADOOP-10400-3.patch, HADOOP-10400-4.patch, HADOOP-10400-5.patch, 
 HADOOP-10400-6.patch


 The s3native filesystem has a number of limitations (some of which were 
 recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses 
 the aws-sdk instead of the jets3t library. There are a number of improvements 
 over s3native including:
 - Parallel copy (rename) support (dramatically speeds up commits on large 
 files)
 - AWS S3 explorer compatible empty directories files xyz/ instead of 
 xyz_$folder$ (reduces littering)
 - Ignores s3native created _$folder$ files created by s3native and other S3 
 browsing utilities
 - Supports multiple output buffer dirs to even out IO when uploading files
 - Supports IAM role-based authentication
 - Allows setting a default canned ACL for uploads (public, private, etc.)
 - Better error recovery handling
 - Should handle input seeks without having to download the whole file (used 
 for splits a lot)
 This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to 
 various pom files to get it to build against trunk. I've been using 0.0.1 in 
 production with CDH 4 for several months and CDH 5 for a few days. The 
 version here is 0.0.2 which changes around some keys to hopefully bring the 
 key name style more inline with the rest of hadoop 2.x.
 *Tunable parameters:*
 fs.s3a.access.key - Your AWS access key ID (omit for role authentication)
 fs.s3a.secret.key - Your AWS secret key (omit for role authentication)
 fs.s3a.connection.maximum - Controls how many parallel connections 
 HttpClient spawns (default: 15)
 fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 
 (default: true)
 fs.s3a.attempts.maximum - How many times we should retry commands on 
 transient errors (default: 10)
 fs.s3a.connection.timeout - Socket connect timeout (default: 5000)
 fs.s3a.paging.maximum - How many keys to request from S3 when doing 
 directory listings at a time (default: 5000)
 fs.s3a.multipart.size - How big (in bytes) to split a upload or copy 
 operation up into (default: 104857600)
 fs.s3a.multipart.threshold - Until a file is this large (in bytes), use 
 non-parallel upload (default: 2147483647)
 fs.s3a.acl.default - Set a canned ACL on newly created/copied objects 
 (private | public-read | public-read-write | authenticated-read | 
 log-delivery-write | bucket-owner-read | bucket-owner-full-control)
 fs.s3a.multipart.purge - True if you want to purge existing multipart 
 uploads that may not have been completed/aborted correctly (default: false)
 fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads 
 to purge (default: 86400)
 fs.s3a.buffer.dir - Comma separated list of directories that will be used 
 to buffer file writes out of (default: uses ${hadoop.tmp.dir}/s3a )
 *Caveats*:
 Hadoop uses a standard output committer which uploads files as 
 filename.COPYING before renaming them. This can cause unnecessary performance 
 issues with S3 because it does not have a rename operation and S3 already 
 verifies uploads against an md5 that the driver sets on the upload request. 
 While this FileSystem should be significantly faster than the built-in 
 s3native driver because of parallel copy support, you may want to consider 
 setting a null output committer on our jobs to further improve performance.
 Because S3 requires the file length and MD5 to be known before a file is 
 uploaded, all output is buffered out to a temporary file first similar to the 
 s3native driver.
 Due to the lack of native rename() for S3, renaming extremely large files or 
 directories make take a while. Unfortunately, there is no way to notify 
 hadoop that progress is still being made for rename operations, so your job 
 may time out unless you increase the task timeout.
 This driver will fully ignore _$folder$ files. This was necessary so that it 
 could interoperate with repositories that have had the s3native driver used 
 on them, but means that it won't recognize empty directories that s3native 
 has been used on.
 Statistics for the filesystem may be calculated

[jira] [Updated] (HADOOP-10750) KMSKeyProviderCache should be in hadoop-common