[jira] [Updated] (HADOOP-10557) FsShell -cp -p does not preserve extended ACLs

2014-05-27 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HADOOP-10557:
---

Attachment: HADOOP-10557.2.patch

Thanks [~jira.shegalov] for the comment! Updated the patch.
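For illustration, one possible shape of the enhanced command (hypothetical until the patch lands; the eventual flag form may differ):

{noformat}
# copy a file and also carry over its extended ACL entries
hadoop fs -cp -p /src/file /dst/file
hadoop fs -getfacl /dst/file   # should now show the same extended ACL entries as /src/file
{noformat}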

 FsShell -cp -p does not preserve extended ACLs
 --

 Key: HADOOP-10557
 URL: https://issues.apache.org/jira/browse/HADOOP-10557
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: HADOOP-10557.2.patch, HADOOP-10557.patch


 This issue tracks enhancing FsShell cp to
 * preserve extended ACLs by -p option
 or
 * add a new command-line option for preserving extended ACLs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10623) Provide a utility to be able to inspect the config as seen by a hadoop client daemon

2014-05-27 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HADOOP-10623:
---

Attachment: HADOOP-10623.v03.patch

v03: adding the option -loadSites to load all *-site.xml files in one shot.
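For illustration, the new option might be invoked like this (a hypothetical invocation; the exact resources loaded depend on the local configuration directory):

{noformat}
hadoop org.apache.hadoop.util.ConfigTool -loadSites -json | python -mjson.tool
{noformat}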

 Provide a utility to be able to inspect the config as seen by a hadoop client 
 daemon 
 --

 Key: HADOOP-10623
 URL: https://issues.apache.org/jira/browse/HADOOP-10623
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: HADOOP-10623.v01.patch, HADOOP-10623.v02.patch, 
 HADOOP-10623.v03.patch


 To ease debugging of config issues it is convenient to be able to generate a 
 config as seen by the job client or a hadoop daemon
 {noformat}
 ]$ hadoop org.apache.hadoop.util.ConfigTool -help 
 Usage: ConfigTool [ -xml | -json ] [ -loadDefaults ] [ resource1... ]
   if resource contains '/', load from local filesystem
   otherwise, load from the classpath
 Generic options supported are
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -files <comma separated list of files>    specify comma separated files to be 
 copied to the map reduce cluster
 -libjars <comma separated list of jars>    specify comma separated jar files 
 to include in the classpath.
 -archives <comma separated list of archives>    specify comma separated 
 archives to be unarchived on the compute machines.
 The general command line syntax is
 bin/hadoop command [genericOptions] [commandOptions]
 {noformat}
 {noformat}
 $ hadoop org.apache.hadoop.util.ConfigTool -Dmy.test.conf=val mapred-site.xml 
 ./hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml | python 
 -mjson.tool
 {
     "properties": [
         {
             "isFinal": false,
             "key": "mapreduce.framework.name",
             "resource": "mapred-site.xml",
             "value": "yarn"
         },
         {
             "isFinal": false,
             "key": "mapreduce.client.genericoptionsparser.used",
             "resource": "programatically",
             "value": "true"
         },
         {
             "isFinal": false,
             "key": "my.test.conf",
             "resource": "from command line",
             "value": "val"
         },
         {
             "isFinal": false,
             "key": "from.file.key",
             "resource": "hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml",
             "value": "from.file.val"
         },
         {
             "isFinal": false,
             "key": "mapreduce.shuffle.port",
             "resource": "mapred-site.xml",
             "value": "${my.mapreduce.shuffle.port}"
         }
     ]
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10623) Provide a utility to be able to inspect the config as seen by a hadoop client daemon

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009483#comment-14009483
 ] 

Hadoop QA commented on HADOOP-10623:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12646849/HADOOP-10623.v03.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3973//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3973//console

This message is automatically generated.

 Provide a utility to be able to inspect the config as seen by a hadoop client 
 daemon 
 --

 Key: HADOOP-10623
 URL: https://issues.apache.org/jira/browse/HADOOP-10623
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: HADOOP-10623.v01.patch, HADOOP-10623.v02.patch, 
 HADOOP-10623.v03.patch


 To ease debugging of config issues it is convenient to be able to generate a 
 config as seen by the job client or a hadoop daemon
 {noformat}
 ]$ hadoop org.apache.hadoop.util.ConfigTool -help 
 Usage: ConfigTool [ -xml | -json ] [ -loadDefaults ] [ resource1... ]
   if resource contains '/', load from local filesystem
   otherwise, load from the classpath
 Generic options supported are
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -files <comma separated list of files>    specify comma separated files to be 
 copied to the map reduce cluster
 -libjars <comma separated list of jars>    specify comma separated jar files 
 to include in the classpath.
 -archives <comma separated list of archives>    specify comma separated 
 archives to be unarchived on the compute machines.
 The general command line syntax is
 bin/hadoop command [genericOptions] [commandOptions]
 {noformat}
 {noformat}
 $ hadoop org.apache.hadoop.util.ConfigTool -Dmy.test.conf=val mapred-site.xml 
 ./hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml | python 
 -mjson.tool
 {
     "properties": [
         {
             "isFinal": false,
             "key": "mapreduce.framework.name",
             "resource": "mapred-site.xml",
             "value": "yarn"
         },
         {
             "isFinal": false,
             "key": "mapreduce.client.genericoptionsparser.used",
             "resource": "programatically",
             "value": "true"
         },
         {
             "isFinal": false,
             "key": "my.test.conf",
             "resource": "from command line",
             "value": "val"
         },
         {
             "isFinal": false,
             "key": "from.file.key",
             "resource": "hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml",
             "value": "from.file.val"
         },
         {
             "isFinal": false,
             "key": "mapreduce.shuffle.port",
             "resource": "mapred-site.xml",
             "value": "${my.mapreduce.shuffle.port}"
         }
     ]
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10557) FsShell -cp -p does not preserve extended ACLs

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009514#comment-14009514
 ] 

Hadoop QA commented on HADOOP-10557:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646840/HADOOP-10557.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3972//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3972//console

This message is automatically generated.

 FsShell -cp -p does not preserve extended ACLs
 --

 Key: HADOOP-10557
 URL: https://issues.apache.org/jira/browse/HADOOP-10557
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: HADOOP-10557.2.patch, HADOOP-10557.patch


 This issue tracks enhancing FsShell cp to
 * preserve extended ACLs by -p option
 or
 * add a new command-line option for preserving extended ACLs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol

2014-05-27 Thread Chris Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Li updated HADOOP-10376:
--

Attachment: HADOOP-10376.patch

Hi [~arpitagarwal], sounds good. I went ahead and uploaded a patch.

Most of it is pretty typical stuff for adding a new protocol (which shows how 
painful that is today); the interesting parts are the three new files: 
RefreshRegistry, RefreshHandler, and RefreshResponse.

A useful new capability is being able to send text and an exit status to the 
user on success (today you can either return 0 with no text, or throw an 
exception with a message and return -1).

Authorization is coarse in this patch: users can be opted in or out of 
refreshing any of the registered refresh handlers. Future versions would allow 
finer-grained permissions.
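
As a rough sketch of the handler/response shape described above (the method and field names here are illustrative, not necessarily those in the patch):

{code}
public interface RefreshHandler {
  // Invoked by RefreshRegistry when an admin issues a refresh for this identifier.
  RefreshResponse handleRefresh(String identifier, String[] args);
}

public class RefreshResponse {
  private final int returnCode;  // exit status reported back to the user; 0 on success
  private final String message;  // free-form text shown to the user

  public RefreshResponse(int returnCode, String message) {
    this.returnCode = returnCode;
    this.message = message;
  }
  public int getReturnCode() { return returnCode; }
  public String getMessage() { return message; }
}
{code}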

 Refactor refresh*Protocols into a single generic refreshConfigProtocol
 --

 Key: HADOOP-10376
 URL: https://issues.apache.org/jira/browse/HADOOP-10376
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Chris Li
Assignee: Chris Li
Priority: Minor
 Attachments: HADOOP-10376.patch, RefreshFrameworkProposal.pdf


 See https://issues.apache.org/jira/browse/HADOOP-10285
 There are starting to be too many refresh*Protocols. We can refactor them to 
 use a single protocol with a variable payload to choose what to do.
 Thereafter, we can return an indication of success or failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-3845) equals() method in GenericWritable

2014-05-27 Thread jhanver chand sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009599#comment-14009599
 ] 

jhanver chand sharma commented on HADOOP-3845:
--

Please review.

 equals() method in GenericWritable
 --

 Key: HADOOP-3845
 URL: https://issues.apache.org/jira/browse/HADOOP-3845
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: yskhoo
 Attachments: Hadoop-3845.patch


 Missing equals() and hash() methods in GenericWritable
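
 A minimal sketch of what the missing methods could look like, delegating to 
 the wrapped instance returned by {{get()}} (an illustration, not necessarily 
 what the attached patch does):
 {code}
 @Override
 public boolean equals(Object o) {
   if (this == o) return true;
   if (!(o instanceof GenericWritable)) return false;
   Writable mine = get();
   Writable theirs = ((GenericWritable) o).get();
   return mine == null ? theirs == null : mine.equals(theirs);
 }

 @Override
 public int hashCode() {
   Writable mine = get();
   return mine == null ? 0 : mine.hashCode();
 }
 {code}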



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10590) ServiceAuthorizationManager is not threadsafe

2014-05-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009647#comment-14009647
 ] 

Daryn Sharp commented on HADOOP-10590:
--

Agreed on correctness vs. performance.  Any update on performance, though?  My 
concern relates to large clusters seeing bursts of thousands of connections per 
second.  I wouldn't expect much, if any, measurable impact, but stranger things 
have happened.  The test would need to stress 1 rpc connections to avoid 
diluting the results.

 ServiceAuthorizationManager is not threadsafe
 --

 Key: HADOOP-10590
 URL: https://issues.apache.org/jira/browse/HADOOP-10590
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HADOOP-10590.patch


 The mutators in ServiceAuthorizationManager are synchronized. The accessors 
 are not synchronized.
 This results in visibility issues when ServiceAuthorizationManager's state 
 is accessed from different threads.
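
 A minimal sketch of the problem and one possible fix, with illustrative field 
 and method names: making the accessor synchronized (or the map reference 
 volatile) gives readers the same memory-visibility guarantees the writers 
 already have:
 {code}
 // Illustrative sketch, not the actual ServiceAuthorizationManager code.
 private Map<Class<?>, AccessControlList> protocolToAcl;   // replaced on refresh

 public synchronized void refresh(Map<Class<?>, AccessControlList> newAcls) {
   protocolToAcl = newAcls;             // mutator was already synchronized
 }

 public synchronized AccessControlList getProtocolAcl(Class<?> protocol) {
   return protocolToAcl.get(protocol);  // accessor now synchronized as well
 }
 {code}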



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10626) Limit Returning Attributes for LDAP search

2014-05-27 Thread Jason Hubbard (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009668#comment-14009668
 ] 

Jason Hubbard commented on HADOOP-10626:


The only useful test I could think to add was an integration test, but there is 
no current support for LDAP integration that I am aware of.  The integration 
test would verify that the group name attribute is returned, as well as the 
user's full DN.  I have tested this manually, and all groups for the users were 
successfully returned.

 Limit Returning Attributes for LDAP search
 --

 Key: HADOOP-10626
 URL: https://issues.apache.org/jira/browse/HADOOP-10626
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.3.0
Reporter: Jason Hubbard
  Labels: easyfix, newbie, performance
 Attachments: HADOOP-10626.patch


 When using Hadoop LDAP group mappings in an enterprise environment, searching 
 groups and returning all members can take a long time, causing a timeout.  
 As a result, not all groups are returned for a user.  Because the first 
 search only looks up the user DN and the second search retrieves the 
 group member attribute, we only need to return the group member attribute on 
 the search, which speeds it up.
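
 In JNDI terms, the change amounts to restricting the attributes returned by 
 the group search, roughly like this (the attribute name below is illustrative; 
 in Hadoop it comes from hadoop.security.group.mapping.ldap.search.attr.member):
 {code}
 import javax.naming.directory.SearchControls;

 SearchControls controls = new SearchControls();
 controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
 // Only fetch the group-member attribute instead of every attribute of the group.
 controls.setReturningAttributes(new String[] { "member" });
 {code}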



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10611) KeyVersion name should not be assumed to be the 'key name @ the version number'

2014-05-27 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009828#comment-14009828
 ] 

Alejandro Abdelnur commented on HADOOP-10611:
-

Owen, I'm ok with some implementations choosing to use NAME@COUNTER as the 
version ID. However, I don't think we can mandate that, especially when 
integrating with third-party key management solutions that generate their own 
opaque key version IDs.

The purpose of this JIRA is to ensure that KeyProvider/KeyShell/KMS don't 
assume the version ID is NAME@COUNTER, but rather treat it as an opaque value.

Regarding the existing methods in the public API, such as 
{{buildVersionName()}}, I would propose moving them to a {{KeyProviderUtils}} 
class for KeyProvider implementations that choose to use the NAME@COUNTER 
version ID format.
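
For example, such a utility class might look like the following sketch (based 
on the current NAME@COUNTER convention; this is not a committed API):

{code}
public final class KeyProviderUtils {
  private KeyProviderUtils() {}

  // Builds "name@version", the convention some providers may choose to keep.
  public static String buildVersionName(String name, int version) {
    return name + "@" + version;
  }

  // Recovers the key name from a "name@version" string.
  public static String getBaseName(String versionName) {
    int i = versionName.lastIndexOf('@');
    if (i == -1) {
      throw new IllegalArgumentException("No version in: " + versionName);
    }
    return versionName.substring(0, i);
  }
}
{code}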

 KeyVersion name should not be assumed to be the 'key name @ the version 
 number'
 ---

 Key: HADOOP-10611
 URL: https://issues.apache.org/jira/browse/HADOOP-10611
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 The KeyProvider public API should treat keyversion name as an opaque value. 
 Same for the KMS client/server.
 Methods like {{KeyProvider#buildVersionName()}} and 
 {{KeyProvider#getBaseName()}} should not be part of the {{KeyProvider}} API. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009861#comment-14009861
 ] 

Owen O'Malley commented on HADOOP-10607:


Larry, some comments:
* please change CredShell to CredentialShell
* in CredShell.promptForCredential you clobber the array before returning it.
* it would be really nice for CredShell to have more unit tests. I'm not quite 
sure how to get there.


 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch: one using the credentials cache 
 in MapReduce jobs, and the other using Java KeyStores from either HDFS or the 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-27 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009915#comment-14009915
 ] 

Larry McCay commented on HADOOP-10607:
--

Will do, [~owen.omalley], thanks for the review.
Good catch on the array problem - I'll try to add a unit test for that as well!

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch: one using the credentials cache 
 in MapReduce jobs, and the other using Java KeyStores from either HDFS or the 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite

2014-05-27 Thread Babak Behzad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Babak Behzad updated HADOOP-9704:
-

Attachment: Hadoop-9704.patch

 Write metrics sink plugin for Hadoop/Graphite
 -

 Key: HADOOP-9704
 URL: https://issues.apache.org/jira/browse/HADOOP-9704
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Chu Tong
 Attachments: 
 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, 
 HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch


 Write a metrics sink plugin for Hadoop to send metrics directly to Graphite 
 in addition to the current ganglia and file ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite

2014-05-27 Thread Babak Behzad (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009925#comment-14009925
 ] 

Babak Behzad commented on HADOOP-9704:
--

We need this feature at our company, so I took the patch and used it. There was 
a small bug that caused Graphite to ignore a lot of metrics (some metrics have 
spaces in their names, which Graphite does not handle). I fixed that, and I also 
addressed all three pieces of feedback that [~vicaya] mentioned in the comment 
above. Can someone please review this patch before I submit it?
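
The space problem comes from Graphite's plaintext protocol, which uses spaces 
to separate the metric path, value, and timestamp. A sketch of the kind of 
sanitization involved (illustrative, not necessarily the patch's exact code):

{code}
// Replace spaces so the metric path does not break the
// "<path> <value> <timestamp>\n" line format Graphite expects.
private static String sanitizeName(String name) {
  return name.replace(' ', '_');
}
{code}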

 Write metrics sink plugin for Hadoop/Graphite
 -

 Key: HADOOP-9704
 URL: https://issues.apache.org/jira/browse/HADOOP-9704
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Chu Tong
 Attachments: 
 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, 
 HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch


 Write a metrics sink plugin for Hadoop to send metrics directly to Graphite 
 in addition to the current ganglia and file ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite

2014-05-27 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009963#comment-14009963
 ] 

Alex Newman commented on HADOOP-9704:
-

+1

 Write metrics sink plugin for Hadoop/Graphite
 -

 Key: HADOOP-9704
 URL: https://issues.apache.org/jira/browse/HADOOP-9704
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Chu Tong
 Attachments: 
 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, 
 HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch


 Write a metrics sink plugin for Hadoop to send metrics directly to Graphite 
 in addition to the current ganglia and file ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite

2014-05-27 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010098#comment-14010098
 ] 

Ravi Prakash commented on HADOOP-9704:
--

What would happen if the Graphite server at the end of the socket was slow? 
BTW, we see this all the time.



 Write metrics sink plugin for Hadoop/Graphite
 -

 Key: HADOOP-9704
 URL: https://issues.apache.org/jira/browse/HADOOP-9704
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Chu Tong
 Attachments: 
 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, 
 HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch


 Write a metrics sink plugin for Hadoop to send metrics directly to Graphite 
 in addition to the current ganglia and file ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite

2014-05-27 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010122#comment-14010122
 ] 

Ravi Prakash commented on HADOOP-9704:
--

Also, would I open a new connection to the same graphite server for each metric?
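
For reference, a minimal sketch of connection reuse, i.e. opening the socket 
once and writing all metrics through it rather than reconnecting per metric 
(class and field names are illustrative):

{code}
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class GraphiteWriter {
  private final String host;
  private final int port;
  private Writer out;   // kept open across putMetrics() calls

  GraphiteWriter(String host, int port) { this.host = host; this.port = port; }

  synchronized void write(String line) throws IOException {
    if (out == null) {  // connect lazily, once, instead of per metric
      out = new OutputStreamWriter(
          new Socket(host, port).getOutputStream(), StandardCharsets.UTF_8);
    }
    out.write(line);
    out.flush();
  }
}
{code}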

 Write metrics sink plugin for Hadoop/Graphite
 -

 Key: HADOOP-9704
 URL: https://issues.apache.org/jira/browse/HADOOP-9704
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Chu Tong
 Attachments: 
 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, 
 HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch


 Write a metrics sink plugin for Hadoop to send metrics directly to Graphite 
 in addition to the current ganglia and file ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010301#comment-14010301
 ] 

Owen O'Malley commented on HADOOP-10607:


Larry, here are some additional comments:
* CredentialEntry.toString should use the characters as-is rather than 
printing the hex.
* I'd suggest removing getCredentialEntryFromConfigValue. I think we can have 
a better backwards compatibility story:
** create an IdentityProvider that returns the alias as the password.
** make IdentityCredentialProvider the default

Thus, hive-site.xml can use "javax.jdo.option.ConnectionPassword" as "mysecret", 
and the default IdentityCredentialProvider will return "mysecret" as the 
password. When the user updates their provider to a more secure alternative, 
they would change "mysecret" to "hive-db-password" and set the password in 
their provider for "hive-db-password".

Does that sound reasonable?
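
A sketch of what that default could look like (the names are taken from the 
comment above and are not a committed API):

{code}
// Backwards-compatible default: the "credential" is simply the configured
// value itself, preserving today's clear-text-in-config behavior.
public class IdentityCredentialProvider {
  public char[] getCredential(String alias) {
    return alias == null ? null : alias.toCharArray();
  }
}
{code}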


 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch: one using the credentials cache 
 in MapReduce jobs, and the other using Java KeyStores from either HDFS or the 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-27 Thread Jing Zhao (JIRA)
Jing Zhao created HADOOP-10630:
--

 Summary: Possible race condition in RetryInvocationHandler
 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao


In one of our system tests with a NameNode HA setup, we ran 300 threads in 
LoadGenerator. While one of the NameNodes was already in the active state and 
had started to serve, we still saw one of the client threads fail all of its 
retries in a 20-second window. Meanwhile, we saw a lot of the following warning 
message in the log:
{noformat}
WARN retry.RetryInvocationHandler: A failover has occurred since the start of 
this method invocation attempt.
{noformat}

After checking the code, we see the following code in RetryInvocationHandler:
{code}
  while (true) {
    // The number of times this invocation handler has ever been failed over,
    // before this method invocation attempt. Used to prevent concurrent
    // failed method invocations from triggering multiple failover attempts.
    long invocationAttemptFailoverCount;
    synchronized (proxyProvider) {
      invocationAttemptFailoverCount = proxyProviderFailoverCount;
    }
    ..
    if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
      // Make sure that concurrent failed method invocations only cause a
      // single actual fail over.
      synchronized (proxyProvider) {
        if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
          proxyProvider.performFailover(currentProxy.proxy);
          proxyProviderFailoverCount++;
          currentProxy = proxyProvider.getProxy();
        } else {
          LOG.warn("A failover has occurred since the start of this method"
              + " invocation attempt.");
        }
      }
      invocationFailoverCount++;
    }
    ..
{code}

We can see that currentProxy is refreshed only when the thread performs 
the failover (while holding the monitor of proxyProvider). Because 
currentProxy is not volatile, a thread that does not perform the failover 
(in which case it logs the warning message) may fail to see the new value of 
currentProxy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-27 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HADOOP-10630:
--

Assignee: Jing Zhao

 Possible race condition in RetryInvocationHandler
 -

 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao

 In one of our system tests with a NameNode HA setup, we ran 300 threads in 
 LoadGenerator. While one of the NameNodes was already in the active state and 
 had started to serve, we still saw one of the client threads fail all of its 
 retries in a 20-second window. Meanwhile, we saw a lot of the following 
 warning message in the log:
 {noformat}
 WARN retry.RetryInvocationHandler: A failover has occurred since the start of 
 this method invocation attempt.
 {noformat}
 After checking the code, we see the following code in RetryInvocationHandler:
 {code}
   while (true) {
     // The number of times this invocation handler has ever been failed over,
     // before this method invocation attempt. Used to prevent concurrent
     // failed method invocations from triggering multiple failover attempts.
     long invocationAttemptFailoverCount;
     synchronized (proxyProvider) {
       invocationAttemptFailoverCount = proxyProviderFailoverCount;
     }
     ..
     if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
       // Make sure that concurrent failed method invocations only cause a
       // single actual fail over.
       synchronized (proxyProvider) {
         if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
           proxyProvider.performFailover(currentProxy.proxy);
           proxyProviderFailoverCount++;
           currentProxy = proxyProvider.getProxy();
         } else {
           LOG.warn("A failover has occurred since the start of this method"
               + " invocation attempt.");
         }
       }
       invocationFailoverCount++;
     }
     ..
 {code}
 We can see that currentProxy is refreshed only when the thread performs 
 the failover (while holding the monitor of proxyProvider). Because 
 currentProxy is not volatile, a thread that does not perform the failover 
 (in which case it logs the warning message) may fail to see the new value of 
 currentProxy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-27 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HADOOP-10630:
---

Status: Patch Available  (was: Open)

 Possible race condition in RetryInvocationHandler
 -

 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HADOOP-10630.000.patch


 In one of our system tests with a NameNode HA setup, we ran 300 threads in 
 LoadGenerator. While one of the NameNodes was already in the active state and 
 had started to serve, we still saw one of the client threads fail all of its 
 retries in a 20-second window. Meanwhile, we saw a lot of the following 
 warning message in the log:
 {noformat}
 WARN retry.RetryInvocationHandler: A failover has occurred since the start of 
 this method invocation attempt.
 {noformat}
 After checking the code, we see the following code in RetryInvocationHandler:
 {code}
   while (true) {
     // The number of times this invocation handler has ever been failed over,
     // before this method invocation attempt. Used to prevent concurrent
     // failed method invocations from triggering multiple failover attempts.
     long invocationAttemptFailoverCount;
     synchronized (proxyProvider) {
       invocationAttemptFailoverCount = proxyProviderFailoverCount;
     }
     ..
     if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
       // Make sure that concurrent failed method invocations only cause a
       // single actual fail over.
       synchronized (proxyProvider) {
         if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
           proxyProvider.performFailover(currentProxy.proxy);
           proxyProviderFailoverCount++;
           currentProxy = proxyProvider.getProxy();
         } else {
           LOG.warn("A failover has occurred since the start of this method"
               + " invocation attempt.");
         }
       }
       invocationFailoverCount++;
     }
     ..
 {code}
 We can see that currentProxy is refreshed only when the thread performs 
 the failover (while holding the monitor of proxyProvider). Because 
 currentProxy is not volatile, a thread that does not perform the failover 
 (in which case it logs the warning message) may fail to see the new value of 
 currentProxy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-27 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HADOOP-10630:
---

Attachment: HADOOP-10630.000.patch

A possible fix is to refresh currentProxy regardless of whether the failover is 
performed by the current thread. Since the refresh is protected by 
proxyProvider's lock, all threads should then be able to see the current value 
of currentProxy.
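
In other words, the sketch of the change is to move the refresh out of the if 
branch, so both the winning and losing threads re-read the proxy while holding 
the lock (illustrative, based on the code quoted below):

{code}
synchronized (proxyProvider) {
  if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
    proxyProvider.performFailover(currentProxy.proxy);
    proxyProviderFailoverCount++;
  } else {
    LOG.warn("A failover has occurred since the start of this method"
        + " invocation attempt.");
  }
  currentProxy = proxyProvider.getProxy();  // refreshed on both paths, under the lock
}
{code}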

 Possible race condition in RetryInvocationHandler
 -

 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HADOOP-10630.000.patch


 In one of our system tests with a NameNode HA setup, we ran 300 threads in 
 LoadGenerator. While one of the NameNodes was already in the active state and 
 had started to serve, we still saw one of the client threads fail all of its 
 retries in a 20-second window. Meanwhile, we saw a lot of the following 
 warning message in the log:
 {noformat}
 WARN retry.RetryInvocationHandler: A failover has occurred since the start of 
 this method invocation attempt.
 {noformat}
 After checking the code, we see the following code in RetryInvocationHandler:
 {code}
   while (true) {
     // The number of times this invocation handler has ever been failed over,
     // before this method invocation attempt. Used to prevent concurrent
     // failed method invocations from triggering multiple failover attempts.
     long invocationAttemptFailoverCount;
     synchronized (proxyProvider) {
       invocationAttemptFailoverCount = proxyProviderFailoverCount;
     }
     ..
     if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
       // Make sure that concurrent failed method invocations only cause a
       // single actual fail over.
       synchronized (proxyProvider) {
         if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
           proxyProvider.performFailover(currentProxy.proxy);
           proxyProviderFailoverCount++;
           currentProxy = proxyProvider.getProxy();
         } else {
           LOG.warn("A failover has occurred since the start of this method"
               + " invocation attempt.");
         }
       }
       invocationFailoverCount++;
     }
     ..
 {code}
 We can see that currentProxy is refreshed only when the thread performs 
 the failover (while holding the monitor of proxyProvider). Because 
 currentProxy is not volatile, a thread that does not perform the failover 
 (in which case it logs the warning message) may fail to see the new value of 
 currentProxy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-27 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010422#comment-14010422
 ] 

Larry McCay commented on HADOOP-10607:
--

So, you are suggesting that we have a backward-compatibility provider that 
always returns the provided alias name as the credential value? In other words, 
it is a clear text provider.

I think that I have 2 issues with that:

1. What about well-known alias/credential pairs that are in the credential 
store but don't have configuration elements - would they also just return 
the provided name as the value?
2. Would there never be a valid use case where one configuration element is 
backward-compatible clear text and another is an alias that must be resolved? 
Being able to change them incrementally, or to test in development when adding 
something new, seems valuable.

Essentially, it is a pretty big switch to throw - all or nothing.

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010477#comment-14010477
 ] 

Hadoop QA commented on HADOOP-10630:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12647006/HADOOP-10630.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3975//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3975//console

This message is automatically generated.

 Possible race condition in RetryInvocationHandler
 -

 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HADOOP-10630.000.patch


 In one of our system tests with a NameNode HA setup, we ran 300 threads in 
 LoadGenerator. While one of the NameNodes was already in the active state and 
 had started to serve, we still saw one of the client threads fail all of its 
 retries in a 20-second window. Meanwhile, we saw a lot of the following 
 warning message in the log:
 {noformat}
 WARN retry.RetryInvocationHandler: A failover has occurred since the start of 
 this method invocation attempt.
 {noformat}
 After checking the code, we see the following code in RetryInvocationHandler:
 {code}
   while (true) {
     // The number of times this invocation handler has ever been failed over,
     // before this method invocation attempt. Used to prevent concurrent
     // failed method invocations from triggering multiple failover attempts.
     long invocationAttemptFailoverCount;
     synchronized (proxyProvider) {
       invocationAttemptFailoverCount = proxyProviderFailoverCount;
     }
     ..
     if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
       // Make sure that concurrent failed method invocations only cause a
       // single actual fail over.
       synchronized (proxyProvider) {
         if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
           proxyProvider.performFailover(currentProxy.proxy);
           proxyProviderFailoverCount++;
           currentProxy = proxyProvider.getProxy();
         } else {
           LOG.warn("A failover has occurred since the start of this method"
               + " invocation attempt.");
         }
       }
       invocationFailoverCount++;
     }
     ..
 {code}
 We can see that currentProxy is refreshed only when the thread performs 
 the failover (while holding the monitor of proxyProvider). Because 
 currentProxy is not volatile, a thread that does not perform the failover 
 (in which case it logs the warning message) may fail to see the new value of 
 currentProxy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-27 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010536#comment-14010536
 ] 

Larry McCay commented on HADOOP-10607:
--

I also think that there is value in being able to look at a configuration 
element and know whether it is an alias or a clear text password.

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch: one using the credentials cache 
 in MapReduce jobs, and the other using Java KeyStores from either HDFS or the 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HADOOP-10625:


Attachment: HADOOP-10625.patch

Moved the name trimming to handleDeprecation, which covers all get/getRaw 
calls. Added a test for getRaw.
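
A sketch of where the trimming now happens (illustrative; handleDeprecation is 
the common path for get() and getRaw() in Configuration):

{code}
private String[] handleDeprecation(DeprecationContext deprecations, String name) {
  if (name != null) {
    name = name.trim();   // names are trimmed on every get()/getRaw() path
  }
  // ... existing deprecation resolution continues with the trimmed name ...
  return new String[] { name };  // return shown only to keep the sketch compilable
}
{code}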

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, 
 HADOOP-10625.patch


 Currently, Hadoop will not trim the name when putting a k/v pair into the 
 properties. But when loading a configuration from a file, names are trimmed 
 (in Configuration.java):
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following steps are problematic:
 1. User incorrectly sets " hadoop.key"="value" (with a space before "hadoop.key")
 2. User tries to get "hadoop.key" and cannot get "value"
 3. The configuration is serialized and deserialized (as is done in MR)
 4. User tries to get "hadoop.key" and now gets "value", which creates an 
 inconsistency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HADOOP-10604) CryptoFileSystem decorator using xAttrs and KeyProvider

2014-05-27 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-10604 started by Yi Liu.

 CryptoFileSystem decorator using xAttrs and KeyProvider
 ---

 Key: HADOOP-10604
 URL: https://issues.apache.org/jira/browse/HADOOP-10604
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)


 A FileSystem implementation that wraps an existing filesystem and provides 
 encryption. It will require the underlying filesystem to support xAttrs. It  
 will use the KeyProvider API to retrieve encryption keys.
 This is mostly the work in the patch HADOOP-10150 minus the crypto streams



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010627#comment-14010627
 ] 

Xuan Gong commented on HADOOP-10625:


+1 LGTM. Will commit it when Jenkins says OK

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, 
 HADOOP-10625.patch


 Currently, Hadoop will not trim the name when putting a k/v pair into the 
 properties. But when loading a configuration from a file, names are trimmed 
 (in Configuration.java):
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following steps are problematic:
 1. User incorrectly sets " hadoop.key"="value" (with a space before "hadoop.key")
 2. User tries to get "hadoop.key" and cannot get "value"
 3. The configuration is serialized and deserialized (as is done in MR)
 4. User tries to get "hadoop.key" and now gets "value", which creates an 
 inconsistency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010646#comment-14010646
 ] 

Hadoop QA commented on HADOOP-10625:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647021/HADOOP-10625.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-common-project/hadoop-common:

org.apache.hadoop.http.TestHttpServer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3976//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3976//console

This message is automatically generated.

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, 
 HADOOP-10625.patch


 Currently, Hadoop will not trim the name when putting a k/v pair into the 
 properties. But when loading a configuration from a file, names are trimmed 
 (in Configuration.java):
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following steps are problematic:
 1. User incorrectly sets " hadoop.key"="value" (with a space before "hadoop.key")
 2. User tries to get "hadoop.key" and cannot get "value"
 3. The configuration is serialized and deserialized (as is done in MR)
 4. User tries to get "hadoop.key" and now gets "value", which creates an 
 inconsistency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010667#comment-14010667
 ] 

Wangda Tan commented on HADOOP-10625:
-

The test case timeouts are likely caused by HADOOP-10289.

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, 
 HADOOP-10625.patch


 Currently, Hadoop will not trim name when putting a pair of k/v to property. 
 But when loading configuration from file, names will be trimmed:
 (In Configuration.java)
 {code}
   if (name.equals(field.getTagName())  field.hasChildNodes())
 attr = StringInterner.weakIntern(
 ((Text)field.getFirstChild()).getData().trim());
   if (value.equals(field.getTagName())  field.hasChildNodes())
 value = StringInterner.weakIntern(
 ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, following steps will be problematic:
 1. User incorrectly set  hadoop.key=value (with a space before hadoop.key)
 2. User try to get hadoop.key, cannot get value
 3. Serialize/deserialize configuration (Like what did in MR)
 4. User try to get hadoop.key, can get value, which will make 
 inconsistency problem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake

2014-05-27 Thread Binglin Chang (JIRA)
Binglin Chang created HADOOP-10631:
--

 Summary: Native Hadoop Client: Add missing output in 
GenerateProtobufs.cmake
 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial


In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when 
make clean is called, those files are not cleaned. 

{code}
 add_custom_command(
OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
{code}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake

2014-05-27 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HADOOP-10631:
---

Affects Version/s: HADOOP-10388

 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
 ---

 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
 Attachments: HADOOP-10631.v1.patch


 In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when 
 make clean is called, those files are not cleaned. 
 {code}
  add_custom_command(
 OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake

2014-05-27 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HADOOP-10631:
---

Status: Patch Available  (was: Open)

 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
 ---

 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
 Attachments: HADOOP-10631.v1.patch


 In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when 
 make clean is called, those files are not cleaned. 
 {code}
  add_custom_command(
 OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake

2014-05-27 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HADOOP-10631:
---

Target Version/s: HADOOP-10388

 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
 ---

 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
 Attachments: HADOOP-10631.v1.patch


 In GenerateProtobufs.cmake, the generated pb-c.h.s files are not added to the 
 OUTPUT list, so they are not removed when make clean is called.
 {code}
  add_custom_command(
 OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake

2014-05-27 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HADOOP-10631:
---

Attachment: HADOOP-10631.v1.patch

 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
 ---

 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
 Attachments: HADOOP-10631.v1.patch


 In GenerateProtobufs.cmake, the generated pb-c.h.s files are not added to the 
 OUTPUT list, so they are not removed when make clean is called.
 {code}
  add_custom_command(
 OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010719#comment-14010719
 ] 

Hadoop QA commented on HADOOP-10631:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647044/HADOOP-10631.v1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3977//console

This message is automatically generated.

 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
 ---

 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
 Attachments: HADOOP-10631.v1.patch


 In GenerateProtobufs.cmake, the generated pb-c.h.s files are not added to the 
 OUTPUT list, so they are not removed when make clean is called.
 {code}
  add_custom_command(
 OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10632) Minor improvements to Crypto input and output streams

2014-05-27 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010753#comment-14010753
 ] 

Alejandro Abdelnur commented on HADOOP-10632:
-

Yi, nice work. Following are some minor comments:

All crypto classes should be annotated as Private, as Hadoop is not in the 
business of exposing crypto APIs as an available crypto library.

JCEAESCTREncryptor/JCEAESCTRDecryptor can be merged into a single class taking 
the cipher mode as a constructor param, and encrypt/decrypt would delegate to 
a single process() method.
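
A hedged sketch of that merge (class and method names are illustrative, not the patch's actual API):

{code}
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class JCEAESCTRCipher {
  private final Cipher cipher;
  private final int mode;  // Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE

  public JCEAESCTRCipher(int mode) throws GeneralSecurityException {
    this.mode = mode;
    this.cipher = Cipher.getInstance("AES/CTR/NoPadding");
  }

  public void init(byte[] key, byte[] iv) throws GeneralSecurityException {
    cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
  }

  /** Encrypt and decrypt both delegate here; AES-CTR is symmetric. */
  public void process(ByteBuffer inBuffer, ByteBuffer outBuffer)
      throws GeneralSecurityException {
    cipher.update(inBuffer, outBuffer);
  }
}
{code}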

AESCTRCryptoCodec#calculateIV(): the IV calculation can be done much more 
efficiently with some byte shifting:

{code}
  private static final int CTR_OFFSET = 8;
...
System.arraycopy(initIV, 0, IV, 0, 8);
long l = ((long) initIV[CTR_OFFSET + 0] << 56)
    + ((initIV[CTR_OFFSET + 1] & 0xFFL) << 48)
    + ((initIV[CTR_OFFSET + 2] & 0xFFL) << 40)
    + ((initIV[CTR_OFFSET + 3] & 0xFFL) << 32)
    + ((initIV[CTR_OFFSET + 4] & 0xFFL) << 24)
    + ((initIV[CTR_OFFSET + 5] & 0xFFL) << 16)
    + ((initIV[CTR_OFFSET + 6] & 0xFFL) << 8)
    + (initIV[CTR_OFFSET + 7] & 0xFFL);
l += counter;
IV[CTR_OFFSET + 0] = (byte) (l >>> 56);
IV[CTR_OFFSET + 1] = (byte) (l >>> 48);
IV[CTR_OFFSET + 2] = (byte) (l >>> 40);
IV[CTR_OFFSET + 3] = (byte) (l >>> 32);
IV[CTR_OFFSET + 4] = (byte) (l >>> 24);
IV[CTR_OFFSET + 5] = (byte) (l >>> 16);
IV[CTR_OFFSET + 6] = (byte) (l >>> 8);
IV[CTR_OFFSET + 7] = (byte) (l);
{code}
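
Note the 0xFFL masks: they promote each operand to long so the shifts run in 64-bit arithmetic (a plain int shift distance is taken mod 32 in Java). As a quick, hypothetical sanity check, assuming the snippet is wrapped in a calculateIV(byte[] initIV, long counter, byte[] IV) helper:

{code}
byte[] initIV = new byte[16];  // all zeros
byte[] IV = new byte[16];
calculateIV(initIV, 1L, IV);   // add counter 1 into the low 8 bytes
assert IV[15] == 1;            // big-endian: the counter lands in the last byte
{code}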

CryptoInputStream/CryptoOutputStream: besides the MIN_BUFFER_SIZE check, we 
could floor the specified buffer size to a multiple of 
CryptoCodec#getAlgorithmBlockSize().
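
A hedged sketch of that flooring (CryptoCodec#getAlgorithmBlockSize() and MIN_BUFFER_SIZE are the names used above; the helper itself is hypothetical, and it assumes MIN_BUFFER_SIZE is itself a multiple of the block size):

{code}
static int checkBufferSize(CryptoCodec codec, int bufferSize) {
  int blockSize = codec.getAlgorithmBlockSize();  // e.g. 16 bytes for AES
  bufferSize = Math.max(bufferSize, MIN_BUFFER_SIZE);
  return (bufferSize / blockSize) * blockSize;    // floor to a block multiple
}
{code}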

CryptoInputStream/CryptoOutputStream: we should clone the key and initIV as well.

CryptoInputStream#read(): no need for {{if 
(usingByteBufferRead.booleanValue())}}; just do {{if (usingByteBufferRead)}} 
(2 places).

CryptoInputStream#readFromUnderlyingStream(): it would be more intuitive to 
read if the inBuffer were passed as a parameter.

CryptoInputStream: the comment \{@link #org.apache.hadoop.fs.ByteBufferReadable\} 
should not have the '#'.

CryptoInputStream#decrypt(long position, ...): given that this method does not 
change the current position of the stream, wouldn't it be simpler to create a 
new decryptor and use a different set of input/output buffers without touching 
the stream's own? We could also use instance vars for them and init them the 
first time this method is called (if it ever is).


 Minor improvements to Crypto input and output streams
 -

 Key: HADOOP-10632
 URL: https://issues.apache.org/jira/browse/HADOOP-10632
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: 3.0.0


 Minor follow up feedback on the crypto streams



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10632) Minor improvements to Crypto input and output streams

2014-05-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010793#comment-14010793
 ] 

Yi Liu commented on HADOOP-10632:
-

Thanks [~tucu00], I will improve them and respond to you later  :-) 

 Minor improvements to Crypto input and output streams
 -

 Key: HADOOP-10632
 URL: https://issues.apache.org/jira/browse/HADOOP-10632
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: 3.0.0


 Minor follow up feedback on the crypto streams



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10400) Incorporate new S3A FileSystem implementation

2014-05-27 Thread Amandeep Khurana (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010806#comment-14010806
 ] 

Amandeep Khurana commented on HADOOP-10400:
---

[~aloisius] - Why do you have the old and new properties in the constants file? 
Why not just a single set of properties? Also, if someone is using S3N and has 
set the credentials for that, should this just pick them up or do you want to 
have users explicitly specify credentials for S3A?

Otherwise, +1 to the following suggestions made earlier:
1. Expose SSE as a config
2. Adding tools/hadoop-aws

This patch is good to go IMO and doesn't need to block on any of the above. All 
the suggestions can be put in as incremental add-ons in subsequent patches. 
However, it'll be nice to have them all committed before the next release.

 Incorporate new S3A FileSystem implementation
 -

 Key: HADOOP-10400
 URL: https://issues.apache.org/jira/browse/HADOOP-10400
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Jordan Mendelson
Assignee: Jordan Mendelson
 Attachments: HADOOP-10400-1.patch, HADOOP-10400-2.patch, 
 HADOOP-10400-3.patch, HADOOP-10400-4.patch, HADOOP-10400-5.patch


 The s3native filesystem has a number of limitations (some of which were 
 recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses 
 the aws-sdk instead of the jets3t library. There are a number of improvements 
 over s3native including:
 - Parallel copy (rename) support (dramatically speeds up commits on large 
 files)
 - AWS S3 explorer compatible empty directory files "xyz/" instead of 
 "xyz_$folder$" (reduces littering)
 - Ignores s3native created _$folder$ files created by s3native and other S3 
 browsing utilities
 - Supports multiple output buffer dirs to even out IO when uploading files
 - Supports IAM role-based authentication
 - Allows setting a default canned ACL for uploads (public, private, etc.)
 - Better error recovery handling
 - Should handle input seeks without having to download the whole file (used 
 for splits a lot)
 This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to 
 various pom files to get it to build against trunk. I've been using 0.0.1 in 
 production with CDH 4 for several months and CDH 5 for a few days. The 
 version here is 0.0.2 which changes around some keys to hopefully bring the 
 key name style more inline with the rest of hadoop 2.x.
 *Tunable parameters:*
 fs.s3a.access.key - Your AWS access key ID (omit for role authentication)
 fs.s3a.secret.key - Your AWS secret key (omit for role authentication)
 fs.s3a.connection.maximum - Controls how many parallel connections 
 HttpClient spawns (default: 15)
 fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 
 (default: true)
 fs.s3a.attempts.maximum - How many times we should retry commands on 
 transient errors (default: 10)
 fs.s3a.connection.timeout - Socket connect timeout (default: 5000)
 fs.s3a.paging.maximum - How many keys to request from S3 at a time when 
 doing directory listings (default: 5000)
 fs.s3a.multipart.size - How big (in bytes) to split an upload or copy 
 operation up into (default: 104857600)
 fs.s3a.multipart.threshold - Until a file is this large (in bytes), use 
 non-parallel upload (default: 2147483647)
 fs.s3a.acl.default - Set a canned ACL on newly created/copied objects 
 (private | public-read | public-read-write | authenticated-read | 
 log-delivery-write | bucket-owner-read | bucket-owner-full-control)
 fs.s3a.multipart.purge - True if you want to purge existing multipart 
 uploads that may not have been completed/aborted correctly (default: false)
 fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads 
 to purge (default: 86400)
 fs.s3a.buffer.dir - Comma separated list of directories that will be used 
 to buffer file writes out of (default: uses ${hadoop.tmp.dir}/s3a )
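 For example, a minimal hedged sketch of wiring a few of these properties up 
 through the standard FileSystem API (class name, bucket, and key values are 
 placeholders; assumes the s3a scheme is registered by this patch):
 {code}
 import java.net.URI;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 
 public class S3AListDemo {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");  // omit both keys for IAM role auth
     conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");
     conf.setInt("fs.s3a.connection.maximum", 30);      // raise the parallel connection cap
     FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
     for (FileStatus st : fs.listStatus(new Path("/"))) {
       System.out.println(st.getPath());                // list the bucket root
     }
   }
 }
 {code}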
 *Caveats*:
 Hadoop uses a standard output committer which uploads files as 
 filename.COPYING before renaming them. This can cause unnecessary performance 
 issues with S3 because it does not have a rename operation and S3 already 
 verifies uploads against an md5 that the driver sets on the upload request. 
 While this FileSystem should be significantly faster than the built-in 
 s3native driver because of parallel copy support, you may want to consider 
 setting a null output committer on your jobs to further improve performance.
 Because S3 requires the file length and MD5 to be known before a file is 
 uploaded, all output is buffered out to a temporary file first similar to the 
 s3native driver.
 Due to the lack of native rename() for S3, renaming extremely large files or 
 directories may take a while.