[jira] [Updated] (HADOOP-10557) FsShell -cp -p does not preserve extended ACLs
[ https://issues.apache.org/jira/browse/HADOOP-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HADOOP-10557: --- Attachment: HADOOP-10557.2.patch Thanks [~jira.shegalov] for the comment! Updated the patch. FsShell -cp -p does not preserve extended ACLs -- Key: HADOOP-10557 URL: https://issues.apache.org/jira/browse/HADOOP-10557 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: HADOOP-10557.2.patch, HADOOP-10557.patch This issue tracks enhancing FsShell cp to * preserve extended ACLs via the -p option, or * add a new command-line option for preserving extended ACLs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10623) Provide a utility to be able to inspect the config as seen by a hadoop client or daemon
[ https://issues.apache.org/jira/browse/HADOOP-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated HADOOP-10623: --- Attachment: HADOOP-10623.v03.patch v03: adding the option -loadSites to add *-site.xml in one shot. Provide a utility to be able to inspect the config as seen by a hadoop client or daemon -- Key: HADOOP-10623 URL: https://issues.apache.org/jira/browse/HADOOP-10623 Project: Hadoop Common Issue Type: New Feature Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: HADOOP-10623.v01.patch, HADOOP-10623.v02.patch, HADOOP-10623.v03.patch To ease debugging of config issues it is convenient to be able to generate a config as seen by the job client or a hadoop daemon
{noformat}
]$ hadoop org.apache.hadoop.util.ConfigTool -help
Usage: ConfigTool [ -xml | -json ] [ -loadDefaults ] [ resource1... ]
       if resource contains '/', load from local filesystem
       otherwise, load from the classpath
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is bin/hadoop command [genericOptions] [commandOptions]
{noformat}
{noformat}
$ hadoop org.apache.hadoop.util.ConfigTool -Dmy.test.conf=val mapred-site.xml ./hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml | python -mjson.tool
{
    "properties": [
        {
            "isFinal": false,
            "key": "mapreduce.framework.name",
            "resource": "mapred-site.xml",
            "value": "yarn"
        },
        {
            "isFinal": false,
            "key": "mapreduce.client.genericoptionsparser.used",
            "resource": "programatically",
            "value": "true"
        },
        {
            "isFinal": false,
            "key": "my.test.conf",
            "resource": "from command line",
            "value": "val"
        },
        {
            "isFinal": false,
            "key": "from.file.key",
            "resource": "hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml",
            "value": "from.file.val"
        },
        {
            "isFinal": false,
            "key": "mapreduce.shuffle.port",
            "resource": "mapred-site.xml",
            "value": "${my.mapreduce.shuffle.port}"
        }
    ]
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
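[Editor's note] For readers debugging config provenance before this utility lands: stock Hadoop already exposes the static Configuration.dumpConfiguration(Configuration, Writer) method, which emits similar {key, value, isFinal, resource} JSON. A minimal sketch — the DumpConf class name and argument handling are illustrative, not part of the patch:
{code}
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.apache.hadoop.conf.Configuration;

public class DumpConf {
  public static void main(String[] args) throws IOException {
    // Loads core-default.xml / core-site.xml from the classpath.
    Configuration conf = new Configuration();
    for (String resource : args) {
      conf.addResource(resource); // e.g. mapred-site.xml
    }
    Writer out = new OutputStreamWriter(System.out, "UTF-8");
    // Emits a JSON document of {key, value, isFinal, resource} entries,
    // comparable to the ConfigTool output quoted above.
    Configuration.dumpConfiguration(conf, out);
    out.flush();
  }
}
{code}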
[jira] [Commented] (HADOOP-10623) Provide a utility to be able to inspect the config as seen by a hadoop client or daemon
[ https://issues.apache.org/jira/browse/HADOOP-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009483#comment-14009483 ] Hadoop QA commented on HADOOP-10623: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646849/HADOOP-10623.v03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3973//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3973//console This message is automatically generated. Provide a utility to be able to inspect the config as seen by a hadoop client or daemon -- Key: HADOOP-10623 URL: https://issues.apache.org/jira/browse/HADOOP-10623 Project: Hadoop Common Issue Type: New Feature Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: HADOOP-10623.v01.patch, HADOOP-10623.v02.patch, HADOOP-10623.v03.patch To ease debugging of config issues it is convenient to be able to generate a config as seen by the job client or a hadoop daemon
{noformat}
]$ hadoop org.apache.hadoop.util.ConfigTool -help
Usage: ConfigTool [ -xml | -json ] [ -loadDefaults ] [ resource1... ]
       if resource contains '/', load from local filesystem
       otherwise, load from the classpath
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is bin/hadoop command [genericOptions] [commandOptions]
{noformat}
{noformat}
$ hadoop org.apache.hadoop.util.ConfigTool -Dmy.test.conf=val mapred-site.xml ./hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml | python -mjson.tool
{
    "properties": [
        {
            "isFinal": false,
            "key": "mapreduce.framework.name",
            "resource": "mapred-site.xml",
            "value": "yarn"
        },
        {
            "isFinal": false,
            "key": "mapreduce.client.genericoptionsparser.used",
            "resource": "programatically",
            "value": "true"
        },
        {
            "isFinal": false,
            "key": "my.test.conf",
            "resource": "from command line",
            "value": "val"
        },
        {
            "isFinal": false,
            "key": "from.file.key",
            "resource": "hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml",
            "value": "from.file.val"
        },
        {
            "isFinal": false,
            "key": "mapreduce.shuffle.port",
            "resource": "mapred-site.xml",
            "value": "${my.mapreduce.shuffle.port}"
        }
    ]
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10557) FsShell -cp -p does not preserve extended ACLs
[ https://issues.apache.org/jira/browse/HADOOP-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009514#comment-14009514 ] Hadoop QA commented on HADOOP-10557: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646840/HADOOP-10557.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3972//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3972//console This message is automatically generated. FsShell -cp -p does not preserve extended ACLs -- Key: HADOOP-10557 URL: https://issues.apache.org/jira/browse/HADOOP-10557 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: HADOOP-10557.2.patch, HADOOP-10557.patch This issue tracks enhancing FsShell cp to * preserve extended ACLs via the -p option, or * add a new command-line option for preserving extended ACLs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10376: -- Attachment: HADOOP-10376.patch Hi [~arpitagarwal], sounds good. I went ahead and uploaded a patch. Most of it is pretty typical stuff for adding a new protocol (which shows how painful it is today); the interesting parts are the 3 new files: RefreshRegistry, RefreshHandler, RefreshResponse. A useful new capability is being able to send text and an exit status to the user on success (today you can either return 0 and have no text, or throw an exception with a message and return -1). Authorization is coarse in this patch: users can be opted in or out of refreshing any of the registered refresh handlers. Future versions would allow finer-grained permissions. Refactor refresh*Protocols into a single generic refreshConfigProtocol -- Key: HADOOP-10376 URL: https://issues.apache.org/jira/browse/HADOOP-10376 Project: Hadoop Common Issue Type: Improvement Reporter: Chris Li Assignee: Chris Li Priority: Minor Attachments: HADOOP-10376.patch, RefreshFrameworkProposal.pdf See https://issues.apache.org/jira/browse/HADOOP-10285 There are starting to be too many refresh*Protocols. We can refactor them to use a single protocol with a variable payload to choose what to do. Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
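[Editor's note] To make the shape of the described design concrete, here is a sketch of the handler contract — the type names match the files listed in the comment, but the exact signatures are assumptions, not the committed API:
{code}
// Sketch only: the actual signatures in the patch may differ.
public interface RefreshHandler {
  // Dispatched by RefreshRegistry for a registered identifier; args come
  // from the admin CLI. Returns both an exit status and text for the user.
  RefreshResponse handleRefresh(String identifier, String[] args);
}

class RefreshResponse {
  private final int returnCode;  // 0 on success, non-zero on failure
  private final String message;  // text relayed back to the caller

  RefreshResponse(int returnCode, String message) {
    this.returnCode = returnCode;
    this.message = message;
  }
  int getReturnCode() { return returnCode; }
  String getMessage() { return message; }
}
{code}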
[jira] [Commented] (HADOOP-3845) equals() method in GenericWritable
[ https://issues.apache.org/jira/browse/HADOOP-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009599#comment-14009599 ] jhanver chand sharma commented on HADOOP-3845: -- please review.. equals() method in GenericWritable -- Key: HADOOP-3845 URL: https://issues.apache.org/jira/browse/HADOOP-3845 Project: Hadoop Common Issue Type: Improvement Reporter: yskhoo Attachments: Hadoop-3845.patch Missing equals() and hash() methods in GenericWritable -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10590) ServiceAuthorizationManager is not threadsafe
[ https://issues.apache.org/jira/browse/HADOOP-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009647#comment-14009647 ] Daryn Sharp commented on HADOOP-10590: -- Agreed on correctness vs performance. Any update on performance, though? My concern relates to large clusters seeing bursts of thousands of connections per second. I wouldn't expect much, if any, measurable impact, but stranger things have happened. The test would need to stress >1 rpc connections to avoid diluting the results. ServiceAuthorizationManager is not threadsafe -- Key: HADOOP-10590 URL: https://issues.apache.org/jira/browse/HADOOP-10590 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.4.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HADOOP-10590.patch The mutators in ServiceAuthorizationManager are synchronized. The accessors are not synchronized. This results in visibility issues when ServiceAuthorizationManager's state is accessed from different threads. -- This message was sent by Atlassian JIRA (v6.2#6252)
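[Editor's note] For context on the correctness-vs-performance trade-off above: a common low-overhead fix for this class of visibility bug is to publish an immutable snapshot through a volatile field, so readers pay a single volatile read instead of taking a lock. A sketch of the pattern (illustrative, not the attached patch):
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class AuthPolicy {
  // Readers always see a fully constructed, immutable map.
  private volatile Map<String, String> protocolToAcl = Collections.emptyMap();

  // Accessor: one volatile read, safe from any thread, no lock contention.
  String getAcl(String protocol) {
    return protocolToAcl.get(protocol);
  }

  // Mutator: build a new map, then swap it in atomically.
  synchronized void refresh(Map<String, String> newAcls) {
    protocolToAcl = Collections.unmodifiableMap(new HashMap<>(newAcls));
  }
}
{code}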
[jira] [Commented] (HADOOP-10626) Limit Returning Attributes for LDAP search
[ https://issues.apache.org/jira/browse/HADOOP-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009668#comment-14009668 ] Jason Hubbard commented on HADOOP-10626: The only useful test I could think to add was an integration test, but there is no current support for ldap integration that I am aware of. The integration test would make sure the group name attribute was returned as well as the user's full dn. I have tested this manually and all groups for users were successfully returned. Limit Returning Attributes for LDAP search -- Key: HADOOP-10626 URL: https://issues.apache.org/jira/browse/HADOOP-10626 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.3.0 Reporter: Jason Hubbard Labels: easyfix, newbie, performance Attachments: HADOOP-10626.patch When using Hadoop Ldap Group mappings in an enterprise environment, searching groups and returning all members can take a long time, causing a timeout. This causes not all groups to be returned for a user. Because the first search only searches for the user dn and the second search retrieves the group member attribute, we only need to return the group member attribute on the search, speeding up the search. -- This message was sent by Atlassian JIRA (v6.2#6252)
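[Editor's note] For reference, limiting the attributes an LDAP server returns is a one-line change on the JNDI search controls. A minimal sketch — the helper and its parameter are illustrative; the actual patch wires this into LdapGroupsMapping's configured attribute names:
{code}
import javax.naming.directory.SearchControls;

public class LdapSearchExample {
  // Ask the server for only the group-name attribute instead of the full
  // entry (which can include very large member lists).
  public static SearchControls groupNameOnly(String groupNameAttr) {
    SearchControls controls = new SearchControls();
    controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
    controls.setReturningAttributes(new String[] { groupNameAttr });
    return controls;
  }
}
{code}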
[jira] [Commented] (HADOOP-10611) KeyVersion name should not be assumed to be the 'key name @ the version number'
[ https://issues.apache.org/jira/browse/HADOOP-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009828#comment-14009828 ] Alejandro Abdelnur commented on HADOOP-10611: - Owen, I'm ok with some implementations choosing to use NAME@COUNTER as the version ID. However, I don't think we can mandate that, especially when integrating with 3rd party key management solutions which generate their own opaque key version IDs. The purpose of this JIRA is to ensure that KeyProvider/KeyShell/KMS don't assume the version ID is NAME@COUNTER, but rather treat it as an opaque value. Regarding the existing methods in the public API, such as {{buildVersionName()}}, I would propose moving them to a {{KeyProviderUtils}} class for KeyProvider implementations that choose to use the NAME@COUNTER version ID format. KeyVersion name should not be assumed to be the 'key name @ the version number' --- Key: HADOOP-10611 URL: https://issues.apache.org/jira/browse/HADOOP-10611 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 3.0.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur The KeyProvider public API should treat keyversion name as an opaque value. Same for the KMS client/server. Methods like {{KeyProvider#buildVersionName()}} and {{KeyProvider#getBaseName()}} should not be part of the {{KeyProvider}} -- This message was sent by Atlassian JIRA (v6.2#6252)
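[Editor's note] A minimal sketch of what the proposed {{KeyProviderUtils}} helpers could look like for providers that opt into the NAME@COUNTER convention — the class name comes from the comment above; the method bodies are assumptions:
{code}
public final class KeyProviderUtils {
  private KeyProviderUtils() {}

  // NAME@COUNTER convention, e.g. "mykey@3" for version 3 of "mykey".
  public static String buildVersionName(String name, int version) {
    return name + "@" + version;
  }

  // Only valid for version IDs built by buildVersionName above; providers
  // with opaque version IDs must not rely on this.
  public static String getBaseName(String versionName) {
    int at = versionName.lastIndexOf('@');
    if (at == -1) {
      throw new IllegalArgumentException(
          "No version delimiter in " + versionName);
    }
    return versionName.substring(0, at);
  }
}
{code}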
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009861#comment-14009861 ] Owen O'Malley commented on HADOOP-10607: Larry, some comments: * please change CredShell to CredentialShell * in CredShell.promptForCredential you clobber the array before returning it. * it would be really nice for CredShell to have more unit tests. I'm not quite sure how to get there. Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
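[Editor's note] The array-clobbering issue Owen flags is a classic char[] hygiene mistake. An illustration of the bug pattern and its fix — method names here are hypothetical, not the actual CredentialShell code:
{code}
public class CredentialArrayExample {
  // Buggy: cred and typed alias the same char[], so scrubbing the buffer
  // also wipes the array we are about to return.
  static char[] promptBuggy(char[] typed) {
    char[] cred = typed;
    java.util.Arrays.fill(typed, ' ');
    return cred; // all blanks!
  }

  // Fixed: copy first, then scrub the original input buffer.
  static char[] promptFixed(char[] typed) {
    char[] cred = typed.clone();
    java.util.Arrays.fill(typed, ' ');
    return cred;
  }
}
{code}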
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009915#comment-14009915 ] Larry McCay commented on HADOOP-10607: -- Will do, [~owen.omalley], thanks for the review. Good catch on the array problem - I'll try and add a unit test for that as well! Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Babak Behzad updated HADOOP-9704: - Attachment: Hadoop-9704.patch Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in addition to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009925#comment-14009925 ] Babak Behzad commented on HADOOP-9704: -- We need this feature in our company, and I got the patch and used it. There was a small bug which caused Graphite to ignore a lot of metrics (some of the metrics have spaces in their names and Graphite does not handle them). I fixed that, and I also addressed all three pieces of feedback that [~vicaya] mentioned in the comment above. Can someone please review this patch before I submit it? Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in addition to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
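[Editor's note] On the space-in-name bug: Graphite's plaintext protocol is whitespace-delimited ("path value timestamp"), so an unescaped space inside a metric path splits the record. A hypothetical sanitizer illustrating the kind of fix described, not necessarily how the patch does it:
{code}
public class GraphiteNames {
  // Spaces would be parsed as field separators by Graphite; map them to '_'.
  public static String sanitize(String metricName) {
    return metricName.replace(' ', '_');
  }

  // One Graphite plaintext record: "<path> <value> <epoch-seconds>\n"
  public static String formatLine(String path, double value, long epochSecs) {
    return sanitize(path) + " " + value + " " + epochSecs + "\n";
  }
}
{code}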
[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009963#comment-14009963 ] Alex Newman commented on HADOOP-9704: - +1 Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in addition to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010098#comment-14010098 ] Ravi Prakash commented on HADOOP-9704: -- What would happen if the Graphite server at the end of the socket was slow? BTW, we see this all the time. Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in addition to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010122#comment-14010122 ] Ravi Prakash commented on HADOOP-9704: -- Also, would I open a new connection to the same graphite server for each metric? Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in addition to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
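[Editor's note] Both of Ravi's questions come down to connection lifecycle. One common design is to hold a single connection per sink and write every metric in a batch through it, with a socket timeout as a partial guard against a slow server. A sketch under those assumptions, not the patch itself — note that a slow server can still stall writes once the TCP send buffer fills, so a bounded queue in front of the sink is another common mitigation:
{code}
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class GraphiteConnection implements AutoCloseable {
  private final Socket socket;
  private final Writer writer;

  public GraphiteConnection(String host, int port) throws IOException {
    socket = new Socket(host, port);
    // Bounds blocking reads; writes can still stall on a full send buffer.
    socket.setSoTimeout(5000);
    writer = new OutputStreamWriter(socket.getOutputStream(), "UTF-8");
  }

  // All metrics in a batch share this one connection.
  public void write(String plaintextRecord) throws IOException {
    writer.write(plaintextRecord);
  }

  @Override
  public void close() throws IOException {
    writer.flush();
    socket.close();
  }
}
{code}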
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010301#comment-14010301 ] Owen O'Malley commented on HADOOP-10607: Larry, here are some additional comments: * CredentialEntry.toString should use the characters as-is rather than printing the hex. * I'd suggest removing getCredentialEntryFromConfigValue. I think we can have a better backwards compatibility story. ** create an IdentityProvider that returns the alias as the password. ** make IdentityCredentialProvider the default Thus, hive-site.xml can set javax.jdo.option.ConnectionPassword to mysecret and the default IdentityCredentialProvider will return mysecret as the password. When the user updates their provider to a more secure alternative, they would change mysecret to hive-db-password and set the password in their provider for hive-db-password. Does that sound reasonable? Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
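[Editor's note] A sketch to make the identity-provider proposal concrete — class and method names are illustrative, not a committed API:
{code}
// Backward-compatibility shim: echoes the configured value back as the
// "password", so legacy clear-text configs keep working until a real
// provider is configured.
public class IdentityCredentialProvider {
  public char[] getCredentialEntry(String alias) {
    // The alias itself is the secret, e.g. the literal value of
    // javax.jdo.option.ConnectionPassword in hive-site.xml.
    return alias.toCharArray();
  }
}
{code}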
[jira] [Created] (HADOOP-10630) Possible race condition in RetryInvocationHandler
Jing Zhao created HADOOP-10630: -- Summary: Possible race condition in RetryInvocationHandler Key: HADOOP-10630 URL: https://issues.apache.org/jira/browse/HADOOP-10630 Project: Hadoop Common Issue Type: Bug Reporter: Jing Zhao In one of our system tests with NameNode HA setup, we ran 300 threads in LoadGenerator. While one of the NameNodes was already in the active state and started to serve, we still saw one of the client threads fail all the retries in a 20-second window. In the meantime, we saw a lot of the following warning msg in the log:
{noformat}
WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
{noformat}
After checking the code, we see the following code in RetryInvocationHandler:
{code}
while (true) {
  // The number of times this invocation handler has ever been failed over,
  // before this method invocation attempt. Used to prevent concurrent
  // failed method invocations from triggering multiple failover attempts.
  long invocationAttemptFailoverCount;
  synchronized (proxyProvider) {
    invocationAttemptFailoverCount = proxyProviderFailoverCount;
  }
  ...
  if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
    // Make sure that concurrent failed method invocations only cause a
    // single actual fail over.
    synchronized (proxyProvider) {
      if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
        proxyProvider.performFailover(currentProxy.proxy);
        proxyProviderFailoverCount++;
        currentProxy = proxyProvider.getProxy();
      } else {
        LOG.warn("A failover has occurred since the start of this method"
            + " invocation attempt.");
      }
    }
    invocationFailoverCount++;
  }
  ...
{code}
We can see we refresh the value of currentProxy only when the thread performs the failover (while holding the monitor of the proxyProvider). Because currentProxy is not volatile, a thread that does not perform the failover (in which case it will log the warning msg) may fail to get the new value of currentProxy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HADOOP-10630) Possible race condition in RetryInvocationHandler
[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HADOOP-10630: -- Assignee: Jing Zhao Possible race condition in RetryInvocationHandler - Key: HADOOP-10630 URL: https://issues.apache.org/jira/browse/HADOOP-10630 Project: Hadoop Common Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao In one of our system tests with NameNode HA setup, we ran 300 threads in LoadGenerator. While one of the NameNodes was already in the active state and started to serve, we still saw one of the client threads fail all the retries in a 20-second window. In the meantime, we saw a lot of the following warning msg in the log:
{noformat}
WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
{noformat}
After checking the code, we see the following code in RetryInvocationHandler:
{code}
while (true) {
  // The number of times this invocation handler has ever been failed over,
  // before this method invocation attempt. Used to prevent concurrent
  // failed method invocations from triggering multiple failover attempts.
  long invocationAttemptFailoverCount;
  synchronized (proxyProvider) {
    invocationAttemptFailoverCount = proxyProviderFailoverCount;
  }
  ...
  if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
    // Make sure that concurrent failed method invocations only cause a
    // single actual fail over.
    synchronized (proxyProvider) {
      if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
        proxyProvider.performFailover(currentProxy.proxy);
        proxyProviderFailoverCount++;
        currentProxy = proxyProvider.getProxy();
      } else {
        LOG.warn("A failover has occurred since the start of this method"
            + " invocation attempt.");
      }
    }
    invocationFailoverCount++;
  }
  ...
{code}
We can see we refresh the value of currentProxy only when the thread performs the failover (while holding the monitor of the proxyProvider). Because currentProxy is not volatile, a thread that does not perform the failover (in which case it will log the warning msg) may fail to get the new value of currentProxy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10630) Possible race condition in RetryInvocationHandler
[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HADOOP-10630: --- Status: Patch Available (was: Open) Possible race condition in RetryInvocationHandler - Key: HADOOP-10630 URL: https://issues.apache.org/jira/browse/HADOOP-10630 Project: Hadoop Common Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HADOOP-10630.000.patch In one of our system tests with NameNode HA setup, we ran 300 threads in LoadGenerator. While one of the NameNodes was already in the active state and started to serve, we still saw one of the client threads fail all the retries in a 20-second window. In the meantime, we saw a lot of the following warning msg in the log:
{noformat}
WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
{noformat}
After checking the code, we see the following code in RetryInvocationHandler:
{code}
while (true) {
  // The number of times this invocation handler has ever been failed over,
  // before this method invocation attempt. Used to prevent concurrent
  // failed method invocations from triggering multiple failover attempts.
  long invocationAttemptFailoverCount;
  synchronized (proxyProvider) {
    invocationAttemptFailoverCount = proxyProviderFailoverCount;
  }
  ...
  if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
    // Make sure that concurrent failed method invocations only cause a
    // single actual fail over.
    synchronized (proxyProvider) {
      if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
        proxyProvider.performFailover(currentProxy.proxy);
        proxyProviderFailoverCount++;
        currentProxy = proxyProvider.getProxy();
      } else {
        LOG.warn("A failover has occurred since the start of this method"
            + " invocation attempt.");
      }
    }
    invocationFailoverCount++;
  }
  ...
{code}
We can see we refresh the value of currentProxy only when the thread performs the failover (while holding the monitor of the proxyProvider). Because currentProxy is not volatile, a thread that does not perform the failover (in which case it will log the warning msg) may fail to get the new value of currentProxy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10630) Possible race condition in RetryInvocationHandler
[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HADOOP-10630: --- Attachment: HADOOP-10630.000.patch A possible fix is to refresh currentProxy whether or not the failover is performed by the current thread. Since the process is protected by proxyProvider's lock, all the threads should be able to get the current value of currentProxy. Possible race condition in RetryInvocationHandler - Key: HADOOP-10630 URL: https://issues.apache.org/jira/browse/HADOOP-10630 Project: Hadoop Common Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HADOOP-10630.000.patch In one of our system tests with NameNode HA setup, we ran 300 threads in LoadGenerator. While one of the NameNodes was already in the active state and started to serve, we still saw one of the client threads fail all the retries in a 20-second window. In the meantime, we saw a lot of the following warning msg in the log:
{noformat}
WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
{noformat}
After checking the code, we see the following code in RetryInvocationHandler:
{code}
while (true) {
  // The number of times this invocation handler has ever been failed over,
  // before this method invocation attempt. Used to prevent concurrent
  // failed method invocations from triggering multiple failover attempts.
  long invocationAttemptFailoverCount;
  synchronized (proxyProvider) {
    invocationAttemptFailoverCount = proxyProviderFailoverCount;
  }
  ...
  if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
    // Make sure that concurrent failed method invocations only cause a
    // single actual fail over.
    synchronized (proxyProvider) {
      if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
        proxyProvider.performFailover(currentProxy.proxy);
        proxyProviderFailoverCount++;
        currentProxy = proxyProvider.getProxy();
      } else {
        LOG.warn("A failover has occurred since the start of this method"
            + " invocation attempt.");
      }
    }
    invocationFailoverCount++;
  }
  ...
{code}
We can see we refresh the value of currentProxy only when the thread performs the failover (while holding the monitor of the proxyProvider). Because currentProxy is not volatile, a thread that does not perform the failover (in which case it will log the warning msg) may fail to get the new value of currentProxy. -- This message was sent by Atlassian JIRA (v6.2#6252)
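[Editor's note] A sketch of that fix against the quoted loop — moving the getProxy() call out of the if so every thread re-reads the proxy while still holding the monitor (illustrative; the attached patch may differ in detail):
{code}
synchronized (proxyProvider) {
  if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
    proxyProvider.performFailover(currentProxy.proxy);
    proxyProviderFailoverCount++;
  } else {
    LOG.warn("A failover has occurred since the start of this method"
        + " invocation attempt.");
  }
  // Refresh unconditionally inside the synchronized block: the monitor's
  // happens-before edge publishes the new proxy to this thread even when
  // another thread performed the failover.
  currentProxy = proxyProvider.getProxy();
}
{code}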
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010422#comment-14010422 ] Larry McCay commented on HADOOP-10607: -- So, you are suggesting that we have a backward compatibility provider that always returns the provided alias name as the credential value? In other words, it is a clear text provider. I think that I have 2 issues with that: 1. If there are well-known alias/credential pairs in the credential store that don't have configuration elements, will they also just return the provided name as the value? 2. Would there never be a valid use case where one configuration element is backward-compatible clear text and another is an alias that must be resolved? Being able to incrementally change them, or to be able to test in development when adding something new, seems valuable. Essentially, it is a pretty big switch to throw - all or nothing. Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler
[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010477#comment-14010477 ] Hadoop QA commented on HADOOP-10630: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647006/HADOOP-10630.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3975//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3975//console This message is automatically generated. Possible race condition in RetryInvocationHandler - Key: HADOOP-10630 URL: https://issues.apache.org/jira/browse/HADOOP-10630 Project: Hadoop Common Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HADOOP-10630.000.patch In one of our system tests with NameNode HA setup, we ran 300 threads in LoadGenerator. While one of the NameNodes was already in the active state and started to serve, we still saw one of the client threads fail all the retries in a 20-second window. In the meantime, we saw a lot of the following warning msg in the log:
{noformat}
WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
{noformat}
After checking the code, we see the following code in RetryInvocationHandler:
{code}
while (true) {
  // The number of times this invocation handler has ever been failed over,
  // before this method invocation attempt. Used to prevent concurrent
  // failed method invocations from triggering multiple failover attempts.
  long invocationAttemptFailoverCount;
  synchronized (proxyProvider) {
    invocationAttemptFailoverCount = proxyProviderFailoverCount;
  }
  ...
  if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
    // Make sure that concurrent failed method invocations only cause a
    // single actual fail over.
    synchronized (proxyProvider) {
      if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
        proxyProvider.performFailover(currentProxy.proxy);
        proxyProviderFailoverCount++;
        currentProxy = proxyProvider.getProxy();
      } else {
        LOG.warn("A failover has occurred since the start of this method"
            + " invocation attempt.");
      }
    }
    invocationFailoverCount++;
  }
  ...
{code}
We can see we refresh the value of currentProxy only when the thread performs the failover (while holding the monitor of the proxyProvider).
Because currentProxy is not volatile, a thread that does not perform the failover (in which case it will log the warning msg) may fail to get the new value of currentProxy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010536#comment-14010536 ] Larry McCay commented on HADOOP-10607: -- I also think that there is value in being able to look at a configuration element and know whether it is an alias or a clear text password. Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated HADOOP-10625: Attachment: HADOOP-10625.patch Moved name trimming to handleDeprecation, which covers all get/getRaw calls. Added a test for getRaw. Configuration: names should be trimmed when putting/getting to properties - Key: HADOOP-10625 URL: https://issues.apache.org/jira/browse/HADOOP-10625 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: HADOOP-10625.patch, HADOOP-10625.patch, HADOOP-10625.patch Currently, Hadoop will not trim the name when putting a k/v pair into properties. But when loading configuration from a file, names will be trimmed: (In Configuration.java)
{code}
if ("name".equals(field.getTagName()) && field.hasChildNodes())
  attr = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData().trim());
if ("value".equals(field.getTagName()) && field.hasChildNodes())
  value = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData());
{code}
With this behavior, the following steps will be problematic:
1. User incorrectly sets hadoop.key=value (with a space before hadoop.key)
2. User tries to get hadoop.key, cannot get the value
3. Serialize/deserialize the configuration (like what is done in MR)
4. User tries to get hadoop.key, can get the value, which will cause an inconsistency problem.
-- This message was sent by Atlassian JIRA (v6.2#6252)
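[Editor's note] A minimal illustration of the behavior being fixed — a hypothetical driver, not the patch's test:
{code}
import org.apache.hadoop.conf.Configuration;

public class TrimExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set(" hadoop.key ", "value");  // accidental surrounding whitespace
    // Prints null today; prints "value" once names are trimmed in both the
    // set and get paths, matching the trimming done on XML load.
    System.out.println(conf.get("hadoop.key"));
  }
}
{code}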
[jira] [Work started] (HADOOP-10604) CryptoFileSystem decorator using xAttrs and KeyProvider
[ https://issues.apache.org/jira/browse/HADOOP-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-10604 started by Yi Liu. CryptoFileSystem decorator using xAttrs and KeyProvider --- Key: HADOOP-10604 URL: https://issues.apache.org/jira/browse/HADOOP-10604 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) A FileSystem implementation that wraps an existing filesystem and provides encryption. It will require the underlying filesystem to support xAttrs. It will use the KeyProvider API to retrieve encryption keys. This is mostly the work in the patch HADOOP-10150 minus the crypto streams -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010627#comment-14010627 ] Xuan Gong commented on HADOOP-10625: +1 LGTM. Will commit it when Jenkins says OK. Configuration: names should be trimmed when putting/getting to properties - Key: HADOOP-10625 URL: https://issues.apache.org/jira/browse/HADOOP-10625 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: HADOOP-10625.patch, HADOOP-10625.patch, HADOOP-10625.patch Currently, Hadoop will not trim the name when putting a k/v pair into properties. But when loading configuration from a file, names will be trimmed: (In Configuration.java)
{code}
if ("name".equals(field.getTagName()) && field.hasChildNodes())
  attr = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData().trim());
if ("value".equals(field.getTagName()) && field.hasChildNodes())
  value = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData());
{code}
With this behavior, the following steps will be problematic:
1. User incorrectly sets hadoop.key=value (with a space before hadoop.key)
2. User tries to get hadoop.key, cannot get the value
3. Serialize/deserialize the configuration (like what is done in MR)
4. User tries to get hadoop.key, can get the value, which will cause an inconsistency problem.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010646#comment-14010646 ] Hadoop QA commented on HADOOP-10625: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647021/HADOOP-10625.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common: org.apache.hadoop.http.TestHttpServer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3976//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3976//console This message is automatically generated. Configuration: names should be trimmed when putting/getting to properties - Key: HADOOP-10625 URL: https://issues.apache.org/jira/browse/HADOOP-10625 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: HADOOP-10625.patch, HADOOP-10625.patch, HADOOP-10625.patch Currently, Hadoop will not trim the name when putting a k/v pair into properties. But when loading configuration from a file, names will be trimmed: (In Configuration.java)
{code}
if ("name".equals(field.getTagName()) && field.hasChildNodes())
  attr = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData().trim());
if ("value".equals(field.getTagName()) && field.hasChildNodes())
  value = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData());
{code}
With this behavior, the following steps will be problematic:
1. User incorrectly sets hadoop.key=value (with a space before hadoop.key)
2. User tries to get hadoop.key, cannot get the value
3. Serialize/deserialize the configuration (like what is done in MR)
4. User tries to get hadoop.key, can get the value, which will cause an inconsistency problem.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010667#comment-14010667 ] Wangda Tan commented on HADOOP-10625: - The timed-out test cases should be caused by HADOOP-10289. Configuration: names should be trimmed when putting/getting to properties - Key: HADOOP-10625 URL: https://issues.apache.org/jira/browse/HADOOP-10625 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: HADOOP-10625.patch, HADOOP-10625.patch, HADOOP-10625.patch Currently, Hadoop will not trim the name when putting a k/v pair into properties. But when loading configuration from a file, names will be trimmed: (In Configuration.java)
{code}
if ("name".equals(field.getTagName()) && field.hasChildNodes())
  attr = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData().trim());
if ("value".equals(field.getTagName()) && field.hasChildNodes())
  value = StringInterner.weakIntern(
      ((Text)field.getFirstChild()).getData());
{code}
With this behavior, the following steps will be problematic:
1. User incorrectly sets hadoop.key=value (with a space before hadoop.key)
2. User tries to get hadoop.key, cannot get the value
3. Serialize/deserialize the configuration (like what is done in MR)
4. User tries to get hadoop.key, can get the value, which will cause an inconsistency problem.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
Binglin Chang created HADOOP-10631: -- Summary: Native Hadoop Client: Add missing output in GenerateProtobufs.cmake Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10631: --- Affects Version/s: HADOOP-10388 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10631: --- Status: Patch Available (was: Open) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10631: --- Target Version/s: HADOOP-10388 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10631: --- Attachment: HADOOP-10631.v1.patch Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010719#comment-14010719 ] Hadoop QA commented on HADOOP-10631: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647044/HADOOP-10631.v1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3977//console This message is automatically generated. Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10632) Minor improvements to Crypto input and output streams
[ https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010753#comment-14010753 ] Alejandro Abdelnur commented on HADOOP-10632: - Yi, nice work. Following are some minor comments: All crypto classes should be annotated as Private, as Hadoop is not in the business of exposing crypto APIs as an available crypto library. JCEAESCTREncryptor/JCEAESCTRDecryptor can be merged into a single class taking the cipher-mode as a constructor param, and encrypt/decrypt would delegate to a single process() method. In AESCTRCryptoCodec#calculateIV(), the IV calculation can be done much more efficiently with some byte shifting:
{code}
private static final int CTR_OFFSET = 8;
...
System.arraycopy(initIV, 0, IV, 0, 8);
long l = (initIV[CTR_OFFSET + 0] << 56)
    + ((initIV[CTR_OFFSET + 1] & 0xFF) << 48)
    + ((initIV[CTR_OFFSET + 2] & 0xFF) << 40)
    + ((initIV[CTR_OFFSET + 3] & 0xFF) << 32)
    + ((initIV[CTR_OFFSET + 4] & 0xFF) << 24)
    + ((initIV[CTR_OFFSET + 5] & 0xFF) << 16)
    + ((initIV[CTR_OFFSET + 6] & 0xFF) << 8)
    + (initIV[CTR_OFFSET + 7] & 0xFF);
l += counter;
IV[CTR_OFFSET + 0] = (byte) (l >>> 56);
IV[CTR_OFFSET + 1] = (byte) (l >>> 48);
IV[CTR_OFFSET + 2] = (byte) (l >>> 40);
IV[CTR_OFFSET + 3] = (byte) (l >>> 32);
IV[CTR_OFFSET + 4] = (byte) (l >>> 24);
IV[CTR_OFFSET + 5] = (byte) (l >>> 16);
IV[CTR_OFFSET + 6] = (byte) (l >>> 8);
IV[CTR_OFFSET + 7] = (byte) (l);
{code}
CryptoInputStream/CryptoOutputStream, besides the MIN_BUFFER_SIZE check we could floor the specified buffer size to a multiple of the CryptoCodec#getAlgorithmBlockSize(). CryptoInputStream/CryptoOutputStream, we should clone the key & initIV as well. CryptoInputStream#read(), no need for doing {{if (usingByteBufferRead.booleanValue())}}, just do {{if (usingByteBufferRead)}}, 2 places. CryptoInputStream#readFromUnderlyingStream(), it would be more intuitive to read if the inBuffer is passed as a parameter. CryptoInputStream, comment \{@link #org.apache.hadoop.fs.ByteBufferReadable\} should not have the '#'. CryptoInputStream#decrypt(long position, ...) method, given that this method does not change the current position of the stream, wouldn't it be simpler to create a new decryptor and use a different set of input/output buffers without touching the stream ones? We could also use instance vars for them and init them the first time this method is called (if it is). Minor improvements to Crypto input and output streams - Key: HADOOP-10632 URL: https://issues.apache.org/jira/browse/HADOOP-10632 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: 3.0.0 Minor follow up feedback on the crypto streams -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10632) Minor improvements to Crypto input and output streams
[ https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010793#comment-14010793 ] Yi Liu commented on HADOOP-10632: - Thanks [~tucu00], I will improve them and respond to you later :-) Minor improvements to Crypto input and output streams - Key: HADOOP-10632 URL: https://issues.apache.org/jira/browse/HADOOP-10632 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: 3.0.0 Minor follow up feedback on the crypto streams -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10400) Incorporate new S3A FileSystem implementation
[ https://issues.apache.org/jira/browse/HADOOP-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010806#comment-14010806 ] Amandeep Khurana commented on HADOOP-10400: --- [~aloisius] - Why do you have both the old and new properties in the constants file? Why not just a single set of properties? Also, if someone is using S3N and has set the credentials for that, should this just pick them up, or do you want users to explicitly specify credentials for S3A? Otherwise, +1 to the following suggestions made earlier:
1. Expose SSE as a config
2. Add tools/hadoop-aws
This patch is good to go IMO and doesn't need to block on any of the above. All the suggestions can be put in as incremental add-ons in subsequent patches. However, it'll be nice to have them all committed before the next release.
Incorporate new S3A FileSystem implementation - Key: HADOOP-10400 URL: https://issues.apache.org/jira/browse/HADOOP-10400 Project: Hadoop Common Issue Type: Improvement Components: fs Reporter: Jordan Mendelson Assignee: Jordan Mendelson Attachments: HADOOP-10400-1.patch, HADOOP-10400-2.patch, HADOOP-10400-3.patch, HADOOP-10400-4.patch, HADOOP-10400-5.patch
The s3native filesystem has a number of limitations (some of which were recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses the aws-sdk instead of the jets3t library. There are a number of improvements over s3native, including:
- Parallel copy (rename) support (dramatically speeds up commits on large files)
- AWS S3 explorer compatible empty directories: files named xyz/ instead of xyz_$folder$ (reduces littering)
- Ignores _$folder$ files created by s3native and other S3 browsing utilities
- Supports multiple output buffer dirs to even out IO when uploading files
- Supports IAM role-based authentication
- Allows setting a default canned ACL for uploads (public, private, etc.)
- Better error recovery handling
- Should handle input seeks without having to download the whole file (used a lot for splits)
This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to various pom files to get it to build against trunk. I've been using 0.0.1 in production with CDH 4 for several months and CDH 5 for a few days. The version here is 0.0.2, which changes around some keys to hopefully bring the key name style more in line with the rest of hadoop 2.x.
*Tunable parameters:*
- fs.s3a.access.key - Your AWS access key ID (omit for role authentication)
- fs.s3a.secret.key - Your AWS secret key (omit for role authentication)
- fs.s3a.connection.maximum - Controls how many parallel connections HttpClient spawns (default: 15)
- fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 (default: true)
- fs.s3a.attempts.maximum - How many times we should retry commands on transient errors (default: 10)
- fs.s3a.connection.timeout - Socket connect timeout (default: 5000)
- fs.s3a.paging.maximum - How many keys to request from S3 at a time when doing directory listings (default: 5000)
- fs.s3a.multipart.size - How big (in bytes) to split an upload or copy operation into (default: 104857600)
- fs.s3a.multipart.threshold - Until a file is this large (in bytes), use non-parallel upload (default: 2147483647)
- fs.s3a.acl.default - Set a canned ACL on newly created/copied objects (private | public-read | public-read-write | authenticated-read | log-delivery-write | bucket-owner-read | bucket-owner-full-control)
- fs.s3a.multipart.purge - True if you want to purge existing multipart uploads that may not have been completed/aborted correctly (default: false)
- fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads to purge (default: 86400)
- fs.s3a.buffer.dir - Comma separated list of directories that file writes will be buffered out of (default: ${hadoop.tmp.dir}/s3a)
*Caveats:*
Hadoop uses a standard output committer which uploads files as filename.COPYING before renaming them. This can cause unnecessary performance issues with S3 because it does not have a rename operation, and S3 already verifies uploads against an md5 that the driver sets on the upload request. While this FileSystem should be significantly faster than the built-in s3native driver because of parallel copy support, you may want to consider setting a null output committer on your jobs to further improve performance.
Because S3 requires the file length and MD5 to be known before a file is uploaded, all output is buffered out to a temporary file first, similar to the s3native driver.
Due to the lack of a native rename() for S3, renaming extremely large files or directories may take a while.
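As a quick illustration of how a client would consume these keys, here is a hedged sketch using the standard Configuration/FileSystem APIs. The bucket name and credential values are placeholders, and it assumes the s3a scheme is registered with the FileSystem loader in your build:
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AListExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder credentials; omit both keys to use IAM role authentication.
    conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");
    conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");
    // Bump parallel connections above the default of 15.
    conf.setInt("fs.s3a.connection.maximum", 30);

    // "my-bucket" is a placeholder bucket name.
    FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}
{code}
The same keys can of course be set in core-site.xml instead of programmatically.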