[jira] [Updated] (HADOOP-10623) Provide a utility to be able inspect the config as seen by a hadoop client daemon
[ https://issues.apache.org/jira/browse/HADOOP-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated HADOOP-10623: --- Attachment: HADOOP-10623.v02.patch

Added ability to load the config from
- an arbitrary filesystem (helps digesting job.xml from a staging submit dir)
- include only a certain key in the

Provide a utility to be able inspect the config as seen by a hadoop client daemon -- Key: HADOOP-10623 URL: https://issues.apache.org/jira/browse/HADOOP-10623 Project: Hadoop Common Issue Type: New Feature Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: HADOOP-10623.v01.patch, HADOOP-10623.v02.patch

To ease debugging of config issues it is convenient to be able to generate a config as seen by the job client or a hadoop daemon
{noformat}
$ hadoop org.apache.hadoop.util.ConfigTool -help
Usage: ConfigTool [ -xml | -json ] [ -loadDefaults ] [ resource1... ]
        if resource contains '/', load from local filesystem
        otherwise, load from the classpath
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is bin/hadoop command [genericOptions] [commandOptions]
{noformat}
{noformat}
$ hadoop org.apache.hadoop.util.ConfigTool -Dmy.test.conf=val mapred-site.xml ./hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml | python -mjson.tool
{
    "properties": [
        {
            "isFinal": false,
            "key": "mapreduce.framework.name",
            "resource": "mapred-site.xml",
            "value": "yarn"
        },
        {
            "isFinal": false,
            "key": "mapreduce.client.genericoptionsparser.used",
            "resource": "programatically",
            "value": "true"
        },
        {
            "isFinal": false,
            "key": "my.test.conf",
            "resource": "from command line",
            "value": "val"
        },
        {
            "isFinal": false,
            "key": "from.file.key",
            "resource": "hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml",
            "value": "from.file.val"
        },
        {
            "isFinal": false,
            "key": "mapreduce.shuffle.port",
            "resource": "mapred-site.xml",
            "value": "${my.mapreduce.shuffle.port}"
        }
    ]
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
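As a rough illustration of the JSON shape such a tool emits, here is a minimal sketch using only the JDK. The class name `ConfigDumpSketch` and the key-to-{value, resource} map layout are illustrative assumptions; this is not the actual ConfigTool from the patch, which reads real Hadoop Configuration resources.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: render key/value pairs plus the "resource" each
 *  value came from as the JSON shape shown in the example output above. */
public class ConfigDumpSketch {
  // key -> {value, resource}; isFinal is hardcoded false for brevity
  static String toJson(Map<String, String[]> props) {
    StringBuilder sb = new StringBuilder("{\"properties\":[");
    boolean first = true;
    for (Map.Entry<String, String[]> e : props.entrySet()) {
      if (!first) sb.append(',');
      first = false;
      sb.append("{\"isFinal\":false,\"key\":\"").append(e.getKey())
        .append("\",\"resource\":\"").append(e.getValue()[1])
        .append("\",\"value\":\"").append(e.getValue()[0]).append("\"}");
    }
    return sb.append("]}").toString();
  }
}
```

Feeding the output through `python -mjson.tool`, as in the example above, would pretty-print it the same way.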
[jira] [Commented] (HADOOP-8572) Have the ability to force the use of the login user
[ https://issues.apache.org/jira/browse/HADOOP-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005721#comment-14005721 ] Steve Loughran commented on HADOOP-8572:

I'd rather have the complexity in the code than another config option to turn it off, as that leads to yet another config option to play with when trying to get security to work. Once you start trying to talk to secure clusters, or just run code in YARN app masters (as user yarn) while impersonating the user submitting the job, you'll discover there's already enough to worry about. Getting developers to care about this sooner rather than later is, while painful, the best way to make sure things run in production.

Have the ability to force the use of the login user Key: HADOOP-8572 URL: https://issues.apache.org/jira/browse/HADOOP-8572 Project: Hadoop Common Issue Type: Improvement Reporter: Guillaume Nodet Attachments: HADOOP-8572.patch

In Karaf, most of the code is run under the karaf user. When a user sshes into Karaf, commands will be executed under that user. Deploying hadoop inside Karaf requires that the authenticated Subject has the required hadoop principals set, which forces the reconfiguration of the whole security layer, even at dev time. My patch proposes the introduction of a new configuration property {{hadoop.security.force.login.user}} which, if set to true (it would default to false to keep the current behavior), would force the use of the login user instead of the authenticated subject (which is what happens when there's no authenticated subject at all). This greatly simplifies the use of hadoop in environments where security isn't really needed (at dev time).
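The fallback the description proposes can be sketched with the JDK's own JAAS types. This is a hedged illustration, not Hadoop's actual UserGroupInformation logic; the method and parameter names are hypothetical, and the authenticated `Subject` is passed in explicitly to keep the sketch self-contained.

```java
import javax.security.auth.Subject;

/** Illustrative sketch of the proposed hadoop.security.force.login.user
 *  behavior: use the login user when forced, or when there is no
 *  authenticated Subject (or it carries no principals). */
public class LoginUserFallback {
  static String currentUser(Subject authenticated, String loginUser,
                            boolean forceLoginUser) {
    if (forceLoginUser || authenticated == null
        || authenticated.getPrincipals().isEmpty()) {
      return loginUser; // no usable authenticated subject: fall back
    }
    return authenticated.getPrincipals().iterator().next().getName();
  }
}
```

With the flag set, a Karaf deployment would always resolve to the login user regardless of which Subject the container attached to the thread.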
[jira] [Commented] (HADOOP-9706) Provide Hadoop Karaf support
[ https://issues.apache.org/jira/browse/HADOOP-9706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005723#comment-14005723 ] Steve Loughran commented on HADOOP-9706:

# we need a patch for the hadoop-tools section
# all the versions need to go into the hadoop-project pom.xml. If there are new artifacts, they should still be declared there (including the actual dependency value). This gives us one central place for managing the artifacts, both versions and exclusions.

Provide Hadoop Karaf support Key: HADOOP-9706 URL: https://issues.apache.org/jira/browse/HADOOP-9706 Project: Hadoop Common Issue Type: Task Components: tools Reporter: Jean-Baptiste Onofré Fix For: 3.0.0 Attachments: HADOOP-9706.patch, Karaf-HDFS-client.pdf

To follow up on the discussion about OSGi, and in order to move forward, I propose the following hadoop-karaf bundle.
[jira] [Resolved] (HADOOP-8446) make hadoop-core jar OSGi friendly
[ https://issues.apache.org/jira/browse/HADOOP-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-8446. Resolution: Duplicate

make hadoop-core jar OSGi friendly -- Key: HADOOP-8446 URL: https://issues.apache.org/jira/browse/HADOOP-8446 Project: Hadoop Common Issue Type: Improvement Components: build Reporter: Freeman Fang

hadoop-core isn't OSGi friendly, so those who want to use it in an OSGi container must wrap it with a tool like bnd/maven-bundle-plugin. Apache Servicemix always wraps 3rd party jars which aren't OSGi friendly; you can see we've done it for lots of jars here[1], and more specifically for several hadoop-core versions[2]. Though we may keep doing it this way, the problem is that we need to do it for every newly released version of the 3rd party jars, and more importantly we need to ensure other Apache project communities are aware that we're doing it. In Servicemix we just wrap hadoop-core 1.0.3; the issue tracking it in Servicemix is [3]. We hope Apache Hadoop can offer OSGi friendly jars. In most cases this should be straightforward, as it just needs OSGi metadata headers added to MANIFEST.MF, which can be done easily with maven-bundle-plugin if built with maven. There are also some other practices that should be followed, like different modules not sharing the same package (avoid split packages). thanks
[1] http://repo2.maven.org/maven2/org/apache/servicemix/bundles
[2] http://repo2.maven.org/maven2/org/apache/servicemix/bundles/org.apache.servicemix.bundles.hadoop-core/
[3] https://issues.apache.org/jira/browse/SMX4-1147
[jira] [Commented] (HADOOP-8571) Improve resource cleaning when shutting down
[ https://issues.apache.org/jira/browse/HADOOP-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005724#comment-14005724 ] Steve Loughran commented on HADOOP-8571: Guillaume, can you look at what it takes to do this for the Hadoop trunk? Any OSGI support will go in there Improve resource cleaning when shutting down Key: HADOOP-8571 URL: https://issues.apache.org/jira/browse/HADOOP-8571 Project: Hadoop Common Issue Type: Improvement Affects Versions: 1.0.0, 2.0.0-alpha, 3.0.0 Reporter: Guillaume Nodet -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10557) FsShell -cp -p does not preserve extended ACLs
[ https://issues.apache.org/jira/browse/HADOOP-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HADOOP-10557: --- Attachment: HADOOP-10557.patch Attaching a patch. FsShell -cp -p does not preserve extended ACLs -- Key: HADOOP-10557 URL: https://issues.apache.org/jira/browse/HADOOP-10557 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: HADOOP-10557.patch This issue tracks enhancing FsShell cp to * preserve extended ACLs by -p option or * add a new command-line option for preserving extended ACLs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10557) FsShell -cp -p does not preserve extended ACLs
[ https://issues.apache.org/jira/browse/HADOOP-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HADOOP-10557: --- Affects Version/s: 2.4.0 Status: Patch Available (was: Open) FsShell -cp -p does not preserve extended ACLs -- Key: HADOOP-10557 URL: https://issues.apache.org/jira/browse/HADOOP-10557 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: HADOOP-10557.patch This issue tracks enhancing FsShell cp to * preserve extended ACLs by -p option or * add a new command-line option for preserving extended ACLs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10602) Documentation has broken Go Back hyperlinks.
[ https://issues.apache.org/jira/browse/HADOOP-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HADOOP-10602: --- Attachment: HADOOP-10602.3.patch Updated the patch to remove 'Go Back' link from XAttr document. Documentation has broken Go Back hyperlinks. -- Key: HADOOP-10602 URL: https://issues.apache.org/jira/browse/HADOOP-10602 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Attachments: HADOOP-10602.2.patch, HADOOP-10602.3.patch, HADOOP-10602.patch Multiple pages of our documentation have Go Back links that are broken, because they point to an incorrect relative path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10626) Limit Returning Attributes for LDAP search
Jason Hubbard created HADOOP-10626: -- Summary: Limit Returning Attributes for LDAP search Key: HADOOP-10626 URL: https://issues.apache.org/jira/browse/HADOOP-10626 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.3.0 Reporter: Jason Hubbard

When using Hadoop LDAP group mappings in an enterprise environment, searching groups and returning all members can take a long time and cause a timeout, which results in not all groups being returned for a user. Because the first search only searches for the user dn and the second search retrieves the group member attribute, we only need to return the group member attribute on the search, speeding it up.
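The proposed fix maps directly onto the JDK's JNDI API: ask the directory server to return only the membership attribute instead of every attribute of each group entry. A minimal sketch, assuming the attribute name `member` (the real name depends on the directory schema and Hadoop's `hadoop.security.group.mapping.ldap.search.attr.member` setting):

```java
import javax.naming.directory.SearchControls;

/** Sketch of restricting an LDAP group search to the member attribute,
 *  so large group entries are not transferred in full. */
public class LdapGroupSearchSketch {
  static SearchControls groupSearchControls(String memberAttr) {
    SearchControls controls = new SearchControls();
    controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
    // Only ask the server for the membership attribute; returning all
    // attributes of large groups is what makes the search slow.
    controls.setReturningAttributes(new String[] { memberAttr });
    return controls;
  }
}
```

These controls would then be passed to `DirContext.search(...)` in place of the default (which returns all attributes).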
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005952#comment-14005952 ] Yi Liu commented on HADOOP-10603:

Charles, thanks for your good comments:

{quote}
I have also made some edits to CryptoInputStream and CryptoOutputStream. I have attached the whole file for those two rather than diffs.
{quote}
Thanks for revising the javadoc/comments; I have merged them into the new patch. You added {{getWrappedStream}}; I also put it into the new patch. Is it for test cases?

{quote}
CryptoFactory.java Perhaps rename this to Crypto.
{quote}
I changed it to {{CryptoCodec}}.

{quote}
getEncryptor/getDecryptor should also declare throws GeneralSecurityException
{quote}
OK, I throw it out now; originally I caught it internally.

{quote}
Encryptor.java encrypt should declare throws GeneralSecurityException
{quote}
I already wrap it in an {{IOException}}.

{quote}
decl for encrypt > 80 chars
{quote}
OK, I will update it.

{quote}
Consider making this interface an inner class of Crypto (aka CryptoFactory).
{quote}
{{Encryptor}}/{{Decryptor}} contain more than one interface, so they are not suitable as inner classes.

{quote}
Remind me again why encrypt/decrypt don't take a position argument?
{quote}
Several reasons:
* We don't need to do {{Cipher#init}} + {{Cipher#update}} + {{Cipher#doFinal}} for every {{encrypt/decrypt}} operation; that's expensive. We should rely on the {{Cipher}} maintaining the encryption/decryption context, such as calculating the counter/IV, so we just need {{Cipher#update}} for CTR. Only for a bad JCE provider implementation do we need {{Cipher#doFinal}}, and we can handle that situation. And I believe it will never happen; it would be a bug in the cipher provider, since it wouldn't follow the definition of {{Cipher#update}}. So we don't need a position argument.
* The interface is a common interface; we should consider that other encryption modes may be used by other features in the future, and a position argument doesn't make sense for other modes.

{quote}
I wonder if, in general, we'll also want byte[] overloadings of the methods (as well as BB) for encrypt()/decrypt().
{quote}
We can have this; if you or someone else prefers it, let's add it.

{quote}
The decl for decrypt > 80 chars
{quote}
Right, I will update it.

{quote}
JCEAESCTRCryptoFactory.java This file needs an apache license header Perhaps rename it to JCEAESCTRCrypto.java getDecryptor/getEncryptor should throw GeneralSecurityException
{quote}
Right, I will update it and rename it to {{JCEAESCTRCryptoCodec}}.

{quote}
JCEAESCTRDecryptor.java ctor should throw GeneralSecurityException instead of RTException decrypt should throw GeneralSecurityException JCEAESCTREncryptor.java ctor should throw GeneralSecurityException instead of RTException encrypt should throw GeneralSecurityException
{quote}
I will update the constructors to throw {{GeneralSecurityException}}, but for {{decrypt/encrypt}} I have wrapped it in {{IOException}}.

{quote}
put a newline after public class CryptoUtils { Could calIV be renamed to calcIV?
{quote}
calIV has been refined and renamed to {{calculateIV}}; different {{CryptoCodec}}s can have different implementations.

{quote}
CryptoFSDataOutputStream.java Why is fsOut needed? Why can't you just reference out for (e.g.) getPos()?
{quote}
Since {{out}} is an instance of {{CryptoOutputStream}}, it doesn't have {{getPos()}}.

{quote}
CryptoInputStream.java You'll need a getWrappedStream() method.
{quote}
Yes, I added it, but I'm not quite clear on the purpose.

{quote}
Why 8192? Should this be moved to a static final int CONSTANT?
{quote}
It's a configuration option now.

{quote}
IWBNI the name of the interface that a particular method is implementing were put in a comment before the @Override. For instance, // PositionedRead @Override public int read(long position ...)
{quote}
OK, I will update it.

{quote}
In read(byte[], int, int), isn't the if (!usingByteBufferRead) I am worried that throwing and catching UnsupportedOperationException will be expensive. It seems very likely that for any particular stream, the same byte buffer will be passed in for the life of the stream. That means that for every call to read(...) there is potential for the UnsupportedOperationException to be thrown. That will be expensive. Perhaps keep a piece of state in the stream that gets set on the first time through indicating whether the BB is readable or not. Or keep a reference to the BB along with a bool. If the reference changes (on the off chance that the caller switched BBs for the same stream), then you can redetermine whether read is supported or not.
{quote}
Actually we check {{in instanceof ByteBufferReadable}}, not catch exceptions for every stream, so it's not expensive. And if a stream implements {{ByteBufferReadable}}, why would we still need to handle {{UnsupportedOperationException}}? Because it could throw {{UnsupportedOperationException}} anyway; it may be a wrapper.
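The positionless-encrypt argument above rests on AES/CTR's cipher keeping the counter state itself: init once, then repeated `Cipher#update` calls advance the keystream, with no per-call `init`/`doFinal` and no position parameter. A minimal JDK-only sketch of that behavior (class and method names are illustrative, not the patch's actual Encryptor/Decryptor API):

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Sketch: stream several chunks through AES/CTR using only update(),
 *  letting the Cipher maintain the counter/IV context between calls. */
public class CtrStreamingSketch {
  public static byte[] roundTrip(byte[] key, byte[] iv, byte[]... chunks) {
    try {
      Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
      enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new IvParameterSpec(iv));
      Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
      dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
               new IvParameterSpec(iv));
      byte[] out = new byte[0];
      for (byte[] chunk : chunks) {
        // one update() per "encrypt"/"decrypt" call; the cipher advances
        // its own counter, so no position argument is needed
        byte[] plain = dec.update(enc.update(chunk));
        int n = out.length;
        out = Arrays.copyOf(out, n + plain.length);
        System.arraycopy(plain, 0, out, n, plain.length);
      }
      return out;
    } catch (GeneralSecurityException e) {
      throw new RuntimeException(e);
    }
  }
}
```

Random-access reads are the one place position matters, which is what the separately discussed `calculateIV` handles: re-deriving the counter block for an arbitrary stream offset before re-initializing the cipher.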
[jira] [Updated] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HADOOP-10603: Attachment: HADOOP-10603.9.patch

Hi [~andrew.wang], I updated the patch and merged test cases from HADOOP-10617. The test cases cover all functionality of the crypto streams. The new patch addresses all of your and Charles' comments, excluding the following items (I didn't get enough time :-)); if I missed some, please correct me.
* Test for {{calculateIV}}
* Test for {{so I'd prefer to see a Precondition check and inBuf.remaining() == padding). Test case would be nice if I'm right about this.}}
* An ASCII art diagram showing how padding and the stream offset works would also be nice. Javadoc for the special padding handling would be nice.
* {quote} We need to return -1 on EOF for zero-byte reads, see HDFS-5762. {quote} I believe I handle this already: if the underlying stream returns -1, we will return -1. I will add a test case for this.
* Comment in skip about why we subtract then add outBuffer.remaining() would be good.

Please help to review. The new patch also includes some interface/implementation refinements. Meanwhile, I will update the above items in the next patch together with any new comments you have. Thanks.

Crypto input and output streams implementing Hadoop stream interfaces - Key: HADOOP-10603 URL: https://issues.apache.org/jira/browse/HADOOP-10603 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: CryptoInputStream.java, CryptoOutputStream.java, HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.9.patch, HADOOP-10603.patch

A common set of Crypto Input/Output streams.
They would be used by CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. Note we cannot use the JDK Cipher Input/Output streams directly because we need to support the additional interfaces that the Hadoop FileSystem streams implement (Seekable, PositionedReadable, ByteBufferReadable, HasFileDescriptor, CanSetDropBehind, CanSetReadahead, HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-8571) Improve resource cleaning when shutting down
[ https://issues.apache.org/jira/browse/HADOOP-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006014#comment-14006014 ] Guillaume Nodet commented on HADOOP-8571: - I don't really have much time to work on that atm. This specific jira issue is not OSGi specific, it's just about being able to embed hadoop, which requires that the shutdown does not leak threads or other resources. Improve resource cleaning when shutting down Key: HADOOP-8571 URL: https://issues.apache.org/jira/browse/HADOOP-8571 Project: Hadoop Common Issue Type: Improvement Affects Versions: 1.0.0, 2.0.0-alpha, 3.0.0 Reporter: Guillaume Nodet -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10420) Add support to Swift-FS to support tempAuth
[ https://issues.apache.org/jira/browse/HADOOP-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006164#comment-14006164 ] Gil Vernik commented on HADOOP-10420:

It's certainly not related to SoftLayer. Swift uses tempauth by default, and every Swift installation contains the tempauth authentication module. What is the status of this patch? Is someone already working on it?

Add support to Swift-FS to support tempAuth --- Key: HADOOP-10420 URL: https://issues.apache.org/jira/browse/HADOOP-10420 Project: Hadoop Common Issue Type: Improvement Components: fs, tools Affects Versions: 2.3.0 Reporter: Jinghui Wang Attachments: HADOOP-10420.patch

Currently, hadoop-openstack Swift FS supports keystone authentication. The attached patch adds support for tempAuth. Users will be able to configure which authentication to use.
[jira] [Moved] (HADOOP-10627) Add documentation for http/https policy configuration/setup
[ https://issues.apache.org/jira/browse/HADOOP-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao moved HDFS-6444 to HADOOP-10627: -- Component/s: (was: documentation) documentation Key: HADOOP-10627 (was: HDFS-6444) Project: Hadoop Common (was: Hadoop HDFS) Add documentation for http/https policy configuration/setup --- Key: HADOOP-10627 URL: https://issues.apache.org/jira/browse/HADOOP-10627 Project: Hadoop Common Issue Type: Improvement Components: documentation Reporter: Jing Zhao HADOOP-10022/HDFS-5305 etc. adds new http/https policy support in HDFS and Yarn. We should have documentation for its configuration and setup. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10587) Use a thread-local cache in TokenIdentifier#getBytes to avoid creating many DataOutputBuffer objects
[ https://issues.apache.org/jira/browse/HADOOP-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006259#comment-14006259 ] Andrew Wang commented on HADOOP-10587: -- Hey Colin, we had a quick offline discussion about whether this is valid; I'm not sure whether this change will reduce peak memory consumption since all the temp bufs should be GC-able. What's your take? Should we resolve? Use a thread-local cache in TokenIdentifier#getBytes to avoid creating many DataOutputBuffer objects Key: HADOOP-10587 URL: https://issues.apache.org/jira/browse/HADOOP-10587 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-10587.001.patch We can use a thread-local cache in TokenIdentifier#getBytes to avoid creating many DataOutputBuffer objects. This will reduce our memory usage (for example, when loading edit logs), and help prevent OOMs. -- This message was sent by Atlassian JIRA (v6.2#6252)
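The pattern under discussion can be sketched with plain JDK types standing in for DataOutputBuffer: one reusable buffer per thread, reset before each serialization. This is an illustrative sketch of the proposed technique only (and, as the comment above notes, the temporary buffers it avoids are GC-able anyway, which is why the benefit is debatable); the class, method, and field names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch: reuse one buffer per thread instead of allocating a fresh
 *  output buffer on every getBytes()-style call. */
public class ThreadLocalBufferSketch {
  private static final ThreadLocal<ByteArrayOutputStream> BUF =
      ThreadLocal.withInitial(ByteArrayOutputStream::new);

  static byte[] serialize(String owner, int sequenceNumber) {
    ByteArrayOutputStream buf = BUF.get();
    buf.reset(); // reuse this thread's cached buffer
    try {
      DataOutputStream out = new DataOutputStream(buf);
      out.writeUTF(owner);
      out.writeInt(sequenceNumber);
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // cannot happen for in-memory I/O
    }
    return buf.toByteArray(); // copy out; the buffer itself stays cached
  }
}
```

The trade-off is exactly the one debated here: lower allocation rate per call, but the largest buffer each thread ever needed stays retained for the thread's lifetime.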
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006297#comment-14006297 ] Allen Wittenauer commented on HADOOP-9902: -- So, i found the only place where hadoop.id.str is still getting used (other than setting it): {code} ./bigtop-packages/src/common/hadoop/conf.secure/log4j.properties:log4j.appender.DRFAS.File=/var/local/hadoop/logs/${hadoop.id.str}/${hadoop.id.str}-auth.log {code} On the surface, this looks like a pretty good use case. So I suppose this property lives for another day. But I'm going to nuke yarn.id.str from the face of the earth since nothing references it. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10587) Use a thread-local cache in TokenIdentifier#getBytes to avoid creating many DataOutputBuffer objects
[ https://issues.apache.org/jira/browse/HADOOP-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HADOOP-10587: -- Resolution: Won't Fix Status: Resolved (was: Patch Available) I think we should hold off on this, since it should not be a problem in practice. The out-of-memory case that prompted this turned out to be a case where the heap was entirely filled with tokens, and the DataOutputBuffer objects were a red herring. Use a thread-local cache in TokenIdentifier#getBytes to avoid creating many DataOutputBuffer objects Key: HADOOP-10587 URL: https://issues.apache.org/jira/browse/HADOOP-10587 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HADOOP-10587.001.patch We can use a thread-local cache in TokenIdentifier#getBytes to avoid creating many DataOutputBuffer objects. This will reduce our memory usage (for example, when loading edit logs), and help prevent OOMs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006316#comment-14006316 ] Mark Grover commented on HADOOP-9902: - Sounds good to me! Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10618) Remove SingleNodeSetup.apt.vm
[ https://issues.apache.org/jira/browse/HADOOP-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006417#comment-14006417 ] Arpit Agarwal commented on HADOOP-10618: +1 for the patch. I will commit it shortly. Remove SingleNodeSetup.apt.vm - Key: HADOOP-10618 URL: https://issues.apache.org/jira/browse/HADOOP-10618 Project: Hadoop Common Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: HADOOP-10618.2.patch, HADOOP-10618.patch http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleNodeSetup.html is deprecated and not linked from the left side page. We should remove the document and use http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10618) Remove SingleNodeSetup.apt.vm
[ https://issues.apache.org/jira/browse/HADOOP-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-10618: --- Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this to trunk and branch-2. Thanks for the contribution [~ajisakaa]. Remove SingleNodeSetup.apt.vm - Key: HADOOP-10618 URL: https://issues.apache.org/jira/browse/HADOOP-10618 Project: Hadoop Common Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Fix For: 3.0.0, 2.5.0 Attachments: HADOOP-10618.2.patch, HADOOP-10618.patch http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleNodeSetup.html is deprecated and not linked from the left side page. We should remove the document and use http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-6356) Add a Cache for AbstractFileSystem in the new FileContext/AbstractFileSystem framework.
[ https://issues.apache.org/jira/browse/HADOOP-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006445#comment-14006445 ] Sumit Kumar commented on HADOOP-6356:

@all - trying to bring your attention to this JIRA. I see that parts of the Hive/Hadoop code have already started consuming these APIs, but looking at this JIRA, there hasn't been much interest in the last 2 years.

Add a Cache for AbstractFileSystem in the new FileContext/AbstractFileSystem framework. --- Key: HADOOP-6356 URL: https://issues.apache.org/jira/browse/HADOOP-6356 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 0.22.0 Reporter: Sanjay Radia Assignee: Sanjay Radia

The new filesystem framework, FileContext and AbstractFileSystem, does not implement a cache for AbstractFileSystem. This Jira proposes to add a cache to the new framework just like with the old FileSystem.
[jira] [Commented] (HADOOP-6356) Add a Cache for AbstractFileSystem in the new FileContext/AbstractFileSystem framework.
[ https://issues.apache.org/jira/browse/HADOOP-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006535#comment-14006535 ] Colin Patrick McCabe commented on HADOOP-6356: -- I think the performance situation may not be as bad as you think. HDFS caches things like sockets and short-circuit file descriptors in a separate, global cache that is not per-{{FileContext}}. They are cached in {{org.apache.hadoop.hdfs.ClientContext}} and will be shared between multiple different {{FileContext}} objects. The big problems with caching {{FileContext}} objects are: * they're mutable, so you don't know when another thread will call {{FileContext#setWorkingDirectory}} and screw up how your thread is resolving relative paths * UGI has to be a part of the lookup key for any cache, and we have refused to compare UGIs except by object equality in the past. This can lead to (for some) counter-intuitive cache behavior. Given these limitations, I think it's better to have the applications do their own caching. Maybe we could provide a utility class that would help with this, but we should stay away from adding more global variables. Add a Cache for AbstractFileSystem in the new FileContext/AbstractFileSystem framework. --- Key: HADOOP-6356 URL: https://issues.apache.org/jira/browse/HADOOP-6356 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 0.22.0 Reporter: Sanjay Radia Assignee: Sanjay Radia The new filesystem framework, FileContext and AbstractFileSystem does not implement a cache for AbstractFileSystem. This Jira proposes to add a cache to the new framework just like with the old FileSystem. -- This message was sent by Atlassian JIRA (v6.2#6252)
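The UGI-as-cache-key point above can be shown in a few lines: if the key type is compared only by object identity (no `equals`/`hashCode`), two logically equal keys miss each other in the cache. The `User` class here is a hypothetical stand-in for UGI, not Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration: identity-equality cache keys produce counter-intuitive
 *  misses for logically equal keys. */
public class IdentityKeyCachePitfall {
  static final class User { // deliberately no equals()/hashCode(): identity only
    final String name;
    User(String name) { this.name = name; }
  }

  static boolean hitWithFreshKey() {
    Map<User, String> cache = new HashMap<>();
    cache.put(new User("alice"), "cached-filesystem-instance");
    // A second User("alice") is a different object, hence a cache miss
    return cache.get(new User("alice")) != null;
  }
}
```

This is exactly the "counter-intuitive cache behavior" described: two callers each holding an equivalent identity would get separate cached instances.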
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006624#comment-14006624 ] Charles Lamb commented on HADOOP-10603: --- Hi Yi, Good work so far. I took your latest patch and incorporated it into my sandbox and got my unit tests running with it. I have also made some edits to CryptoInputStream and CryptoOutputStream. I have attached the whole file for those two rather than diffs. CryptoFactory.java Perhaps rename this to Crypto. getEncryptor/getDecryptor should also declare throws GeneralSecurityException Encryptor.java encrypt should declare throws GeneralSecurityException decl for encrypt 80 chars Consider making this interface an inner class of Crypto (aka CryptoFactory). Remind me again why encrypt/decrypt don't take a position argument? I wonder if, in general, we'll also want byte[] overloadings of the methods (as well as BB) for encrypt()/decrypt(). Decryptor.java decrypt should throw GeneralSecurityException The decl for decrypt 80 chars Consider making this interface a subclass of Crypto (aka CryptoFactory). JCEAESCTRCryptoFactory.java This file needs an apache license header Perhaps rename it to JCEAESCTRCrypto.java getDescryptor/getEncryptor should throw GeneralSecurityException JCEAESCTRDecryptor.java ctor should throw GeneralSecurityException instead of RTException decrypt should throw GeneralSecurityException JCEAESCTREncryptor.java ctor should throw GeneralSecurityException instead of RTException encrypt should throw GeneralSecurityException CryptoUtils.java put a newline after public class CryptoUtils { Could calIV be renamed to calcIV? CryptoFSDataOutputStream.java Why is fsOut needed? Why can't you just reference out for (e.g.) getPos()? CryptoInputStream.java You'll need a getWrappedStream() method. Why 8192? Should this be moved to a static final int CONSTANT? 
- IWBNI the name of the interface that a particular method is implementing were put in a comment before the @Override. For instance:

    // PositionedRead
    @Override
    public int read(long position ...)

- IWBNI all of the methods for a particular interface were grouped together in the code.
- In read(byte[], int, int), isn't the if (!usingByteBufferRead)
- I am worried that throwing and catching UnsupportedOperationException will be expensive. It seems very likely that for any particular stream, the same byte buffer will be passed in for the life of the stream. That means that for every call to read(...) there is potential for the UnsupportedOperationException to be thrown. That will be expensive. Perhaps keep a piece of state in the stream that gets set on the first time through indicating whether the BB is readable or not. Or keep a reference to the BB along with a bool. If the reference changes (on the off chance that the caller switched BBs for the same stream), then you can redetermine whether read is supported or not.
- In readFully, you could simplify the implementation by just calling into read(long, byte[]...), like this:

    @Override // PositionedReadable
    public void readFully(long position, byte[] buffer, int offset, int length)
        throws IOException {
      int nread = 0;
      while (nread < length) {
        int nbytes = read(position + nread, buffer, offset + nread,
            length - nread);
        if (nbytes < 0) {
          throw new EOFException("End of file reached before reading fully.");
        }
        nread += nbytes;
      }
    }

  That way you can let read(long...) do all the unwinding of the seek position.
- In seek(), you can do a check for forward == 0 and return immediately, thus saving the two calls to position() in the noop case. Ditto skip().
- I noticed that you implemented read(ByteBufferPool), but not releaseBuffer(BB). Is that because you didn't have time (it's ok if that's the case, I'm just wondering why one and not the other)?

CryptoOutputStream.java
- You'll need a getWrappedStream() method.
Crypto input and output streams implementing Hadoop stream interfaces
---------------------------------------------------------------------

Key: HADOOP-10603
URL: https://issues.apache.org/jira/browse/HADOOP-10603
Project: Hadoop Common
Issue Type: Sub-task
Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
Attachments: CryptoInputStream.java, CryptoOutputStream.java, HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.9.patch, HADOOP-10603.patch

A common set of Crypto Input/Output streams. They would be used by CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. Note we cannot use the JDK Cipher Input/Output streams directly because we need to support the additional interfaces that the Hadoop FileSystem streams implement (Seekable, PositionedReadable, ByteBufferReadable, HasFileDescriptor, CanSetDropBehind, CanSetReadahead, HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).
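Charles's suggestion above (keep a reference to the ByteBuffer along with a bool, and redetermine support only when the reference changes) could be sketched roughly as follows. The class and method names here are hypothetical, not from the patch, and the probe is a stand-in for the real capability check that the patch discovers by catching UnsupportedOperationException:

```java
import java.nio.ByteBuffer;

// Sketch: remember the last ByteBuffer seen and whether it was usable,
// so the expensive probe happens once per buffer, not once per read().
class ByteBufferReadCache {
  private ByteBuffer lastBuffer;       // last buffer passed in
  private boolean lastBufferReadable;  // probe result for that buffer

  boolean isReadable(ByteBuffer buf) {
    if (buf != lastBuffer) {
      // Buffer changed (or first call): redetermine support once.
      lastBuffer = buf;
      lastBufferReadable = probe(buf);
    }
    return lastBufferReadable;
  }

  // Stand-in for the real capability check; here: writable, array-backed.
  private boolean probe(ByteBuffer buf) {
    return buf.hasArray() && !buf.isReadOnly();
  }
}
```

Since callers typically pass the same buffer for the life of the stream, the identity check makes the common path a single reference comparison.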
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006628#comment-14006628 ]

Charles Lamb commented on HADOOP-10603:
---------------------------------------

CryptoInputStream.java: Shouldn't usingByteBufferRead be a class variable so that we don't keep checking instanceof ByteBufferReadable every time we call read()?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
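The one-time check this comment asks for might look like the sketch below. The wrapper class is hypothetical, and the marker interface stands in for Hadoop's org.apache.hadoop.fs.ByteBufferReadable; only the field name follows the comment:

```java
import java.io.InputStream;

// Stand-in for org.apache.hadoop.fs.ByteBufferReadable.
interface ByteBufferReadable {}

class CachedCapabilityStream {
  private final InputStream in;
  // Decided once in the constructor instead of instanceof on every read().
  private final boolean usingByteBufferRead;

  CachedCapabilityStream(InputStream in) {
    this.in = in;
    this.usingByteBufferRead = in instanceof ByteBufferReadable;
  }

  boolean usesByteBufferRead() {
    return usingByteBufferRead;
  }
}
```

Because the wrapped stream never changes after construction, hoisting the instanceof into the constructor is safe and removes the per-call check.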
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006655#comment-14006655 ]

Andrew Wang commented on HADOOP-10603:
--------------------------------------

Thanks for the mega-rev Yi, I went through and ticked off my previous review comments. I think we're pretty close if Charles agrees; just had a few things besides the last few you already identified.

- New configuration keys should go in CommonConfigurationKeysPublic, with a provided default also.
- Any reason you put the buffer size in CryptoCodec rather than in the Crypto streams? The streams seem to make more sense.
- Could also do some basic Precondition validation on the config parameters.
- Should CryptoCodec do {{setConf(new Configuration())}} in its constructor?
- Streams still have some hardcoded {{16}}
- (off+len) can still int overflow; need to do some casting to longs to be safe, or some tricks to avoid the addition.
- updateDecryptor still doesn't need that parameter.
- Still some tabs present (I think your IDE inserts them when splitting a string).

Test:
* getDataLen() is never used.
* Let's add conservative test timeouts (e.g. 12).
* I think you can use the @Ignore annotation to skip unsupported LocalFS tests. Can provide a reason too.
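The (off+len) overflow point above can be illustrated with a small sketch (hypothetical helper, not from the patch): casting one operand to long before the addition keeps the sum from wrapping negative and slipping past a bounds check.

```java
// Sketch: do the (off + len) arithmetic in long so that two large ints
// cannot wrap around to a negative value and defeat the bounds check.
class BoundsCheck {
  static boolean inBounds(int off, int len, int bufLength) {
    // (long) off promotes the addition to 64-bit arithmetic.
    return off >= 0 && len >= 0 && (long) off + len <= bufLength;
  }
}
```

With plain int arithmetic, off = Integer.MAX_VALUE and len = 1 would sum to a negative int and pass a naive `off + len <= bufLength` test.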
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006678#comment-14006678 ]

Yi Liu commented on HADOOP-10603:
---------------------------------

Thanks Charles.
{quote}
Shouldn't usingByteBufferRead be a class variable so that we don't keep checking instanceof ByteBufferReadable every time we call read()?
{quote}
Right, I will update this.
[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006680#comment-14006680 ]

Yi Liu commented on HADOOP-10603:
---------------------------------

Thanks [~andrew.wang] for your nice comments. I will update the patch for your new comments together with the remaining few items, and will respond to you later.
[jira] [Commented] (HADOOP-10562) Namenode exits on exception without printing stack trace in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HADOOP-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006791#comment-14006791 ]

Hudson commented on HADOOP-10562:
---------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/563/])
HADOOP-10562. Fix CHANGES.txt entry again (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596386)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
HADOOP-10562. Fix CHANGES.txt (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596378)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt

Namenode exits on exception without printing stack trace in AbstractDelegationTokenSecretManager
------------------------------------------------------------------------------------------------

Key: HADOOP-10562
URL: https://issues.apache.org/jira/browse/HADOOP-10562
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 1.2.1, 2.4.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Priority: Critical
Fix For: 3.0.0, 1.3.0, 2.4.1
Attachments: HADOOP-10562.1.patch, HADOOP-10562.branch-1.1.patch, HADOOP-10562.patch

Not printing the stack trace makes debugging harder.
[jira] [Commented] (HADOOP-10618) Remove SingleNodeSetup.apt.vm
[ https://issues.apache.org/jira/browse/HADOOP-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006806#comment-14006806 ]

Hudson commented on HADOOP-10618:
---------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/563/])
HADOOP-10618. Remove SingleNodeSetup.apt.vm (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596964)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm

Remove SingleNodeSetup.apt.vm
-----------------------------

Key: HADOOP-10618
URL: https://issues.apache.org/jira/browse/HADOOP-10618
Project: Hadoop Common
Issue Type: Improvement
Components: documentation
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
Labels: newbie
Fix For: 3.0.0, 2.5.0
Attachments: HADOOP-10618.2.patch, HADOOP-10618.patch

http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleNodeSetup.html is deprecated and not linked from the left side page. We should remove the document and use http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html instead.