[jira] [Created] (MAPREDUCE-6101) on job submission, if input or output directories are encrypted, shuffle data should be encrypted at rest
Alejandro Abdelnur created MAPREDUCE-6101: - Summary: on job submission, if input or output directories are encrypted, shuffle data should be encrypted at rest Key: MAPREDUCE-6101 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6101 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.6.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Currently, shuffle data encryption at rest must be enabled explicitly to take effect. If it is not set explicitly (ON or OFF) but the input or output HDFS directories of the job are in an encryption zone, we should set it to ON. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6060) shuffle data should be encrypted at rest if the input/output of the job are in an encryption zone
Alejandro Abdelnur created MAPREDUCE-6060: - Summary: shuffle data should be encrypted at rest if the input/output of the job are in an encryption zone Key: MAPREDUCE-6060 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6060 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Affects Versions: 2.6.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur If the input or output of an MR job is within an encryption zone, the intermediate data of the job should be encrypted by default. Setting the {{MRJobConfig.MR_ENCRYPTED_INTERMEDIATE_DATA}} property explicitly should override the default behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
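The defaulting behavior described in MAPREDUCE-6060/6101 can be sketched in plain Java. This is a hypothetical illustration only: the property name, the Map-based stand-in for Hadoop's Configuration, and the isInEncryptionZone() check are assumptions, not the actual Hadoop API.

```java
import java.util.Map;

// Sketch of "default shuffle encryption to ON when input/output are in an
// encryption zone, unless the user set the property explicitly".
// All names here are illustrative stand-ins for the real Hadoop classes.
public class ShuffleEncryptionDefault {

    // Hypothetical property key standing in for
    // MRJobConfig.MR_ENCRYPTED_INTERMEDIATE_DATA.
    static final String MR_ENCRYPTED_INTERMEDIATE_DATA =
        "mapreduce.job.encrypted-intermediate-data";

    // Stand-in for an HDFS client call that reports whether a path lies
    // inside an encryption zone.
    static boolean isInEncryptionZone(String path) {
        return path.startsWith("/secure/");
    }

    // If the user did not set the property explicitly (ON or OFF), default it
    // to ON when either the input or the output directory is in an
    // encryption zone; an explicit setting always wins.
    static void applyDefault(Map<String, String> conf,
                             String inputDir, String outputDir) {
        if (!conf.containsKey(MR_ENCRYPTED_INTERMEDIATE_DATA)
            && (isInEncryptionZone(inputDir) || isInEncryptionZone(outputDir))) {
            conf.put(MR_ENCRYPTED_INTERMEDIATE_DATA, "true");
        }
    }
}
```

The key design point, per the JIRA text, is that the default never overrides an explicit user setting, even `false`.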
[jira] [Resolved] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-5890. --- Resolution: Fixed Fix Version/s: fs-encryption Hadoop Flags: Reviewed I've just committed this JIRA to fs-encryption branch. [~chris.douglas], thanks for all the review cycles you spent on this. [~asuresh], thanks for persevering until done, nice job. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Fix For: fs-encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.14.patch, MAPREDUCE-5890.15.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053828#comment-14053828 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- [~chris.douglas], thanks for the detailed feedback/review iterations on this. Does this mean you are OK with committing the current patch? Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048355#comment-14048355 ] Alejandro Abdelnur edited comment on MAPREDUCE-5890 at 7/1/14 4:50 AM: --- [~chris.douglas], I had initially tried to directly modify the {{IFile}} format to handle the IV. The reasons I felt this would not be such a clean solution are: * The {{IFile}} currently does not have a notion of an explicit header/metadata. * While it is possible to use the {{IFile.Writer}} constructor to write the IV (and thus make it transparent to the rest of the code-base), the reading code-path is not so straightforward. There are two classes that extend {{IFile.Reader}} ({{InMemoryReader}} and {{RawKVIteratorReader}}). The {{InMemoryReader}} totally ignores the input stream that is initialized in the base class constructor, and there are places in the codebase where the input stream is initialized not in the Reader but in the {{Segment::init()}} method (which in my opinion makes the {{IFile}} abstraction a bit leaky, since the underlying stream should be handled in its entirety in the IFile Writer/Reader; the {{Segment}} class, which is part of the {{Merger}} framework, should avoid dealing with the internals of the {{IFile}}). * Also, I was not able to do away with a lot of if-then checks in the Shuffle phase (another instance of the leaky abstraction mentioned in the previous point); the implementations of the {{MapOutput::shuffle}} method create {{IFileInputStream}}s directly without an associated {{IFile.Reader}} was (Author: asuresh): [~chris.douglas], I had initially tried to directly modify the {{IFile}} format to handle the iv. The reason I felt this would not be such a clean solution is : * The {{IFile}} currently does not have a notion of an explicit header/metadata. * While it is possible to use the {{IFile.Writer}} constructor to write the IV and (thus make it transparent to the rest of the code-base). 
The reading code-path is not so straight-forward. There are two classes that extend the {{IFile.Reader}} ({{InMemoryReader}} and {{RawKVIteratorReader}}). The {{InMemoryReader}} totally ignores the inputStream that is initialized in the base class constructor and there are places in the codeBase that the input stream is not initialized in the Reader but in the {{Segment::init()}} method (which in my opinion makes the {{IFile}} abstraction a bit leaky since the underlying stream should be handled in its entirity in the IFile Writer/Reader.. the {{Segment}} class (which is part of the {{Merger}} framework) should avoid dealing with the internals of the ). * Also, I was not able to do away with a lot of if-then checks in the Shuffle phase... (another instance of leaky abstraction mentioned in the previous point), the implementations of {{MapOutput::shuffle}} method creates {{IFileInputStream}}s directly without an associated {{IFile.Reader}} Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
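The write-the-IV-into-the-stream approach debated above can be illustrated with JDK-only crypto. This is a hedged sketch, not the IFile code: AES/CTR via javax.crypto stands in for whatever codec the patch actually uses, and the fixed 16-byte header layout is an assumption for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.SecureRandom;

// Sketch: the IV travels as a small plaintext header in front of the
// encrypted payload, so the reader can recover it before decrypting.
public class IvHeaderStream {

    static final int IV_LEN = 16; // AES block size

    // Write a random IV as a plaintext header, then the AES/CTR ciphertext.
    static byte[] encryptWithIvHeader(byte[] key, byte[] plain) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(iv); // header: the IV is stored with the data, unencrypted
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new IvParameterSpec(iv));
        try (OutputStream out = new CipherOutputStream(bos, c)) {
            out.write(plain);
        }
        return bos.toByteArray();
    }

    // Read the IV header first, then decrypt the remainder of the stream.
    static byte[] decryptWithIvHeader(byte[] key, byte[] data) throws Exception {
        ByteArrayInputStream bis = new ByteArrayInputStream(data);
        byte[] iv = new byte[IV_LEN];
        if (bis.read(iv) != IV_LEN) {
            throw new IOException("short IV header");
        }
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
               new IvParameterSpec(iv));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(bis, c)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }
}
```

The comment's objection is precisely that the reading side is hard to centralize like this in the real code, because {{InMemoryReader}} and {{Segment::init()}} open streams outside the {{IFile.Reader}} constructor.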
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046863#comment-14046863 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- LGTM. One minor nit (I can take care of it when committing): in the {{JobSubmitter.java#copyAndConfigureFiles()}} javadoc, the change at line 295 is not needed. [~chris.douglas], I believe all our suggestions/concerns have been addressed. Do you want to do a new pass on the patch? I'll wait a few days to commit. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046863#comment-14046863 ] Alejandro Abdelnur edited comment on MAPREDUCE-5890 at 6/29/14 2:13 AM: LGTM. One minor nit (I can take care of it when committing): in the {{JobSubmitter.java#copyAndConfigureFiles()}} javadoc, the change at line 295 is not needed. [~chris.douglas], I believe all your suggestions/concerns have been addressed. Do you want to do a new pass on the patch? I'll wait a few days to commit. was (Author: tucu00): LGTM. One minor nit (I can take care of it when committing): in the {{JobSubmitter.java#copyAndConfigureFiles()}} javadoc, the change at line 295 is not needed. [~chris.douglas], I believe all our suggestions/concerns have been addressed. Do you want to do a new pass on the patch? I'll wait a few days to commit. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044665#comment-14044665 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- On the performance hit, if encryption is OFF I would say it is nil (the only extra work being done is resolving a boolean config to check if encryption is ON or OFF). If encryption is ON, you are paying the encryption/decryption overhead. Doing preliminary encryption benchmarks with the crypto streams using Diceros (CryptoCodec-JCE-JNI-OpenSSL), I got 1000MB/sec on both encrypt and decrypt on my laptop. Once we have HADOOP-10693 and this JIRA, we will be able to do some end-to-end benchmarks. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043884#comment-14043884 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- Fetcher.java MapTask.java MergerManagerImpl.java Merger.java ShuffleHandler.java ShuffleHeader.java * several whitespace-only changes (configure your editor not to trim unmodified lines) CryptoUtils.java * createIV(): javadocs, invalid params * wrap() OUT/IN methods: any chance to consolidate all/most signatures to delegate to a single one doing the repetitive logic? * a couple of wrap() methods have a funny LOG message * wrap() OUT methods use cc.AlgorithmBlockSize(), but wrap() IN methods use 16; for the IN methods you can use the cc already available in the method. * the wrap() methods wrap only if necessary (the IF ENCRYPTED check has been moved inside), so the name should reflect that, maybe something like 'wrapIfNecessary()' Fetcher.java * copyMapOutput() unconditionally corrects the offset; this seems wrong. * No need to define out2, just reuse out Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
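The 'wrapIfNecessary()' shape suggested in the review above could look like the following JDK-only sketch. This is hypothetical: the real patch works with Hadoop's CryptoCodec and Configuration, whereas this sketch substitutes the raw javax.crypto classes and takes the key/IV as parameters.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.OutputStream;

public class WrapIfNecessary {

    // Single entry point for all call sites: returns the stream untouched
    // when encryption is off, so the scattered if-blocks collapse into one
    // place. Hypothetical helper; the real code delegates to Hadoop's
    // CryptoCodec rather than raw javax.crypto.
    static OutputStream wrapIfNecessary(boolean encrypted, byte[] key,
                                        byte[] iv, OutputStream out)
            throws Exception {
        if (!encrypted) {
            return out; // pass-through: the only cost is this boolean check
        }
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherOutputStream(out, cipher);
    }
}
```

Putting the ENCRYPTED check inside the helper is also what makes the "nil overhead when OFF" claim in the performance comment plausible: disabled callers pay only one boolean test.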
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043900#comment-14043900 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- bq. If this really is a requirement, aren't we better off asking cluster admins to either install disks with local file-systems that support encryption specifically for intermediate data or just create some partitions that support encryption? That seems like the right layer to handle something like this instead of adding a whole lot of complexity into the software that only has a downside of performance. Asking admins to install additional software to encrypt the local FS means installing kernel modules. Also, this would mean that ALL MR jobs are going to pay the penalty of encrypted intermediate data. That is not reasonable. I don't agree with the statement that this is adding a lot of complexity; it is simply wrapping the streams where necessary. bq. Wearing my YARN hat, it is not enough to do this just for MapReduce. Every other framework running on YARN will need to add this complexity - this is asking for too much complexity. We are better off handling it at the file-system/partition/disk level. This patch does not touch anything in YARN, only MapReduce, and only private/evolving classes of it. 
Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042373#comment-14042373 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- Hi [~chris.douglas], I would prefer to keep the current MR job test because it tests spills/merges on both sides of the MR job, making sure no edge cases are missed. The {{ShuffleHandler}} is a private class of MapReduce; if other frameworks use it, it is at their own risk. Regarding adding new abstractions, I’m OK if they are small and non-intrusive. I just don’t want to send Arun chasing a wild goose and then, when he finally catches it, backtrack because the changes are too pervasive in the core of MapReduce (this happened in MAPREDUCE-2454). Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042768#comment-14042768 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- [~chris.douglas], on the last section of the previous comment: I didn't mean to say your refactoring asks are a wild goose chase, I just wanted to say I don't want to end up in that situation. My apologies if I've given the wrong impression with my comment. I've talked with Arun and he is already exploring along the lines of your suggestions to see their feasibility. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039671#comment-14039671 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- LGTM. [~asuresh], can you run test-patch locally on the patch and paste the result in the JIRA? After that, I think we are good to go. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036684#comment-14036684 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- Suresh, any special reason why the test is not included in the main patch? I’m not quite happy with the IF blocks scattered around:
{code}
if (CryptoUtils.isShuffleEncrypted(conf)) {
  byte[] iv = CryptoUtils.createIVFile(conf, fs, file);
  out = CryptoUtils.wrap(conf, iv, out);
}
{code}
Given that the current abstraction does not provide a clean cut to hide this within the {{IFile}} without a significant refactoring throughout the code, I think it is the least evil. Nice job. Could you try running test-patch locally on the fs-encryption branch with this patch? Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.2.patch, MAPREDUCE-5890.test.patch For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-5890: - Assignee: Arun Suresh (was: Alejandro Abdelnur) Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-2608) Mavenize mapreduce contribs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-2608. --- Resolution: Invalid [doing self-clean up of JIRAs] closing as invalid as this has been done in different jiras. Mavenize mapreduce contribs --- Key: MAPREDUCE-2608 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2608 Project: Hadoop Map/Reduce Issue Type: Task Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Same as HADOOP-6671 for mapreduce contribs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
Alejandro Abdelnur created MAPREDUCE-5890: - Summary: Support for encrypting Intermediate data and spills in local filesystem Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-4658) Move tools JARs into separate lib directories and have common bootstrap script.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-4658. --- Resolution: Won't Fix [doing self-clean up of JIRAs] scripts have changed significantly since this JIRA. Move tools JARs into separate lib directories and have common bootstrap script. --- Key: MAPREDUCE-4658 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4658 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur This is a follow up of the discussion going on on MAPREDUCE-4644 -- Moving each tool's JARs into separate lib/ dirs is quite easy (modifying a single assembly). What we should think about is a common bootstrap script, so that each tool does not have to duplicate (and get wrong) such a script. I'll open a JIRA for that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998426#comment-13998426 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- HADOOP-10603 introduces crypto streams to be used for filesystem encryption. We could leverage them for encrypting map output data; the Reducer shuffle would decrypt it (no need for network encryption, as the data would already be encrypted in transit). The reducer would encrypt spills when writing them to disk and decrypt them while reading them back. It may make sense to do this JIRA as part of the fs-encryption branch. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901887#comment-13901887 ] Alejandro Abdelnur commented on MAPREDUCE-5641: --- [~rkanter], [~jlowe], how about not touching the current permissions of staging and making the RM a proxy user in HDFS. Then the files would be written as the user. [~vinodkv], I'm a bit reluctant to get the JHS to depend on the AHS at this point, as the AHS is not fully cooked. I would prefer dropping the JHS altogether in favor of the AHS when the AHS is ready for prime time with AM extensions. History for failed Application Masters should be made available to the Job History Server - Key: MAPREDUCE-5641 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, jobhistoryserver Affects Versions: 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-5641.patch Currently, the JHS has no information about jobs whose AMs have failed. This is because the History is written by the AM to the intermediate folder just before finishing, so when it fails for any reason, this information isn't copied there. However, it is not lost as it's in the AM's staging directory. To make the History available in the JHS, all we need to do is have another mechanism to move the History from the staging directory to the intermediate directory. The AM also writes a Summary file before exiting normally, which is also unavailable when the AM fails. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901985#comment-13901985 ] Alejandro Abdelnur commented on MAPREDUCE-5641: --- yep, that is what I meant. The JHS is trusted code; no user code runs there. The doAs with the proxy user would be used only for this case. Also, all this would go away when the AHS is ready to take over. History for failed Application Masters should be made available to the Job History Server - Key: MAPREDUCE-5641 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, jobhistoryserver Affects Versions: 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-5641.patch Currently, the JHS has no information about jobs whose AMs have failed. This is because the History is written by the AM to the intermediate folder just before finishing, so when it fails for any reason, this information isn't copied there. However, it is not lost as it's in the AM's staging directory. To make the History available in the JHS, all we need to do is have another mechanism to move the History from the staging directory to the intermediate directory. The AM also writes a Summary file before exiting normally, which is also unavailable when the AM fails. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
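The "doAs with the proxy user" pattern mentioned here can be illustrated with the JDK's own Subject.doAs (a sketch only: Hadoop services actually use UserGroupInformation.createProxyUser/doAs, and the history-file move below is a hypothetical stand-in for a real filesystem rename):

```java
import javax.security.auth.Subject;
import java.security.PrivilegedAction;

// Sketch of running one operation under another identity's security context,
// the way a trusted service would impersonate a user for a single FS call.
public class ProxyDoAs {

    // Hypothetical helper: "move" a history file while acting as the given
    // user. The returned string stands in for the effect of a real rename.
    static String moveAsUser(Subject user, String src, String dst) {
        return Subject.doAs(user, (PrivilegedAction<String>) () ->
            "moved " + src + " -> " + dst); // stand-in for fs.rename(src, dst)
    }
}
```

The point of the comment is that this impersonation runs only inside trusted JHS code, never inside user code, which is what makes the proxy-user grant acceptable.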
[jira] [Commented] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882112#comment-13882112 ] Alejandro Abdelnur commented on MAPREDUCE-5362: --- Patch applies cleanly on trunk's HEAD and builds correctly. Don't know what problem Jenkins is having. clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-5362: - Assignee: Alejandro Abdelnur (was: Roman Shaposhnik) [~rvs], I'm stealing this from you. I have a few available cycles and I want to nail this one. clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5362: -- Attachment: MAPREDUCE-5362.patch clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5362: -- Target Version/s: (was: ) Status: Patch Available (was: Open) clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881653#comment-13881653 ] Alejandro Abdelnur commented on MAPREDUCE-5362: --- [~rvs], [~vinodkv], [~ste...@apache.org], [~kkambatl], this patch is the equivalent of YARN-888 for MR. Mind taking it for a spin? It also fixes a few things that were not 100% correct: * produces test jars for all MR modules * puts all test jars in the MR test dir * puts all source jars in the MR sources dir * lib has all the direct 3rd-party dependencies of MR clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
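For readers unfamiliar with the normalization being described, the shape of the change is: each leaf module declares its own dependencies, and the intermediate 'pom' packaging modules declare none. A rough illustration (the module and artifact names here are illustrative, not taken from the actual patch):

```xml
<!-- Leaf module (e.g. hadoop-mapreduce-client-core/pom.xml): declares its
     dependencies directly instead of inheriting them from the parent pom. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <type>test-jar</type>
    <scope>test</scope>
  </dependency>
</dependencies>

<!-- Intermediate 'pom' module (e.g. hadoop-mapreduce-client/pom.xml): no
     <dependencies> section at all, only <modules> and shared plugin config. -->
```

Declaring dependencies only in leaf modules keeps each module's classpath explicit, which is what IDEs like IntelliJ resolve per-module.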
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) committed to trunk and branch-2. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.4.0 Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at
[jira] [Commented] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874157#comment-13874157 ] Alejandro Abdelnur commented on MAPREDUCE-3310: --- If Tez is reusing all of the Hadoop MR task implementation, the answer would be yes; otherwise the new method would not be used at all and it doesn't matter what it returns. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Fix For: 1.3.0, 2.4.0 Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
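The grouping-comparator semantics the issue wants to extend to Combiners can be shown outside Hadoop in plain Java: with a composite key sorted on its full value but grouped only on its natural-key part, consecutive records with the same natural key land in one group, which is what makes secondary sort work. A minimal self-contained sketch (names illustrative, not Hadoop APIs):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {

    // Composite key "naturalKey#secondaryKey": sorted on the full string,
    // but grouped only on the part before '#', like a secondary-sort job.
    static final Comparator<String> NATURAL_KEY =
        Comparator.comparing((String k) -> k.substring(0, k.indexOf('#')));

    // Walks already-sorted keys and starts a new group whenever the
    // grouping comparator says the key differs from the group's first key.
    static List<List<String>> group(List<String> sortedKeys,
                                    Comparator<String> grouping) {
        List<List<String>> groups = new ArrayList<>();
        for (String k : sortedKeys) {
            if (groups.isEmpty()
                || grouping.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two groups: [a#1, a#2] and [b#1] -- the Combiner, which today can
        // only group on full-key equality, would instead see three groups.
        System.out.println(group(List.of("a#1", "a#2", "b#1"), NATURAL_KEY));
    }
}
```

The JIRA's point is that a Reducer can be given such a comparator via Job.setGroupingComparatorClass, while a Combiner has no equivalent hook.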
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874169#comment-13874169 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- From what I understand, for HBase, HDFS HA, or YARN HA, it is the corresponding client library that resolves the real host, so this would be taken care of by using that client library (HBase, HDFS, YARN) from within the {{CredentialsProvider}} implementation for that service. I think a {{URI[]}} (all of the same scheme) being passed to the corresponding {{CredentialsProvider}} impl should be enough, no? Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872404#comment-13872404 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- Planning to comment later this morning. Sorry, I got caught up on different things yesterday. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872542#comment-13872542 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- bq. ... I’m not too sure about - mainly from the perspective of services not handling getToken requests correctly if security is disabled We are moving away from this; in YARN we always use tokens, regardless of the security configuration. Oozie needs tokens to be there in order to work correctly. bq. ... The JobClient currently doesn't do this, at least for HDFS. Actually, yes it does, if you set the {{MRJobConfig.JOB_NAMENODES}} property; this is done in the {{JobSubmitter#populateTokenCache()}} method, which is called by {{JobSubmitter#submitJobInternal()}}, which is called by {{JobSubmitter#submit()}}. All this is in the main execution path, thus always done when doing a submit. It is independent of split computation. bq. ... For HBase / HCatalog sources which are outside of the IF/OF for a MR job - I don't think we have the capability for fetching tokens, and rely on the user providing them up front. Actually, we are fetching them upfront only because this was needed for MR jobs, but MR shouldn’t be a special case. Oozie has the concept of a {{CredentialsProvider}} for this very same reason, and I think with this JIRA we can fix this in the general case. bq. ... Would this utility class know how to handle all kinds of URIs ? Yes, based on registered handlers for different schemes; more on this follows. My thinking on how to address this is to use the same pattern we use today for loading/registering {{FileSystem}}, {{CompressionCodec}}, {{TokenRenewer}}, and {{SecurityInfo}} implementations: use the JDK’s {{ServiceLoader}} mechanism to load all available implementations of the following interface:
{code}
/**
 * Implementations must be thread-safe.
 */
public interface CredentialsProvider {

  /**
   * Reports the scheme supported by this provider.
   */
  public String getScheme();

  /**
   * Obtains delegation tokens for the provided URIs.
   *
   * @param conf configuration used to initialize the components that
   *     connect to the specified URIs.
   * @param uris URIs of services to obtain delegation tokens from.
   * @param targetCredentials credentials to add the fetched delegation
   *     tokens to.
   */
  public void obtainCredentials(Configuration conf, URI[] uris,
      Credentials targetCredentials) throws IOException;
}
{code}
Then we would have a {{CredentialsProvider}} class that uses a {{ServiceLoader}} to load all providers available in the classpath (the nice thing about the ServiceLoader mechanism is that you drop in a JAR file with a service implementation and you don’t have to configure anything; it just works, provided the JAR has the META-INF/services/... file for it). This would be done in a static initialization block. The {{CredentialsProvider}} class would have a static method {{fetchCredentials(Configuration, URI[], Credentials)}} which sorts the URIs by scheme and then invokes the corresponding {{CredentialsProvider}} impl for each. Then the different YARN applications define a property in the conf to indicate the URIs of the services to get tokens from, and their client submission code fetches them (like the {{JobSubmitter}} does with {{MRJobConfig.JOB_NAMENODES}}, but in a general way). Frameworks may choose to be smarter (in the case of MR, get the URIs from the splits and the output dir and fetch the tokens automatically). 
Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
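The ServiceLoader-based dispatch described in the comment above (sort URIs by scheme, hand each bucket to the registered provider for that scheme) can be sketched self-contained. Here a hand-filled registry stands in for the {{ServiceLoader}}/META-INF discovery, and Configuration/Credentials are reduced to placeholders, so this is the dispatch shape only, not Hadoop's actual classes:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CredentialsProviderSketch {

    // Per-scheme provider, as in the interface proposed above (Configuration
    // and Credentials are simplified away for the sketch).
    interface CredentialsProvider {
        String getScheme();
        void obtainCredentials(URI[] uris, List<String> targetCredentials);
    }

    // Filled once at class load, standing in for the static-block
    // ServiceLoader scan of META-INF/services entries.
    static final Map<String, CredentialsProvider> PROVIDERS = new HashMap<>();
    static {
        PROVIDERS.put("hdfs", new CredentialsProvider() {
            public String getScheme() { return "hdfs"; }
            public void obtainCredentials(URI[] uris, List<String> creds) {
                for (URI u : uris) creds.add("hdfs-token:" + u.getHost());
            }
        });
    }

    // The proposed static fetchCredentials(): bucket URIs by scheme, then
    // dispatch each bucket to its provider.
    static List<String> fetchCredentials(URI[] uris) {
        Map<String, List<URI>> byScheme = new HashMap<>();
        for (URI u : uris) {
            byScheme.computeIfAbsent(u.getScheme(), s -> new ArrayList<>()).add(u);
        }
        List<String> creds = new ArrayList<>();
        for (Map.Entry<String, List<URI>> e : byScheme.entrySet()) {
            CredentialsProvider p = PROVIDERS.get(e.getKey());
            if (p != null) {
                p.obtainCredentials(e.getValue().toArray(new URI[0]), creds);
            }
        }
        return creds;
    }

    public static void main(String[] args) {
        System.out.println(fetchCredentials(new URI[] {
            URI.create("hdfs://nn1:8020/in"), URI.create("hdfs://nn2:8020/out") }));
    }
}
```

Dropping a JAR with a new provider (say, for an {{hbase:}} scheme) would add an entry to the registry without touching the dispatcher, which is the point of the ServiceLoader design.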
[jira] [Assigned] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-5724: - Assignee: Alejandro Abdelnur JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Status: Patch Available (was: Open) JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Attachment: MAPREDUCE-5724.patch Trying to do something like YARN-24 for the JHS is a bit more complicated. Instead, I've taken a different approach: on startup the JHS will try creating the history directories; if it cannot because the FS is not available or in safemode, it will retry for up to 2 minutes, and if it times out, it will then shut down. So, instead of failing immediately, the JHS will wait a while for the FS to become available. I've hardcoded the 2-minute timeout as I don't think we need to introduce a config value for this. If others feel otherwise, I can update the patch with a config prop for it. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
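The retry-until-timeout behavior described in the attachment comment (try to create the history dirs, keep retrying while the FS is unavailable or in safemode, shut down after a bound) has the shape of a generic bounded retry loop. A hedged sketch, not the patch's actual code; the method names and the timeout/sleep values are illustrative:

```java
import java.util.function.BooleanSupplier;

public class StartupRetrySketch {

    // Retries 'attempt' until it succeeds or 'timeoutMs' elapses, sleeping
    // 'sleepMs' between tries; returns whether it ever succeeded. The JHS
    // would proceed with startup on true and shut down on false.
    static boolean retryUntil(BooleanSupplier attempt, long timeoutMs,
                              long sleepMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;          // dirs created: continue startup
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;         // FS never came up: give up
            }
            Thread.sleep(sleepMs);    // FS unavailable or in safemode: wait
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate an FS that becomes available on the third attempt.
        int[] calls = {0};
        boolean ok = retryUntil(() -> ++calls[0] >= 3, 1000, 10);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

One subtlety the real patch has to handle, per the later comment in this thread: only "FS unavailable" failures should be retried; other checked exceptions from the attempt should propagate immediately rather than be swallowed by the loop.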
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Attachment: MAPREDUCE-5724.patch Thanks for the reviews. New patch addressing Karthik's and Sandy's comments. Regarding removing the {{throws Exception}} from {{createHistoryDirs()}}: not possible, because {{tryCreateHistoryDirs}} throws a checked exception if the failure reason is other than the FS not being available. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at
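The control flow discussed in the patch comment above — a {{tryCreateHistoryDirs()}} that reports the filesystem being unavailable as a retryable condition while letting any other failure escape as a checked exception — can be sketched in isolation. Everything below (class and method names, the backoff) is an illustrative stand-in, not the actual HistoryFileManager code:

```java
import java.io.IOException;

// Illustrative sketch of the retry pattern described in the comment above;
// NOT the real HistoryFileManager code. A "false" return means the FS was
// unavailable and we should retry; any other failure surfaces as a checked
// IOException from the attempt itself.
public class HistoryDirSketch {

    interface DirCreator {
        // Returns true on success, false if the filesystem is unavailable;
        // throws IOException for any other kind of failure.
        boolean tryCreateHistoryDirs() throws IOException;
    }

    static int createHistoryDirs(DirCreator creator, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (creator.tryCreateHistoryDirs()) {
                return attempt; // directories created successfully
            }
            Thread.sleep(10); // back off while the FS is unavailable
        }
        throw new IOException("timed out waiting for the filesystem");
    }

    public static void main(String[] args) throws Exception {
        // Simulated FS that becomes available on the third attempt.
        int[] calls = {0};
        int attempts = createHistoryDirs(() -> ++calls[0] >= 3, 10);
        System.out.println("succeeded on attempt " + attempts);
    }
}
```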
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Attachment: MAPREDUCE-5724.patch Thanks Sandy; new patch changing the exception being thrown to IOException. Regarding detecting the SafeModeException by cause: I tried that at first, but the problem is that the cause is NULL. I checked with ATM and he indicated that the initCause() method should be called; however, according to the javadocs, initCause() should be called where the exception is created, so this seems to be an HDFS issue. Thus the only way I found to determine whether the original exception was due to the filesystem being in safemode was by searching the toString() value for 'SafeModeException'. I'll open a JIRA against HDFS to call initCause() where the exception is being thrown. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at
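The toString()-matching workaround described in the comment above can be shown standalone. This is a hedged sketch of the idea only, not the actual patch code; with a NULL cause chain the only usable signal is the rendered exception text:

```java
// Standalone sketch of the workaround discussed above: since the cause
// chain is NULL in the reported case, fall back to scanning the
// exception's string form for the safemode marker.
public class SafeModeCheckSketch {

    static boolean isBecauseSafeMode(Throwable t) {
        return t.toString().contains("SafeModeException");
    }

    public static void main(String[] args) {
        Throwable remote = new RuntimeException(
            "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Name node is in safe mode");
        System.out.println("safemode detected: " + isBecauseSafeMode(remote));
        System.out.println("other failure detected: "
            + isBecauseSafeMode(new RuntimeException("Connection refused")));
    }
}
```

This is fragile by design (a message-format change breaks it), which is why the comment proposes fixing initCause() on the HDFS side instead.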
[jira] [Commented] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873023#comment-13873023 ] Alejandro Abdelnur commented on MAPREDUCE-5724: --- created HDFS-5787 JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870835#comment-13870835 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- The Oozie server is responsible for obtaining all the tokens the main job may need: * tokens to run the job (working dir, jobtokens) * tokens for the Input and Output data (typically HDFS tokens, but they can be for different file systems, for Hbase, for HCatalog, etc). For the typical case of running an MR job (directly or via Pig/Hive), the tokens of the launcher job are sufficient for the main job. They just need to be propagated. The Oozie server makes sure the mapreduce.job.complete.cancel.delegation.tokens property is set to FALSE for the launcher job (Oozie gets rid of the launcher job for MR jobs once the main job is running). For scenarios where the main job needs to interact with different services, Oozie must acquire those tokens in advance. For HDFS this is done by simply setting the MRJobConfig.JOB_NAMENODES property; the launcher job submission will then get those tokens. For Hbase or HCatalog, Oozie has a CredentialsProvider that obtains those tokens (the requirement here is that Oozie is configured as a proxy user in those services in order to get tokens for the user submitting the job). It seems you are after generalizing this. I think we should do it with a slight twist from what you are proposing: * DelegationTokens should always be requested by the client, security enabled or not, computing the splits on the client or not. * DelegationTokens fetching should be done regardless of the IF/OF implementation (take the case of talking with Hbase or HCatalog, job working dir service). * DelegationTokens fetching should not be tied to split computation. We could have a utility class that takes a UGI and a list of service URIs and returns a populated Credentials with tokens for all the specified services. 
The IF/OF/Job would have to be able to extract the required URIs for the job. Also, this mechanism could be used to obtain ALL tokens the AM needs. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
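The utility class proposed at the end of the comment above — pass a UGI and a list of service URIs, get back populated Credentials — could take roughly the following shape. All types here are simplified stand-ins for the real UserGroupInformation/Credentials classes, and the token-fetching body is a placeholder; this is a sketch of the proposed interface, not an implementation:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shape sketch for the proposed token-fetching utility. "Ugi" and
// "Credentials" are simplified stand-ins for the real Hadoop classes.
public class TokenFetchSketch {

    static class Ugi {
        final String user;
        Ugi(String user) { this.user = user; }
    }

    static class Credentials {
        final Map<String, String> tokens = new HashMap<>();
        void addToken(String service, String token) { tokens.put(service, token); }
    }

    // Proposed utility: one call that obtains tokens for every listed
    // service, independent of any InputFormat/OutputFormat implementation
    // and of split computation.
    static Credentials obtainTokensForServices(Ugi ugi, List<URI> services) {
        Credentials creds = new Credentials();
        for (URI service : services) {
            // In the real version this would contact each service
            // (HDFS, HBase, HCatalog, ...) on behalf of the given user.
            creds.addToken(service.toString(), "token-for-" + ugi.user);
        }
        return creds;
    }

    public static void main(String[] args) {
        Credentials c = obtainTokensForServices(new Ugi("tucu"),
            List.of(URI.create("hdfs://nn1:8020"), URI.create("hcat://hcat:9083")));
        System.out.println(c.tokens.size() + " tokens obtained");
    }
}
```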
[jira] [Created] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
Alejandro Abdelnur created MAPREDUCE-5724: - Summary: JobHistoryServer does not start if HDFS is not running Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Priority: Critical Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at 
org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1102) at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1514) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:561) at
[jira] [Commented] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871450#comment-13871450 ] Alejandro Abdelnur commented on MAPREDUCE-5724: --- YARN-24 fixed a similar issue for the NM, we should try doing something similar here. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Priority: Critical Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at
[jira] [Created] (MAPREDUCE-5722) client-app module failing to compile, missing jersey dependency
Alejandro Abdelnur created MAPREDUCE-5722: - Summary: client-app module failing to compile, missing jersey dependency Key: MAPREDUCE-5722 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5722 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.4.0 This seems a fallout of YARN-888, oddly enough it did not happen while doing a full build with the patch before committing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (MAPREDUCE-5722) client-app module failing to compile, missing jersey dependency
[ https://issues.apache.org/jira/browse/MAPREDUCE-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-5722. --- Resolution: Invalid False alarm; it seems I was picking up some stale POMs from my local cache. A full clean build went OK. client-app module failing to compile, missing jersey dependency --- Key: MAPREDUCE-5722 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5722 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.4.0 This seems to be a fallout of YARN-888; oddly enough it did not happen while doing a full build with the patch before committing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869968#comment-13869968 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- [~sseth], [~acmurthy], why this is needed, if the AM has the corresponding delegation tokens, things work just fine, Oozie has been doing this for years; the splits are computed in the launcher job which does not have kerberos credentials. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869968#comment-13869968 ] Alejandro Abdelnur edited comment on MAPREDUCE-5663 at 1/13/14 8:59 PM: [~sseth], [~acmurthy], why is this needed? if the AM has the corresponding delegation tokens, things work just fine, Oozie has been doing this for years; the splits are computed in the launcher job which does not have kerberos credentials. was (Author: tucu00): [~sseth], [~acmurthy], why this is needed, if the AM has the corresponding delegation tokens, things work just fine, Oozie has been doing this for years; the splits are computed in the launcher job which does not have kerberos credentials. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870200#comment-13870200 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- [~sseth], for MR this is fully cooked. It works something like this: * On the AM client side you collect all the tokens you need and write them to HDFS using the Credentials.writeTokenStorageFile() method. * The HADOOP_TOKEN_FILE_LOCATION env variable is set in the AM environment, pointing to that file. * Then, when UGI.getLoginUser() is called on the AM, the UGI credentials should be populated with the contents of the token file written by the AM client. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
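The three-step hand-off described in the comment above can be simulated end-to-end without a cluster. This is a simplified sketch: the token file is plain text here rather than the real Credentials serialization, and the environment is a plain Map because the real flow relies on Credentials.writeTokenStorageFile() and UGI.getLoginUser():

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Simplified simulation of the token hand-off described above:
// (1) the AM client writes a token file, (2) the AM's environment
// points at it, (3) "login" on the AM reads the tokens back.
public class TokenHandoffSketch {

    static final String HADOOP_TOKEN_FILE_LOCATION = "HADOOP_TOKEN_FILE_LOCATION";

    // Step 1: the AM client writes the collected tokens to a file
    // (stand-in for Credentials.writeTokenStorageFile()).
    static Path writeTokenFile(String tokens) throws IOException {
        Path f = Files.createTempFile("tokens", ".bin");
        Files.writeString(f, tokens);
        return f;
    }

    // Step 3: "login" reads whatever the env variable points to
    // (stand-in for the UGI.getLoginUser() population step).
    static String loadTokens(Map<String, String> env) throws IOException {
        return Files.readString(Path.of(env.get(HADOOP_TOKEN_FILE_LOCATION)));
    }

    public static void main(String[] args) throws IOException {
        Path f = writeTokenFile("hdfs-token,hcat-token");
        // Step 2: the env variable is set in the AM's (simulated) environment.
        Map<String, String> amEnv = new HashMap<>();
        amEnv.put(HADOOP_TOKEN_FILE_LOCATION, f.toString());
        System.out.println(loadTokens(amEnv));
    }
}
```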
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870274#comment-13870274 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- This is done in the MR {{JobSubmitter.java}}, in the {{submitJobInternal(...)}} method: {code} // get delegation token for the dir TokenCache.obtainTokensForNamenodes(job.getCredentials(), new Path[] { submitJobDir }, conf); populateTokenCache(conf, job.getCredentials()); {code} Is this what you are after? Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870375#comment-13870375 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- This works out of the box for MR jobs because typically the same FileSystem where the IN/OUT data resides is the one used for the submission dir. If you need to use different FileSystems (i.e. distcp), this is achieved by setting the {{MRJobConfig.JOB_NAMENODES}} property in the job configuration; this is handled in {{JobSubmitter.java}} in the following code:
{code}
// get secret keys and tokens and store them into TokenCache
private void populateTokenCache(Configuration conf, Credentials credentials)
    throws IOException {
  readTokensFromFiles(conf, credentials);
  // add the delegation tokens from configuration
  String[] nameNodes = conf.getStrings(MRJobConfig.JOB_NAMENODES);
  LOG.debug("adding the following namenodes' delegation tokens: " +
      Arrays.toString(nameNodes));
  if (nameNodes != null) {
    Path[] ps = new Path[nameNodes.length];
    for (int i = 0; i < nameNodes.length; i++) {
      ps[i] = new Path(nameNodes[i]);
    }
    TokenCache.obtainTokensForNamenodes(credentials, ps, conf);
  }
}
{code}
Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
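The essential step in the snippet above — turning the comma-separated {{MRJobConfig.JOB_NAMENODES}} value into one filesystem reference per namenode — can be exercised standalone. This is an illustrative sketch only: it uses plain Strings where the real code builds org.apache.hadoop.fs.Path objects and then calls TokenCache.obtainTokensForNamenodes():

```java
// Standalone sketch of what populateTokenCache() does with the
// MRJobConfig.JOB_NAMENODES value: split the comma-separated list and
// produce one filesystem reference per namenode to fetch tokens for.
// Plain Strings here stand in for org.apache.hadoop.fs.Path.
public class JobNamenodesSketch {

    static String[] namenodesToPaths(String jobNamenodes) {
        if (jobNamenodes == null || jobNamenodes.isEmpty()) {
            return new String[0]; // no extra namenodes configured
        }
        String[] nameNodes = jobNamenodes.split("\\s*,\\s*");
        String[] ps = new String[nameNodes.length];
        for (int i = 0; i < nameNodes.length; i++) {
            ps[i] = nameNodes[i]; // new Path(nameNodes[i]) in the real code
        }
        return ps;
    }

    public static void main(String[] args) {
        // e.g. a distcp-style job reading from one cluster, writing to another:
        String[] ps = namenodesToPaths("hdfs://nn-src:8020, hdfs://nn-dst:8020");
        System.out.println(ps.length + " namenodes to obtain tokens from");
    }
}
```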
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Resolution: Fixed Fix Version/s: 2.4.0 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) committed to trunk, branch-1 and branch-2. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Fix For: 1.3.0, 2.4.0 Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
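The secondary-sort pattern the description refers to can be illustrated with a standalone sketch (plain Java, no Hadoop dependency; the composite-key format and helper names are hypothetical): with a composite key such as `naturalKey#secondaryKey`, the sort comparator orders on the full key while the grouping comparator compares only the natural part, which is what a combiner-specific grouping comparator would enable on the map side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CombinerGroupingSketch {
    // Grouping comparator: compare only the natural key before the '#'.
    // The sort comparator would order on the full composite key.
    static final Comparator<String> GROUPING =
            Comparator.comparing((String k) -> k.split("#", 2)[0]);

    // Partition an already-sorted key list into the groups a combiner
    // would see: a new group starts whenever the grouping comparator
    // reports a different natural key.
    static List<List<String>> group(List<String> sortedKeys) {
        List<List<String>> groups = new ArrayList<>();
        for (String k : sortedKeys) {
            if (groups.isEmpty()
                    || GROUPING.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList("a#1", "a#2", "b#1")));
    }
}
```

Without a combiner-side grouping comparator, each distinct composite key would form its own group, defeating the secondary-sort trick during the combine phase.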
[jira] [Commented] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863627#comment-13863627 ] Alejandro Abdelnur commented on MAPREDUCE-3310: --- just committed an addendum fixing javadoc warnings (apologies for the noise). Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Fix For: 1.3.0, 2.4.0 Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-branch-1.patch Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-trunk.patch new patches with the suggested method name changes. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856006#comment-13856006 ] Alejandro Abdelnur commented on MAPREDUCE-3310: --- test failure seems unrelated. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5632) TestRMContainerAllocator#testUpdatedNodes fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839226#comment-13839226 ] Alejandro Abdelnur commented on MAPREDUCE-5632: --- LGTM, +1 TestRMContainerAllocator#testUpdatedNodes fails --- Key: MAPREDUCE-5632 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5632 Project: Hadoop Map/Reduce Issue Type: Test Reporter: Ted Yu Assignee: Jonathan Eagles Attachments: YARN-1420.patch From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1607/console : {code} Running org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 65.78 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) Time elapsed: 3.125 sec FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertTrue(Assert.java:27) at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testUpdatedNodes(TestRMContainerAllocator.java:779) {code} This assertion fails: {code} Assert.assertTrue(allocator.getJobUpdatedNodeEvents().isEmpty()); {code} The List returned by allocator.getJobUpdatedNodeEvents() is: [EventType: JOB_UPDATED_NODES] -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5652) ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836088#comment-13836088 ] Alejandro Abdelnur commented on MAPREDUCE-5652: --- BTW, the {{ShuffleHandler}} is not aware of the cleanup. The cleanup is done in the {{ResourceLocalizationService.java}} {{serviceInit()}} method. ShuffleHandler should handle NM restarts Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Labels: shuffle ShuffleHandler should work across NM restarts and not require re-running map tasks. On NM restart, the map outputs are cleaned up, requiring re-execution of map tasks; this should be avoided. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Status: Patch Available (was: Open) Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-trunk.patch Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-trunk.patch Test failure seems unrelated. Uploading a patch that fixes the javac warning (it was in a testcase). Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-branch-1.patch patch for branch-1 Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-trunk.patch Reuploading the patch for trunk so Jenkins does not pick up the branch-1 patch. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5481) Uber job reducers hang waiting to shuffle map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821683#comment-13821683 ] Alejandro Abdelnur commented on MAPREDUCE-5481: --- LGTM, just a couple of questions/comments: * LocalContainerLauncher.java (line:352 with patch), do we need to do something about this: {{ //relocalize(); // needed only if more than one reducer supported (is MAPREDUCE-434 fixed yet?)}} * LocalContainerLauncher.java, the introduced {{localMapFiles}} Map: from a cursory look it does not seem to be accessed from multiple threads; if so, it is fine. Otherwise we need to use a synchronized/concurrent map. Uber job reducers hang waiting to shuffle map outputs - Key: MAPREDUCE-5481 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5481 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 3.0.0 Reporter: Jason Lowe Assignee: Xuan Gong Priority: Blocker Attachments: MAPREDUCE-5481.patch, MAPREDUCE-5481.patch, syslog TestUberAM has been timing out on trunk for some time now and surefire then fails the build. I'm not able to reproduce it locally, but the Jenkins builds have been seeing it fairly consistently. See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1529/console -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5481) Uber job reducers hang waiting to shuffle map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821952#comment-13821952 ] Alejandro Abdelnur commented on MAPREDUCE-5481: --- LGTM +1 after jenkins. Uber job reducers hang waiting to shuffle map outputs - Key: MAPREDUCE-5481 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5481 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 3.0.0 Reporter: Jason Lowe Assignee: Xuan Gong Priority: Blocker Attachments: MAPREDUCE-5481-1.patch, MAPREDUCE-5481.patch, MAPREDUCE-5481.patch, syslog TestUberAM has been timing out on trunk for some time now and surefire then fails the build. I'm not able to reproduce it locally, but the Jenkins builds have been seeing it fairly consistently. See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1529/console -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5609) Add debug log message when sending job end notification
[ https://issues.apache.org/jira/browse/MAPREDUCE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815363#comment-13815363 ] Alejandro Abdelnur commented on MAPREDUCE-5609: --- +1 Add debug log message when sending job end notification --- Key: MAPREDUCE-5609 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5609 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.2.1 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-5609.patch Currently, it's hard to tell if the job end notification is working and if its backed up because you only see log messages if there was an error making the notification. It would be helpful to add a debug log message when the job end notification is sent. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5609) Add debug log message when sending job end notification
[ https://issues.apache.org/jira/browse/MAPREDUCE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5609: -- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) thanks Robert, committed to branch-1. Add debug log message when sending job end notification --- Key: MAPREDUCE-5609 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5609 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.2.1 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 1.3.0 Attachments: MAPREDUCE-5609.patch Currently, it's hard to tell if the job end notification is working and if its backed up because you only see log messages if there was an error making the notification. It would be helpful to add a debug log message when the job end notification is sent. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5457) Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators
[ https://issues.apache.org/jira/browse/MAPREDUCE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797086#comment-13797086 ] Alejandro Abdelnur commented on MAPREDUCE-5457: --- +1 LGTM Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators Key: MAPREDUCE-5457 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5457 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5457-1.patch, MAPREDUCE-5457-2.patch, MAPREDUCE-5457-3.patch, MAPREDUCE-5457-branch-1-1.patch, MAPREDUCE-5457-branch-1.patch, MAPREDUCE-5457.patch MR jobs sometimes want to just output lines of text, not key/value pairs. TextOutputFormat handles this by, if a null value is given, outputting only the key with no separator. Streaming jobs are unable to take advantage of this, because they can't output null values. A text output format reader that takes each line as a key and outputs NullWritables for values would allow streaming jobs to output lines of text. -- This message was sent by Atlassian JIRA (v6.1#6144)
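The TextOutputFormat behavior the description relies on can be sketched in plain Java (a simplified, hypothetical stand-in for the real write logic; names here are illustrative, not the actual Hadoop API): when the value is null (NullWritable in the real API), only the key is written and the separator is dropped, so the output becomes plain lines of text.

```java
public class KeyOnlyLineSketch {
    // Simplified sketch of TextOutputFormat-style record formatting:
    // a null value (NullWritable) suppresses both the value and the
    // key/value separator, yielding key-only lines.
    static String formatRecord(String key, String value, String separator) {
        if (key == null && value == null) {
            return "";
        }
        if (value == null) {
            return key;
        }
        if (key == null) {
            return value;
        }
        return key + separator + value;
    }

    public static void main(String[] args) {
        System.out.println(formatRecord("a line of text", null, "\t"));
    }
}
```

A KeyOnlyTextOutputReader, as proposed, supplies the null-value side of this contract on behalf of streaming jobs, which cannot emit null values themselves.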
[jira] [Assigned] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-3310: - Assignee: Alejandro Abdelnur Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5457) Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators
[ https://issues.apache.org/jira/browse/MAPREDUCE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792058#comment-13792058 ] Alejandro Abdelnur commented on MAPREDUCE-5457: --- LGTM, it would be good to have a testcase using streaming. Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators Key: MAPREDUCE-5457 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5457 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5457-branch-1.patch, MAPREDUCE-5457.patch MR jobs sometimes want to just output lines of text, not key/value pairs. TextOutputFormat handles this by, if a null value is given, outputting only the key with no separator. Streaming jobs are unable to take advantage of this, because they can't output null values. A text output format reader that takes each line as a key and outputs NullWritables for values would allow streaming jobs to output lines of text. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5088) MR Client gets a renewer token exception while Oozie is submitting a job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783222#comment-13783222 ] Alejandro Abdelnur commented on MAPREDUCE-5088: --- This means that we need to have JHS HA, correct? MR Client gets a renewer token exception while Oozie is submitting a job - Key: MAPREDUCE-5088 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5088 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Roman Shaposhnik Assignee: Daryn Sharp Priority: Blocker Fix For: 2.0.4-alpha Attachments: HADOOP-9409.patch, HADOOP-9409.patch, MAPREDUCE-5088.patch, MAPREDUCE-5088.patch, MAPREDUCE-5088.txt After the fix for HADOOP-9299 I'm now getting the following bizarre exception in Oozie while trying to submit a job. This also seems to be KRB related: {noformat} 2013-03-15 13:34:16,555 WARN ActionStartXCommand:542 - USER[hue] GROUP[-] TOKEN[] APP[MapReduce] JOB[001-130315123130987-oozie-oozi-W] ACTION[001-130315123130987-oozie-oozi-W@Sleep] Error starting action [Sleep]. 
ErrorType [ERROR], ErrorCode [UninitializedMessageException], Message [UninitializedMessageException: Message missing required fields: renewer] org.apache.oozie.action.ActionExecutorException: UninitializedMessageException: Message missing required fields: renewer at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:401) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:738) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:889) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59) at org.apache.oozie.command.XCommand.call(XCommand.java:277) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: com.google.protobuf.UninitializedMessageException: Message missing required fields: renewer at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:605) at org.apache.hadoop.security.proto.SecurityProtos$GetDelegationTokenRequestProto$Builder.build(SecurityProtos.java:973) at org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetDelegationTokenRequestPBImpl.mergeLocalToProto(GetDelegationTokenRequestPBImpl.java:84) at org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetDelegationTokenRequestPBImpl.getProto(GetDelegationTokenRequestPBImpl.java:67) at 
org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getDelegationToken(MRClientProtocolPBClientImpl.java:200) at org.apache.hadoop.mapred.YARNRunner.getDelegationTokenFromHS(YARNRunner.java:194) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:273) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1439) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:581) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1439) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:576) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:723) ... 10 more 2013-03-15 13:34:16,555 WARN ActionStartXCommand:542 - USER[hue] GROUP[-] TOKEN[] APP[MapReduce] JOB[001-13031512313 {noformat} -- This message was sent by Atlassian
[jira] [Commented] (MAPREDUCE-5544) JobClient#getJob loads job conf twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783230#comment-13783230 ] Alejandro Abdelnur commented on MAPREDUCE-5544: --- +1, LGTM JobClient#getJob loads job conf twice - Key: MAPREDUCE-5544 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5544 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5544-1.patch, MAPREDUCE-5544.patch Calling JobClient#getJob causes the job conf file to be loaded twice, once in the constructor of JobClient.NetworkedJob and once in Cluster#getJob. We should remove the former. MAPREDUCE-5001 was meant to fix a race that was causing problems in Hive tests, but the problem persists because it only fixed one of the places where the job conf file is loaded. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769854#comment-13769854 ] Alejandro Abdelnur commented on MAPREDUCE-5487: --- On my first comment: my bad, I mistakenly thought JOB_CONF_FILE was mapred-site.xml; it is job.xml, the localized job conf. It is fine then. +1 In task processes, JobConf is unnecessarily loaded again in Limits -- Key: MAPREDUCE-5487 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance, task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch Limits statically loads a JobConf, which incurs costs of reading files from disk and parsing XML. The contents of this JobConf are identical to the one loaded by YarnChild (before adding job.xml as a resource). Allowing Limits to initialize with the JobConf loaded in YarnChild would reduce task startup time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5379) Include token tracking ids in jobconf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-5379. --- Resolution: Fixed Fix Version/s: 2.1.1-beta Hadoop Flags: Reviewed Thanks Karthik. Committed to trunk, branch-2 and branch-2.1-beta. Include token tracking ids in jobconf - Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Fix For: 2.1.1-beta Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, MAPREDUCE-5379.patch, mr-5379-3.patch, mr-5379-4.patch HDFS-4680 enables audit logging of delegation tokens. By storing the tracking ids in the job conf, we can enable tracking what files each job touches.
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768302#comment-13768302 ] Alejandro Abdelnur commented on MAPREDUCE-5487: --- Shouldn't {{Limits.init(job)}} be called after adding the mapred config as a resource? Personally, I don't like constants that are not 'constants', which seems to be the case with these limits. I know this is not being introduced by this patch. I would change all code to use the methods and deprecate the constants. I'm OK with doing that in another patch though. In task processes, JobConf is unnecessarily loaded again in Limits -- Key: MAPREDUCE-5487 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487
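The "constants that are not 'constants'" concern above can be sketched in a few lines: a static final captured at class-initialization time freezes the built-in default, while an accessor method reflects what init() actually loaded. The class, names and default below are illustrative, not the real Limits code; only the property name mapreduce.job.counters.max is a real Hadoop setting.

```java
import java.util.Properties;

// Illustrative sketch (not org.apache.hadoop.mapreduce.util.Limits):
// COUNTERS_MAX is captured when the class initializes, before init() runs,
// so it stays at the built-in default forever; the accessor does not.
public class LimitsSketch {
    private static int maxCounters = 120; // illustrative built-in default

    public static void init(Properties conf) {
        maxCounters = Integer.parseInt(
            conf.getProperty("mapreduce.job.counters.max", "120"));
    }

    // Problematic "constant": evaluated once at class-load time.
    public static final int COUNTERS_MAX = maxCounters;

    // Preferred accessor: sees the configured value.
    public static int getCountersMax() {
        return maxCounters;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapreduce.job.counters.max", "500");
        init(conf);
        // The frozen constant still reports the default; the accessor does not.
        System.out.println(COUNTERS_MAX + " " + getCountersMax()); // prints "120 500"
    }
}
```

With this shape, deprecating the frozen constants in favor of the accessor methods, as suggested in the comment, avoids the stale-value trap entirely.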
[jira] [Commented] (MAPREDUCE-5379) Include token tracking ids in jobconf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767170#comment-13767170 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- +1 LGTM Include token tracking ids in jobconf - Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379
[jira] [Updated] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5483: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Robert, thanks Chuan Liu. Committed to trunk, branch-2 and branch-2.1. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Robert Kanter Fix For: 2.1.1-beta Attachments: MAPREDUCE-5483.patch MAPREDUCE-5357 does a filesystem chown() operation. chown() is not valid unless you are the superuser; a chown() to yourself is a NOP, which is why this has not been detected in Hadoop testcases where the user is running as itself. However, the distcp testcases run by Oozie, which use test users/groups from UGI for the minicluster, fail because of this chown(), either because the test user does not exist or because the current user does not have privileges to do a chown(). We should revert MAPREDUCE-5357; Windows should handle this with conditional logic used only when running on Windows. Opening a new JIRA and not reverting directly because MAPREDUCE-5357 went into 2.1.0-beta.
[jira] [Commented] (MAPREDUCE-5484) YarnChild unnecessarily loads job conf twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754010#comment-13754010 ] Alejandro Abdelnur commented on MAPREDUCE-5484: --- +1 LGTM YarnChild unnecessarily loads job conf twice Key: MAPREDUCE-5484 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5484 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: perfomance Attachments: MAPREDUCE-5484-1.patch, MAPREDUCE-5484.patch In MR task processes, a JobConf is instantiated with the same job.xml twice, once at the beginning of main() and once in configureTask. IIUC, the second instantiation is not necessary. These take time reading from disk and parsing XML. Removing the second instantiation shaved a second off the average map task time in a 1,000-map sleep job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
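The shape of the fix in this issue and in MAPREDUCE-5487, parse the expensive XML config once and hand the same object to every later consumer, can be sketched as follows. This is a standalone illustration that uses Properties as a stand-in for JobConf; none of it is the actual YarnChild code, and the parse counter merely stands in for the disk-read and XML-parse cost.

```java
import java.util.Properties;

// Illustration of "load job.xml once": every consumer shares one parsed
// instance instead of re-reading and re-parsing the file per code path.
public class ConfOnce {
    public static int parseCount = 0; // stands in for disk + XML parse cost
    private static Properties cached;

    private static Properties parseJobXml() {
        parseCount++;
        Properties p = new Properties();
        p.setProperty("mapreduce.job.name", "example"); // illustrative content
        return p;
    }

    // All consumers get the same parsed object; parsing happens once.
    public static synchronized Properties getJobConf() {
        if (cached == null) {
            cached = parseJobXml();
        }
        return cached;
    }

    public static void main(String[] args) {
        Properties a = getJobConf(); // first call parses
        Properties b = getJobConf(); // second call reuses the instance
        System.out.println(parseCount + " " + (a == b)); // prints "1 true"
    }
}
```

In the reported sleep-job benchmark, eliminating the second parse saved about a second per map task, which is exactly the saving this pattern targets.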
[jira] [Assigned] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-5362: - Assignee: Roman Shaposhnik all yours, thx clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies.
[jira] [Created] (MAPREDUCE-5483) revert MAPREDUCE-5357
Alejandro Abdelnur created MAPREDUCE-5483: - Summary: revert MAPREDUCE-5357 Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta MAPREDUCE-5357 does a filesystem chown() operation. chown() is not valid unless you are the superuser; a chown() to yourself is a NOP, which is why this has not been detected in Hadoop testcases where the user is running as itself. However, the distcp testcases run by Oozie, which use test users/groups from UGI for the minicluster, fail because of this chown(), either because the test user does not exist or because the current user does not have privileges to do a chown(). We should revert MAPREDUCE-5357; Windows should handle this with conditional logic used only when running on Windows. Opening a new JIRA and not reverting directly because MAPREDUCE-5357 went into 2.1.0-beta.
[jira] [Commented] (MAPREDUCE-5357) Job staging directory owner checking could fail on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751547#comment-13751547 ] Alejandro Abdelnur commented on MAPREDUCE-5357: --- FYI, opened MAPREDUCE-5483 to revert this JIRA. Job staging directory owner checking could fail on Windows -- Key: MAPREDUCE-5357 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5357 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 3.0.0, 2.1.0-beta Attachments: MAPREDUCE-5357-trunk.patch In {{JobSubmissionFiles.getStagingDir()}}, we have the following code, which will throw an exception if the directory owner is not the current user.
{code:java}
String owner = fsStatus.getOwner();
if (!(owner.equals(currentUser) || owner.equals(realUser))) {
  throw new IOException("The ownership on the staging directory " + stagingArea +
      " is not as expected. " + "It is owned by " + owner + ". The directory must " +
      "be owned by the submitter " + currentUser + " or " + "by " + realUser);
}
{code}
This check will fail on Windows when the underlying file system is LocalFileSystem, because on Windows the default file or directory owner could be the Administrators group if the user belongs to it. Quite a few MR unit tests that run an MR mini cluster with localFs as the underlying file system fail because of this.
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751558#comment-13751558 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- I guess we could check if the platform is Windows before doing the chown(), but the fix was made because testcases were failing on Windows when running them as admin. It seems fishy to me that Windows will silently fail the chown(). Regardless, either we guard this code to run only on Windows or we revert it. I'd prefer reverting it. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751578#comment-13751578 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- If you run builds in the same directory as different users, you'll run into permission issues deleting files from the previous run unless the user running the second time is a superuser. That seems like the wrong thing to do. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751580#comment-13751580 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- If we revert this patch, you don't do a chown() on a dir you created. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751603#comment-13751603 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- +1 from my side. [~chuanliu], are you OK with the revert? revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751681#comment-13751681 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- UGI and the minicluster have support for adding test users which do not map to OS users. When using such test users, things blow up in the local file system. Before MAPREDUCE-5357 (without the chown) things were working fine in such scenarios; MAPREDUCE-5357 introduced a regression. I'm planning to commit the current patch tomorrow. If you want to do special handling for Windows (which I would not recommend), please upload a patch. The patch should have the effect of a 'revert' for non-Windows platforms. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Created] (MAPREDUCE-5473) JT webservices use a static SimpleDateFormat, SimpleDateFormat is not threadsafe
Alejandro Abdelnur created MAPREDUCE-5473: - Summary: JT webservices use a static SimpleDateFormat, SimpleDateFormat is not threadsafe Key: MAPREDUCE-5473 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5473 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.0 Reporter: Alejandro Abdelnur MAPREDUCE-4837 is doing:
{code}
<%! static SimpleDateFormat dateFormat = new SimpleDateFormat("d-MMM- HH:mm:ss"); %>
{code}
But SimpleDateFormat is not thread safe.
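A common remedy for this class of bug, shown here only as a sketch and not as the patch that fixed the JSP, is to keep one SimpleDateFormat per thread: the format object keeps mutable internal state, so a single static instance shared across JSP/servlet threads can produce corrupted output under concurrency. The exact pattern string is an assumption.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// One SimpleDateFormat per thread: no shared mutable state, no locking.
public class PerThreadDateFormat {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("d-MMM-yyyy HH:mm:ss"));

    public static String format(Date date) {
        // Each thread lazily gets and reuses its own instance.
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        // The epoch renders differently per timezone/locale, so just show it runs.
        System.out.println(format(new Date(0L)));
    }
}
```

An alternative with the same effect on modern Java is java.time.format.DateTimeFormatter, which is immutable and therefore safe to hold in a single static field.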
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739970#comment-13739970 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], I'm glad you are glad, *smile*. Yes, we agree we need to leave SLOT_MILLIS because Hadoop 0.23 users are relying on it. We also agreed before that we should remove 'minimum resource capability' from the protocol and the API because it is a scheduler implementation internal that should not be exposed to the users. The current code relies on the 'minimum resource capability' configuration property, which is internal. We also agreed on doing that until this JIRA is fixed. The proposed solution is a tweak to the current code, just using a different configuration property, nothing else. Adding 'minimum resource capability' back to the protocol and API to support deprecated functionality that we all agree should go away does not seem right. I prefer [~jlowe]'s suggestion to have a new (deprecated) configuration property that users upgrading from 0.23 and wanting to preserve the SLOT_MILLIS counter information can use (and if they don't, the default setting, SLOT_MILLIS is always zero). Also, doing the constant replacement is much simpler than reintroducing the protocol and API minimum field. Let's move forward with this. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code, we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants.
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740074#comment-13740074 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~bikassaha], you are spot on. The only difference being proposed is that instead of using the scheduler MIN property directly, we define a new one for this particular use case in the MR AM namespace (and deprecate it). The reason for doing this is that the existing scheduler MIN property would go away per YARN-1004. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740397#comment-13740397 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], there are four other committers that are OK with the idea of a separate config to enable SLOT_MILLIS because it is a special usecase for 0.23 users. The changes are much less disruptive and (IMO) adequate given the usecase. Can you please reconsider your -1? Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740646#comment-13740646 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- Given that this is a legacy thing coming from Hadoop 1, I don't think we should use YARN constants/properties at all. Why don't we use the Hadoop 1 JT properties for the same purpose, in mapred-site.xml, documenting how they have to be set to the MIN of the scheduler for the SLOT_MILLIS counter to kick in? To me this seems a much more correct way of doing it. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738516#comment-13738516 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], it seems we are talking past each other here. * 1. SLOT_MILLIS does make sense in YARN. yes/no? * 2. We need to redefine what SLOT_MILLIS means/reports in YARN. yes/no? Are we in agreement that the answers to these questions are #1 NO and #2 YES? If we are in agreement, then we have to see how to address this in the least disruptive way. Sandy's latest proposal suggests we do the following: * Introduce the concept of CONTAINER_MILLIS (regardless of the container size) * Deprecate SLOT_MILLIS and map it to report CONTAINER_MILLIS And we could later augment this with additional counters: * Introduce the concept of MEMORY_MILLIS * Introduce the concept of CPU_MILLIS Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739043#comment-13739043 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~jlowe], thanks for jumping in, I was not aware you guys have in-house tools using this stuff with the currently reported SLOT_MILLIS. With that in mind, and along the lines of what Jason proposes, would the following satisfy all parties? This JIRA would then be repurposed to: * Introduce and deprecate a {{MRConf.SLOT_MILLIS_MINIMUM_ALLOCATION_MB}} with no default * If set, use the {{MRConf.SLOT_MILLIS_MINIMUM_ALLOCATION_MB}} value to compute, using today's logic, the SLOT_MILLIS counter values. * If not set, the SLOT_MILLIS counter should report 0. This means that anybody relying on the current SLOT_MILLIS reporting can continue getting it until we decide to trash it (a few versions down the road). A different JIRA would introduce MEM_MILLIS and CPU_MILLIS, which have an accurate meaning in Yarn's world. YARN-1004 is then unblocked. I believe this addresses the problem without breaking backwards compatibility, as [~acmurthy] asked. Arun, Jason, [~sandyr], are you OK with this approach? Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
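The repurposed behavior proposed above can be sketched as follows. The property name {{MRConf.SLOT_MILLIS_MINIMUM_ALLOCATION_MB}} comes from the comment itself, but the method, its signature, and the ceil-based scaling rule below are assumptions about "today's logic", not the committed code.

```java
// Sketch of the proposed SLOT_MILLIS fallback: report 0 when the deprecated
// slot-size property is unset; otherwise scale container time by the number
// of "slots" the container occupies. The scaling rule is an assumption.
public class SlotMillisSketch {
    /**
     * @param containerMillis how long the task's container ran
     * @param containerMb     the container's memory allocation in MB
     * @param slotSizeMb      value of the deprecated slot-size property
     *                        (MRConf.SLOT_MILLIS_MINIMUM_ALLOCATION_MB in the
     *                        proposal), or -1 when unset
     */
    public static long slotMillis(long containerMillis, long containerMb,
                                  long slotSizeMb) {
        if (slotSizeMb <= 0) {
            return 0L; // property not set: SLOT_MILLIS reports 0
        }
        long slots = (containerMb + slotSizeMb - 1) / slotSizeMb; // ceil division
        return containerMillis * Math.max(1, slots);
    }

    public static void main(String[] args) {
        System.out.println(slotMillis(60_000, 2048, 1024)); // two slots: 120000
        System.out.println(slotMillis(60_000, 2048, -1));   // unset: 0
    }
}
```

With this shape, 0.23 users who set the deprecated property keep their historical SLOT_MILLIS numbers, while everyone else sees a constant 0 until the counter is removed.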
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735011#comment-13735011 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], I don't see how this will break BC; it is not being proposed to remove the counter, but either to make it zero or, as [~sandyr] suggested, to introduce CONTAINER_MILLIS_MAP and map SLOT_MILLIS_MAP to it (an approach that will make more sense than the current value). I don't want to punt, because you are blocking YARN-1004 on this one. YARN-1004 should go in 2.1.0-beta. Please ping me if you want to chat offline over the phone, if you think that will be easier to discuss. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Updated] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5311: -- Fix Version/s: 2.1.0-beta [~acmurthy], I guess your removing it from 2.1.0-beta and my last comment had a race condition; making it a blocker for 2.1.0-beta again. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731067#comment-13731067 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], unless I got things wrong, we agreed to keep slot-millis around until we have memory-millis. And the latest patch here is doing that. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants.
[jira] [Updated] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5311: -- Priority: Blocker (was: Major) Fix Version/s: 2.1.0-beta We need to take care of this for 2.1.0, making it a blocker. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants.
[jira] [Commented] (MAPREDUCE-5311) Remove slot millis computation logic and deprecate counter constants
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728327#comment-13728327 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], per [~jlowe]'s comment above ("so all that's left is to clarify what will happen to slot-millis once mem-millis shows up"), he does not seem unhappy. I think the problem at hand is that slots-millis is a legacy metric that does not make sense in YARN, regardless of whether it returns 0 or mem-millis. Anybody relying on this value from Hadoop 1 will get something completely different from what they were getting when running on Hadoop 1. This means that whether we make it disappear or leave it around returning the wrong value (currently a contrived mem-millis based on the MIN config), users relying on it will have to adjust how they see/process this value. Because of that, I would say we bite the bullet: make the slot-millis counter return 0 (or better, -1), deprecate the constants, and print a warning when somebody uses the constant, pointing the user to memory-millis (and eventually cpu-millis). Remove slot millis computation logic and deprecate counter constants Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants.
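The approach proposed in the comment above (zero out the counter, deprecate the constant, warn on use) can be sketched roughly as follows. All names here are illustrative placeholders, not the actual Hadoop patch:

```java
public class CounterDeprecationSketch {
    // Illustrative stand-ins for the real MapReduce counter constants.
    enum JobCounter {
        /** @deprecated slot-based accounting is meaningless in YARN. */
        @Deprecated
        SLOTS_MILLIS_MAPS,
        MB_MILLIS_MAPS // memory-millis replacement (hypothetical name here)
    }

    // Return 0 for the deprecated counter and warn, instead of computing a
    // misleading slot-millis value from the minimum-allocation config.
    static long getCounterValue(JobCounter counter) {
        if (counter == JobCounter.SLOTS_MILLIS_MAPS) {
            System.err.println("WARN: SLOTS_MILLIS_MAPS is deprecated and no "
                + "longer computed; use the memory-millis counter instead");
            return 0L; // -1 was also floated as the sentinel in the discussion
        }
        return 42L; // stand-in for a real counter lookup
    }

    public static void main(String[] args) {
        System.out.println(getCounterValue(JobCounter.SLOTS_MILLIS_MAPS)); // prints 0
        System.out.println(getCounterValue(JobCounter.MB_MILLIS_MAPS));    // prints 42
    }
}
```

The point of the sentinel-plus-warning design is that legacy consumers keep compiling and running, but are nudged toward the replacement counter at both compile time (deprecation) and run time (the warning).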
[jira] [Commented] (MAPREDUCE-5379) Include FS delegation token ID in job conf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726830#comment-13726830 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- I've played around and got Daryn's patch to work. After running the patch by Andrew Wang (who is doing HDFS-4680), he brought up a concern with the client-driven tracking approach: a client can set a rogue trackingId. With the sequenceId approach, by contrast, what is in the HDFS audit log can be fully trusted and traced to a user. One concern Daryn mentioned above with the sequenceId approach (and also told me offline) is the MR client decoding the token identifier; this could break things when moving token encoding from Writable to protobuf. To address this, instead of decoding the token identifier, the MR client would simply hash its byte[] representation without decoding it. In addition, the MR client should have an option to switch ON/OFF (default OFF) the DT hash generation/injection into the jobconf. Include FS delegation token ID in job conf -- Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, MAPREDUCE-5379.patch Making a job's FS delegation token ID accessible will allow external services to associate it with the file system operations it performs.
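The hashing idea described above, fingerprinting the token identifier's raw bytes so the client never needs to decode them, could look roughly like this. The method name and the choice of MD5 are assumptions for illustration, not the committed implementation:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TokenIdentifierHash {
    // Hash the opaque byte[] form of a delegation token identifier without
    // decoding it, so a later move from Writable to protobuf encoding does
    // not break the client. (MD5 and the method name are illustrative.)
    static String hashIdentifier(byte[] identifierBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(identifierBytes);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // In the real flow the bytes would come from the token's raw
        // identifier, treated as an opaque blob.
        byte[] opaqueIdentifier = {0x01, 0x02, 0x03};
        System.out.println(hashIdentifier(opaqueIdentifier)); // 32 hex chars
    }
}
```

Because the hash is computed over the serialized bytes, the jobconf value stays stable for a given token regardless of the identifier's wire format, which is exactly the decoupling the comment is after.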
[jira] [Created] (MAPREDUCE-5426) MRAM fails to register to RM, AMRM token seems missing
Alejandro Abdelnur created MAPREDUCE-5426: - Summary: MRAM fails to register to RM, AMRM token seems missing Key: MAPREDUCE-5426 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5426 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.0-beta Trying to run the pi example in an unsecured pseudo-distributed cluster, the job fails. It seems the AMRM token is missing. The AM syslog has the following: {code} 2013-07-27 14:17:23,703 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while registering org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:176) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:95) at com.sun.proxy.$Proxy29.registerApplicationMaster(Unknown Source) at 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:147) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:107) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:789) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1323) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN] at org.apache.hadoop.ipc.Client.call(Client.java:1369) at org.apache.hadoop.ipc.Client.call(Client.java:1322) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) ... 22 more {code}
[jira] [Resolved] (MAPREDUCE-4366) mapred metrics shows negative count of waiting maps and reduces
[ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-4366. --- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Thanks Sandy. Committed to branch-1. mapred metrics shows negative count of waiting maps and reduces --- Key: MAPREDUCE-4366 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.0.2 Reporter: Thomas Graves Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-4366-branch-1-1.patch, MAPREDUCE-4366-branch-1.patch Negative waiting_maps and waiting_reduces counts are observed in the mapred metrics. MAPREDUCE-1238 partially fixed this, but it appears there are still issues, as we are still seeing it, though not as badly.
[jira] [Commented] (MAPREDUCE-5379) Include FS delegation token ID in job conf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718591#comment-13718591 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- +1 from my side. IMO [~daryn]'s concerns have been addressed; [~daryn]? Include FS delegation token ID in job conf -- Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379.patch Making a job's FS delegation token ID accessible will allow external services to associate it with the file system operations it performs.
[jira] [Commented] (MAPREDUCE-5379) Include FS delegation token ID in job conf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718592#comment-13718592 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- [~daryn], happy to have a call if you want to discuss this quickly; I'll then summarize the offline discussion here. Include FS delegation token ID in job conf -- Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379.patch Making a job's FS delegation token ID accessible will allow external services to associate it with the file system operations it performs.
[jira] [Updated] (MAPREDUCE-5288) ResourceEstimator#getEstimatedTotalMapOutputSize suffers from divide by zero issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5288: -- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Karthik. Thanks Harsh for reviewing it. Committed to branch-1. ResourceEstimator#getEstimatedTotalMapOutputSize suffers from divide by zero issues --- Key: MAPREDUCE-5288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5288 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.0 Reporter: Harsh J Assignee: Karthik Kambatla Fix For: 1.3.0 Attachments: mr-5288-1.patch The computation in the above-mentioned class method is: {code} long estimate = Math.round(((double)inputSize * completedMapsOutputSize * 2.0)/completedMapsInputSize); {code} Given http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double), it's possible that the returned estimate could be Long.MAX_VALUE if completedMapsInputSize is determined to be zero. This can be proven with a simple code snippet: {code} class Foo { public static void main(String... args) { long inputSize = 600L + 2; long estimate = Math.round(((double)inputSize * 1L * 2.0)/0L); System.out.println(estimate); } } {code} The above conveniently prints out: {{9223372036854775807}}, which is Long.MAX_VALUE (or 8 exbibytes per MapReduce job).
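A guard for the zero-denominator case described above could look like the following sketch. The method and parameter names mirror the issue description but are not the actual committed patch:

```java
public class SafeOutputEstimate {
    // Guarded version of the estimate: when no completed-map input has been
    // observed yet, return 0 instead of letting the double division produce
    // Infinity, which Math.round() turns into Long.MAX_VALUE.
    static long estimatedTotalMapOutputSize(long inputSize,
                                            long completedMapsOutputSize,
                                            long completedMapsInputSize) {
        if (completedMapsInputSize == 0) {
            return 0L;
        }
        return Math.round(((double) inputSize * completedMapsOutputSize * 2.0)
            / completedMapsInputSize);
    }

    public static void main(String[] args) {
        long inputSize = 600L + 2;
        // Unguarded, the first call was Math.round(Infinity) == Long.MAX_VALUE.
        System.out.println(estimatedTotalMapOutputSize(inputSize, 1L, 0L));        // prints 0
        System.out.println(estimatedTotalMapOutputSize(inputSize, 1L, inputSize)); // prints 2
    }
}
```

The root cause is that dividing a nonzero double by zero yields Infinity rather than throwing ArithmeticException (only integer division throws), and Math.round(Double.POSITIVE_INFINITY) is defined to return Long.MAX_VALUE, so the overflow is silent.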