[jira] [Created] (MAPREDUCE-6101) on job submission, if input or output directories are encrypted, shuffle data should be encrypted at rest
Alejandro Abdelnur created MAPREDUCE-6101: - Summary: on job submission, if input or output directories are encrypted, shuffle data should be encrypted at rest Key: MAPREDUCE-6101 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6101 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.6.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Currently, shuffle data encryption at rest must be enabled explicitly to take effect. If it is not set explicitly (ON or OFF) but the input or output HDFS directories of the job are in an encryption zone, we should set it to ON. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6060) shuffle data should be encrypted at rest if the input/output of the job are in an encryption zone
Alejandro Abdelnur created MAPREDUCE-6060: - Summary: shuffle data should be encrypted at rest if the input/output of the job are in an encryption zone Key: MAPREDUCE-6060 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6060 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Affects Versions: 2.6.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur If the input or output of an MR job is within an encryption zone, the intermediate data of the job should be encrypted by default. Setting the {{MRJobConfig.MR_ENCRYPTED_INTERMEDIATE_DATA}} property explicitly should override the default behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
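The defaulting behavior described in MAPREDUCE-6060/6101 can be sketched in plain Java. This is a hypothetical illustration only: the property name, the Map-based stand-in for Hadoop's Configuration, and the isInEncryptionZone() check are assumptions, not the actual Hadoop API.

```java
import java.util.Map;

// Sketch of "default shuffle encryption to ON when input/output are in an
// encryption zone, unless the user set the property explicitly".
// All names here are illustrative stand-ins for the real Hadoop classes.
public class ShuffleEncryptionDefault {

    // Hypothetical property key standing in for
    // MRJobConfig.MR_ENCRYPTED_INTERMEDIATE_DATA.
    static final String MR_ENCRYPTED_INTERMEDIATE_DATA =
        "mapreduce.job.encrypted-intermediate-data";

    // Stand-in for an HDFS client call that reports whether a path lies
    // inside an encryption zone.
    static boolean isInEncryptionZone(String path) {
        return path.startsWith("/secure/");
    }

    // If the user did not set the property explicitly (ON or OFF), default it
    // to ON when either the input or the output directory is in an
    // encryption zone; an explicit setting always wins.
    static void applyDefault(Map<String, String> conf,
                             String inputDir, String outputDir) {
        if (!conf.containsKey(MR_ENCRYPTED_INTERMEDIATE_DATA)
            && (isInEncryptionZone(inputDir) || isInEncryptionZone(outputDir))) {
            conf.put(MR_ENCRYPTED_INTERMEDIATE_DATA, "true");
        }
    }
}
```

The key design point, per the JIRA text, is that the default never overrides an explicit user setting, even `false`.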
[jira] [Resolved] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-5890. --- Resolution: Fixed Fix Version/s: fs-encryption Hadoop Flags: Reviewed I've just committed this JIRA to fs-encryption branch. [~chris.douglas], thanks for all the review cycles you spent on this. [~asuresh], thanks for persevering until done, nice job. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Fix For: fs-encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.14.patch, MAPREDUCE-5890.15.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053828#comment-14053828 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- [~chris.douglas], thanks for the detailed feedback/review iterations on this. Does this mean you are OK with committing the current patch? Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048355#comment-14048355 ] Alejandro Abdelnur edited comment on MAPREDUCE-5890 at 7/1/14 4:50 AM: --- [~chris.douglas], I had initially tried to directly modify the {{IFile}} format to handle the IV. The reasons I felt this would not be such a clean solution are: * The {{IFile}} currently does not have a notion of an explicit header/metadata. * While it is possible to use the {{IFile.Writer}} constructor to write the IV (and thus make it transparent to the rest of the code-base), the reading code-path is not so straightforward. There are two classes that extend {{IFile.Reader}} ({{InMemoryReader}} and {{RawKVIteratorReader}}). The {{InMemoryReader}} totally ignores the input stream that is initialized in the base class constructor, and there are places in the codebase where the input stream is initialized not in the Reader but in the {{Segment::init()}} method (which in my opinion makes the {{IFile}} abstraction a bit leaky, since the underlying stream should be handled in its entirety in the IFile Writer/Reader; the {{Segment}} class, which is part of the {{Merger}} framework, should avoid dealing with the internals of the {{IFile}}). * Also, I was not able to do away with a lot of if-then checks in the Shuffle phase (another instance of the leaky abstraction mentioned in the previous point); the implementations of the {{MapOutput::shuffle}} method create {{IFileInputStream}}s directly without an associated {{IFile.Reader}} was (Author: asuresh): [~chris.douglas], I had initially tried to directly modify the {{IFile}} format to handle the iv. The reason I felt this would not be such a clean solution is : * The {{IFile}} currently does not have a notion of an explicit header/metadata. * While it is possible to use the {{IFile.Writer}} constructor to write the IV and (thus make it transparent to the rest of the code-base). 
The reading code-path is not so straight-forward. There are two classes that extend the {{IFile.Reader}} ({{InMemoryReader}} and {{RawKVIteratorReader}}). The {{InMemoryReader}} totally ignores the inputStream that is initialized in the base class constructor and there are places in the codeBase that the input stream is not initialized in the Reader but in the {{Segment::init()}} method (which in my opinion makes the {{IFile}} abstraction a bit leaky since the underlying stream should be handled in its entirity in the IFile Writer/Reader.. the {{Segment}} class (which is part of the {{Merger}} framework) should avoid dealing with the internals of the ). * Also, I was not able to do away with a lot of if-then checks in the Shuffle phase... (another instance of leaky abstraction mentioned in the previous point), the implementations of {{MapOutput::shuffle}} method creates {{IFileInputStream}}s directly without an associated {{IFile.Reader}} Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
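The write-the-IV-into-the-stream approach debated above can be illustrated with JDK-only crypto. This is a hedged sketch, not the IFile code: AES/CTR via javax.crypto stands in for whatever codec the patch actually uses, and the fixed 16-byte header layout is an assumption for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.SecureRandom;

// Sketch: the IV travels as a small plaintext header in front of the
// encrypted payload, so the reader can recover it before decrypting.
public class IvHeaderStream {

    static final int IV_LEN = 16; // AES block size

    // Write a random IV as a plaintext header, then the AES/CTR ciphertext.
    static byte[] encryptWithIvHeader(byte[] key, byte[] plain) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(iv); // header: the IV is stored with the data, unencrypted
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new IvParameterSpec(iv));
        try (OutputStream out = new CipherOutputStream(bos, c)) {
            out.write(plain);
        }
        return bos.toByteArray();
    }

    // Read the IV header first, then decrypt the remainder of the stream.
    static byte[] decryptWithIvHeader(byte[] key, byte[] data) throws Exception {
        ByteArrayInputStream bis = new ByteArrayInputStream(data);
        byte[] iv = new byte[IV_LEN];
        if (bis.read(iv) != IV_LEN) {
            throw new IOException("short IV header");
        }
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
               new IvParameterSpec(iv));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(bis, c)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }
}
```

The comment's objection is precisely that the reading side is hard to centralize like this in the real code, because {{InMemoryReader}} and {{Segment::init()}} open streams outside the {{IFile.Reader}} constructor.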
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046863#comment-14046863 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- LGTM. One minor nit (I can take care of it when committing): in the {{JobSubmitter.java#copyAndConfigureFiles()}} javadoc, the change at line 295 is not needed. [~chris.douglas], I believe all our suggestions/concerns have been addressed. Do you want to do a new pass on the patch? I'll wait a few days to commit. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046863#comment-14046863 ] Alejandro Abdelnur edited comment on MAPREDUCE-5890 at 6/29/14 2:13 AM: LGTM. One minor nit (I can take care of it when committing): in the {{JobSubmitter.java#copyAndConfigureFiles()}} javadoc, the change at line 295 is not needed. [~chris.douglas], I believe all your suggestions/concerns have been addressed. Do you want to do a new pass on the patch? I'll wait a few days to commit. was (Author: tucu00): LGTM. One minor nit (I can take care of it when committing): in the {{JobSubmitter.java#copyAndConfigureFiles()}} javadoc, the change at line 295 is not needed. [~chris.douglas], I believe all our suggestions/concerns have been addressed. Do you want to do a new pass on the patch? I'll wait a few days to commit. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044665#comment-14044665 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- On the performance hit, if encryption is OFF I would say it is nil (the only extra work being done is resolving a boolean config to check if encryption is ON or OFF). If encryption is ON, you are paying the encryption/decryption overhead. Doing preliminary encryption benchmarks with the crypto streams using Diceros (CryptoCodec-JCE-JNI-OpenSSL), I got 1000MB/sec on both encrypt and decrypt on my laptop. Once we have HADOOP-10693 and this JIRA, we will be able to do some end-to-end benchmarks. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043884#comment-14043884 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- Fetcher.java MapTask.java MergerManagerImpl.java Merger.java ShuffleHandler.java ShuffleHeader.java * several whitespace-only changes (configure your editor not to trim unmodified lines) CryptoUtils.java * createIV(): javadocs, invalid params * wrap() OUT/IN methods: any chance to consolidate all/most signatures to delegate to a single one doing the repetitive logic? * a couple of wrap() methods have a funny LOG message * wrap() OUT methods use cc.AlgorithmBlockSize(), but wrap() IN methods use 16; for the IN methods you can use the cc already available in the method. * the wrap() methods wrap only if necessary (the IF ENCRYPTED check has been moved inside), so the name should reflect that, maybe something like 'wrapIfNecessary()' Fetcher.java * copyMapOutput() unconditionally corrects the offset; this seems wrong. * No need to define out2, just reuse out Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
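The 'wrapIfNecessary()' shape suggested in the review above could look like the following JDK-only sketch. This is hypothetical: the real patch works with Hadoop's CryptoCodec and Configuration, whereas this sketch substitutes the raw javax.crypto classes and takes the key/IV as parameters.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.OutputStream;

public class WrapIfNecessary {

    // Single entry point for all call sites: returns the stream untouched
    // when encryption is off, so the scattered if-blocks collapse into one
    // place. Hypothetical helper; the real code delegates to Hadoop's
    // CryptoCodec rather than raw javax.crypto.
    static OutputStream wrapIfNecessary(boolean encrypted, byte[] key,
                                        byte[] iv, OutputStream out)
            throws Exception {
        if (!encrypted) {
            return out; // pass-through: the only cost is this boolean check
        }
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherOutputStream(out, cipher);
    }
}
```

Putting the ENCRYPTED check inside the helper is also what makes the "nil overhead when OFF" claim in the performance comment plausible: disabled callers pay only one boolean test.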
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043900#comment-14043900 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- bq. If this really is a requirement, aren't we better off asking cluster admins to either install disks with local file-systems that support encryption specifically for intermediate data or just create some partitions that support encryption? That seems like the right layer to handle something like this instead of adding a whole lot of complexity into the software that only has a downside of performance. Asking admins to install additional software to encrypt the local FS means installing kernel modules. Also, this would mean that ALL MR jobs are going to pay the penalty of encrypted intermediate data. That is not reasonable. I don't agree with the statement that this is adding a lot of complexity; it is simply wrapping the streams where necessary. bq. Wearing my YARN hat, it is not enough to do this just for MapReduce. Every other framework running on YARN will need to add this complexity - this is asking for too much complexity. We are better off handling it at the file-system/partition/disk level. This patch does not touch anything in YARN, only MapReduce, and only private/evolving classes of it. 
Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042373#comment-14042373 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- Hi [~chris.douglas], I would prefer to keep the current MR job test because it tests spills/merges on both sides of the MR job, making sure no edge cases are missed. The {{ShuffleHandler}} is a private class of MapReduce; if other frameworks use it, it is at their own risk. Regarding adding new abstractions, I’m OK if they are small and non-intrusive. I just don’t want to send Arun chasing a wild goose and then, when he finally catches it, backtrack because the changes are too pervasive in the core of MapReduce (this happened in MAPREDUCE-2454). Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042768#comment-14042768 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- [~chris.douglas], on the last section of the previous comment: I didn't mean to say your refactoring asks are a wild goose chase, I just wanted to say I don't want to end up in that situation. My apologies if I've given the wrong impression with my comment. I've talked with Arun and he is already exploring along the lines of your suggestions to see their feasibility. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039671#comment-14039671 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- LGTM. [~asuresh], can you run test-patch locally on the patch and paste the result in the JIRA? After that, I think we are good to go. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036684#comment-14036684 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- Suresh, any special reason why the test is not included in the main patch? I’m not quite happy with the IF blocks scattered around:
{code}
if (CryptoUtils.isShuffleEncrypted(conf)) {
  byte[] iv = CryptoUtils.createIVFile(conf, fs, file);
  out = CryptoUtils.wrap(conf, iv, out);
}
{code}
Given that the current abstraction does not provide a clean cut to hide this within the {{IFile}} without a significant refactoring throughout the code, I think it is the least evil. Nice job. Could you try running test-patch locally on the fs-encryption branch with this patch? Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.2.patch, MAPREDUCE-5890.test.patch For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-5890: - Assignee: Arun Suresh (was: Alejandro Abdelnur) Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-2608) Mavenize mapreduce contribs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-2608. --- Resolution: Invalid [doing self-clean up of JIRAs] closing as invalid as this has been done in different jiras. Mavenize mapreduce contribs --- Key: MAPREDUCE-2608 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2608 Project: Hadoop Map/Reduce Issue Type: Task Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Same as HADOOP-6671 for mapreduce contribs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
Alejandro Abdelnur created MAPREDUCE-5890: - Summary: Support for encrypting Intermediate data and spills in local filesystem Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-4658) Move tools JARs into separate lib directories and have common bootstrap script.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-4658. --- Resolution: Won't Fix [doing self-clean up of JIRAs] scripts have changed significantly since this JIRA. Move tools JARs into separate lib directories and have common bootstrap script. --- Key: MAPREDUCE-4658 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4658 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur This is a follow up of the discussion going on on MAPREDUCE-4644 -- Moving each tool's JARs into separate lib/ dirs is quite easy (modifying a single assembly). What we should think about is a common bootstrap script, so that each tool does not have to duplicate (and get wrong) such a script. I'll open a JIRA for that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998426#comment-13998426 ] Alejandro Abdelnur commented on MAPREDUCE-5890: --- HADOOP-10603 introduces crypto streams to be used for filesystem encryption. We could leverage them for encrypting map output data; the Reducer shuffle would decrypt it (no need for network encryption, as the data would already be encrypted in transit). The reducer would encrypt spills when writing them to disk and decrypt them while reading them back. It may make sense to do this JIRA as part of the fs-encryption branch. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901887#comment-13901887 ] Alejandro Abdelnur commented on MAPREDUCE-5641: --- [~rkanter], [~jlowe], how about not touching the current permissions of staging and making the RM a proxy user in HDFS. Then the files would be written as the user. [~vinodkv], I'm a bit reluctant to get the JHS to depend on the AHS at this point, as the AHS is not fully cooked. I would prefer dropping the JHS altogether in favor of the AHS when the AHS is ready for prime time with AM extensions. History for failed Application Masters should be made available to the Job History Server - Key: MAPREDUCE-5641 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, jobhistoryserver Affects Versions: 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-5641.patch Currently, the JHS has no information about jobs whose AMs have failed. This is because the History is written by the AM to the intermediate folder just before finishing, so when it fails for any reason, this information isn't copied there. However, it is not lost as it's in the AM's staging directory. To make the History available in the JHS, all we need to do is have another mechanism to move the History from the staging directory to the intermediate directory. The AM also writes a Summary file before exiting normally, which is also unavailable when the AM fails. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901985#comment-13901985 ] Alejandro Abdelnur commented on MAPREDUCE-5641: --- yep, that is what I meant. The JHS is trusted code; no user code runs there. The doAs with the proxy user would be used only for this case. Also, all this would go away when the AHS is ready to take over. History for failed Application Masters should be made available to the Job History Server - Key: MAPREDUCE-5641 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, jobhistoryserver Affects Versions: 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-5641.patch Currently, the JHS has no information about jobs whose AMs have failed. This is because the History is written by the AM to the intermediate folder just before finishing, so when it fails for any reason, this information isn't copied there. However, it is not lost as it's in the AM's staging directory. To make the History available in the JHS, all we need to do is have another mechanism to move the History from the staging directory to the intermediate directory. The AM also writes a Summary file before exiting normally, which is also unavailable when the AM fails. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
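The "doAs with the proxy user" pattern mentioned here can be illustrated with the JDK's own Subject.doAs (a sketch only: Hadoop services actually use UserGroupInformation.createProxyUser/doAs, and the history-file move below is a hypothetical stand-in for a real filesystem rename):

```java
import javax.security.auth.Subject;
import java.security.PrivilegedAction;

// Sketch of running one operation under another identity's security context,
// the way a trusted service would impersonate a user for a single FS call.
public class ProxyDoAs {

    // Hypothetical helper: "move" a history file while acting as the given
    // user. The returned string stands in for the effect of a real rename.
    static String moveAsUser(Subject user, String src, String dst) {
        return Subject.doAs(user, (PrivilegedAction<String>) () ->
            "moved " + src + " -> " + dst); // stand-in for fs.rename(src, dst)
    }
}
```

The point of the comment is that this impersonation runs only inside trusted JHS code, never inside user code, which is what makes the proxy-user grant acceptable.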
[jira] [Commented] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882112#comment-13882112 ] Alejandro Abdelnur commented on MAPREDUCE-5362: --- Patch applies cleanly on trunk's HEAD and builds correctly. Don't know what problem Jenkins is having. clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-5362: - Assignee: Alejandro Abdelnur (was: Roman Shaposhnik) [~rvs], I'm stealing this from you. I have a few available cycles and I want to nail this one. clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5362: -- Attachment: MAPREDUCE-5362.patch clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5362: -- Target Version/s: (was: ) Status: Patch Available (was: Open) clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881653#comment-13881653 ] Alejandro Abdelnur commented on MAPREDUCE-5362: --- [~rvs], [~vinodkv], [~ste...@apache.org], [~kkambatl], this patch is the equivalent of YARN-888 for MR. Mind taking it for a spin? It also fixes a few things that were not 100% correct: * produces test jars for all MR modules * puts all test jars in the MR test dir * puts all source jars in the MR sources dir * lib has all the direct 3rd-party dependencies of MR clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
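For readers unfamiliar with the normalization being described, the shape of the change is: each leaf module declares its own dependencies, and the intermediate 'pom' packaging modules declare none. A rough illustration (the module and artifact names here are illustrative, not taken from the actual patch):

```xml
<!-- Leaf module (e.g. hadoop-mapreduce-client-core/pom.xml): declares its
     dependencies directly instead of inheriting them from the parent pom. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <type>test-jar</type>
    <scope>test</scope>
  </dependency>
</dependencies>

<!-- Intermediate 'pom' module (e.g. hadoop-mapreduce-client/pom.xml): no
     <dependencies> section at all, only <modules> and shared plugin config. -->
```

Declaring dependencies only in leaf modules keeps each module's classpath explicit, which is what IDEs like IntelliJ resolve per-module.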
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) committed to trunk and branch-2. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.4.0 Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at
[jira] [Commented] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874157#comment-13874157 ] Alejandro Abdelnur commented on MAPREDUCE-3310: --- If Tez is reusing all of the Hadoop MR task implementation, the answer would be yes; otherwise the new method would not be used at all and it doesn't matter what it returns. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Fix For: 1.3.0, 2.4.0 Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
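The grouping-comparator semantics the issue wants to extend to Combiners can be shown outside Hadoop in plain Java: with a composite key sorted on its full value but grouped only on its natural-key part, consecutive records with the same natural key land in one group, which is what makes secondary sort work. A minimal self-contained sketch (names illustrative, not Hadoop APIs):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {

    // Composite key "naturalKey#secondaryKey": sorted on the full string,
    // but grouped only on the part before '#', like a secondary-sort job.
    static final Comparator<String> NATURAL_KEY =
        Comparator.comparing((String k) -> k.substring(0, k.indexOf('#')));

    // Walks already-sorted keys and starts a new group whenever the
    // grouping comparator says the key differs from the group's first key.
    static List<List<String>> group(List<String> sortedKeys,
                                    Comparator<String> grouping) {
        List<List<String>> groups = new ArrayList<>();
        for (String k : sortedKeys) {
            if (groups.isEmpty()
                || grouping.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two groups: [a#1, a#2] and [b#1] -- the Combiner, which today can
        // only group on full-key equality, would instead see three groups.
        System.out.println(group(List.of("a#1", "a#2", "b#1"), NATURAL_KEY));
    }
}
```

The JIRA's point is that a Reducer can be given such a comparator via Job.setGroupingComparatorClass, while a Combiner has no equivalent hook.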
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874169#comment-13874169 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- From what I understand, for HBase, HDFS HA, or YARN HA, it is the corresponding client library that resolves the real host, so this would be taken care of by using that client library (HBase, HDFS, YARN) from within the {{CredentialsProvider}} implementation for that service. I think a {{URI[]}} (all of the same scheme) being passed to the corresponding {{CredentialsProvider}} impl should be enough, no? Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872404#comment-13872404 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- Planning to comment later this morning. Sorry, I got caught up on different things yesterday. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872542#comment-13872542 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- bq. ... I’m not too sure about - mainly from the perspective of services not handling getToken requests correctly if security is disabled We are moving away from this; in YARN we always use tokens, regardless of the security configuration. Oozie needs tokens to be there in order to work correctly. bq. ... The JobClient currently doesn't do this, at least for HDFS. Actually, yes it does, if you set the {{MRJobConfig.JOB_NAMENODES}} property; this is done in the {{JobSubmitter#populateTokenCache()}} method, which is called by {{JobSubmitter#submitJobInternal()}}, which is called by {{JobSubmitter#submit()}}. All this is in the main execution path, thus always done when doing a submit. It is independent of split computation. bq. ... For HBase / HCatalog sources which are outside of the IF/OF for a MR job - I don't think we have the capability for fetching tokens, and rely on the user providing them up front. Actually, we are fetching them upfront only because this was needed for MR jobs, but MR shouldn’t be a special case. Oozie has the concept of a {{CredentialsProvider}} for this very same reason, and I think with this JIRA we can fix this in the general case. bq. ... Would this utility class know how to handle all kinds of URIs ? Yes, based on registered handlers for different schemes; more on this follows. My thinking on how to address this is to use the same pattern we use today for loading/registering {{FileSystem}}, {{CompressionCodec}}, {{TokenRenewer}}, and {{SecurityInfo}} implementations: use the JDK’s {{ServiceLoader}} mechanism to load all available implementations of the following interface:
{code}
/**
 * Implementations must be thread-safe.
 */
public interface CredentialsProvider {

  /**
   * Reports the scheme supported by this provider.
   */
  public String getScheme();

  /**
   * Obtains delegation tokens for the provided URIs.
   *
   * @param conf configuration used to initialize the components that
   *     connect to the specified URIs.
   * @param uris URIs of services to obtain delegation tokens from.
   * @param targetCredentials credentials to add the fetched delegation
   *     tokens to.
   */
  public void obtainCredentials(Configuration conf, URI[] uris,
      Credentials targetCredentials) throws IOException;
}
{code}
Then we would have a {{CredentialsProvider}} class that uses a {{ServiceLoader}} to load all providers available in the classpath (the nice thing about the ServiceLoader mechanism is that you drop in a JAR file with a service implementation and you don’t have to configure anything; it just works, provided the JAR has the META-INF/services/... file for it). This would be done in a static initialization block. The {{CredentialsProvider}} class would have a static method {{fetchCredentials(Configuration, URI[], Credentials)}} which sorts the URIs by scheme and then invokes the corresponding {{CredentialsProvider}} impl for each. Then the different YARN applications define a property in the conf to indicate the URIs of the services to get tokens from, and their client submission code fetches them (like the {{JobSubmitter}} does with {{MRJobConfig.JOB_NAMENODES}}, but in a general way). Frameworks may choose to be smarter (in the case of MR, get the URIs from the splits and the output dir and fetch the tokens automatically). 
Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
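The ServiceLoader-based dispatch described in the comment above (sort URIs by scheme, hand each bucket to the registered provider for that scheme) can be sketched self-contained. Here a hand-filled registry stands in for the {{ServiceLoader}}/META-INF discovery, and Configuration/Credentials are reduced to placeholders, so this is the dispatch shape only, not Hadoop's actual classes:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CredentialsProviderSketch {

    // Per-scheme provider, as in the interface proposed above (Configuration
    // and Credentials are simplified away for the sketch).
    interface CredentialsProvider {
        String getScheme();
        void obtainCredentials(URI[] uris, List<String> targetCredentials);
    }

    // Filled once at class load, standing in for the static-block
    // ServiceLoader scan of META-INF/services entries.
    static final Map<String, CredentialsProvider> PROVIDERS = new HashMap<>();
    static {
        PROVIDERS.put("hdfs", new CredentialsProvider() {
            public String getScheme() { return "hdfs"; }
            public void obtainCredentials(URI[] uris, List<String> creds) {
                for (URI u : uris) creds.add("hdfs-token:" + u.getHost());
            }
        });
    }

    // The proposed static fetchCredentials(): bucket URIs by scheme, then
    // dispatch each bucket to its provider.
    static List<String> fetchCredentials(URI[] uris) {
        Map<String, List<URI>> byScheme = new HashMap<>();
        for (URI u : uris) {
            byScheme.computeIfAbsent(u.getScheme(), s -> new ArrayList<>()).add(u);
        }
        List<String> creds = new ArrayList<>();
        for (Map.Entry<String, List<URI>> e : byScheme.entrySet()) {
            CredentialsProvider p = PROVIDERS.get(e.getKey());
            if (p != null) {
                p.obtainCredentials(e.getValue().toArray(new URI[0]), creds);
            }
        }
        return creds;
    }

    public static void main(String[] args) {
        System.out.println(fetchCredentials(new URI[] {
            URI.create("hdfs://nn1:8020/in"), URI.create("hdfs://nn2:8020/out") }));
    }
}
```

Dropping a JAR with a new provider (say, for an {{hbase:}} scheme) would add an entry to the registry without touching the dispatcher, which is the point of the ServiceLoader design.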
[jira] [Assigned] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-5724: - Assignee: Alejandro Abdelnur JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Status: Patch Available (was: Open) JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Attachment: MAPREDUCE-5724.patch Trying to do something like YARN-24 for the JHS is a bit more complicated. Instead, I've taken a different approach: on startup the JHS will try creating the history directories; if it cannot because the FS is not available or in safemode, it will retry for up to 2 minutes, and if it times out, it will then shut down. So, instead of failing immediately, the JHS will wait a while for the FS to become available. I've hardcoded the 2-minute timeout as I don't think we need to introduce a config value for this. If others feel otherwise, I can update the patch with a config prop for it. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
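The retry-until-timeout behavior described in the attachment comment (try to create the history dirs, keep retrying while the FS is unavailable or in safemode, shut down after a bound) has the shape of a generic bounded retry loop. A hedged sketch, not the patch's actual code; the method names and the timeout/sleep values are illustrative:

```java
import java.util.function.BooleanSupplier;

public class StartupRetrySketch {

    // Retries 'attempt' until it succeeds or 'timeoutMs' elapses, sleeping
    // 'sleepMs' between tries; returns whether it ever succeeded. The JHS
    // would proceed with startup on true and shut down on false.
    static boolean retryUntil(BooleanSupplier attempt, long timeoutMs,
                              long sleepMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;          // dirs created: continue startup
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;         // FS never came up: give up
            }
            Thread.sleep(sleepMs);    // FS unavailable or in safemode: wait
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate an FS that becomes available on the third attempt.
        int[] calls = {0};
        boolean ok = retryUntil(() -> ++calls[0] >= 3, 1000, 10);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

One subtlety the real patch has to handle, per the later comment in this thread: only "FS unavailable" failures should be retried; other checked exceptions from the attempt should propagate immediately rather than be swallowed by the loop.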
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Attachment: MAPREDUCE-5724.patch Thanks for the reviews. New patch addressing Karthik's and Sandy's comments. Regarding removing the {{throws Exception}} from {{createHistoryDirs()}}: not possible, because {{tryCreateHistoryDirs}} throws a checked exception if the failure reason is other than the FS not being available. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at
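The control flow discussed in the patch comment above — a {{tryCreateHistoryDirs()}} that reports the filesystem being unavailable as a retryable condition while letting any other failure escape as a checked exception — can be sketched in isolation. Everything below (class and method names, the backoff) is an illustrative stand-in, not the actual HistoryFileManager code:

```java
import java.io.IOException;

// Illustrative sketch of the retry pattern described in the comment above;
// NOT the real HistoryFileManager code. A "false" return means the FS was
// unavailable and we should retry; any other failure surfaces as a checked
// IOException from the attempt itself.
public class HistoryDirSketch {

    interface DirCreator {
        // Returns true on success, false if the filesystem is unavailable;
        // throws IOException for any other kind of failure.
        boolean tryCreateHistoryDirs() throws IOException;
    }

    static int createHistoryDirs(DirCreator creator, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (creator.tryCreateHistoryDirs()) {
                return attempt; // directories created successfully
            }
            Thread.sleep(10); // back off while the FS is unavailable
        }
        throw new IOException("timed out waiting for the filesystem");
    }

    public static void main(String[] args) throws Exception {
        // Simulated FS that becomes available on the third attempt.
        int[] calls = {0};
        int attempts = createHistoryDirs(() -> ++calls[0] >= 3, 10);
        System.out.println("succeeded on attempt " + attempts);
    }
}
```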
[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5724: -- Attachment: MAPREDUCE-5724.patch Thanks Sandy; new patch changing the exception being thrown to IOException. Regarding detecting the SafeModeException by cause: I tried that at first, but the problem is that the cause is NULL. I checked with ATM and he indicated that the initCause() method should be called; however, according to the javadocs, initCause() should be called where the exception is created, so this seems to be an HDFS issue. Thus the only way I found to determine whether the original exception was due to the filesystem being in safemode was by searching the toString() value for 'SafeModeException'. I'll open a JIRA against HDFS to call initCause() where the exception is being thrown. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at
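The toString()-matching workaround described in the comment above can be shown standalone. This is a hedged sketch of the idea only, not the actual patch code; with a NULL cause chain the only usable signal is the rendered exception text:

```java
// Standalone sketch of the workaround discussed above: since the cause
// chain is NULL in the reported case, fall back to scanning the
// exception's string form for the safemode marker.
public class SafeModeCheckSketch {

    static boolean isBecauseSafeMode(Throwable t) {
        return t.toString().contains("SafeModeException");
    }

    public static void main(String[] args) {
        Throwable remote = new RuntimeException(
            "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Name node is in safe mode");
        System.out.println("safemode detected: " + isBecauseSafeMode(remote));
        System.out.println("other failure detected: "
            + isBecauseSafeMode(new RuntimeException("Connection refused")));
    }
}
```

This is fragile by design (a message-format change breaks it), which is why the comment proposes fixing initCause() on the HDFS side instead.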
[jira] [Commented] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873023#comment-13873023 ] Alejandro Abdelnur commented on MAPREDUCE-5724: --- created HDFS-5787 JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870835#comment-13870835 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- The Oozie server is responsible for obtaining all the tokens the main job may need: * tokens to run the job (working dir, jobtokens) * tokens for the Input and Output data (typically HDFS tokens, but they can be for different file systems, for Hbase, for HCatalog, etc). For the typical case of running an MR job (directly or via Pig/Hive), the tokens of the launcher job are sufficient for the main job. They just need to be propagated. The Oozie server makes sure the mapreduce.job.complete.cancel.delegation.tokens property is set to FALSE for the launcher job (Oozie gets rid of the launcher job for MR jobs once the main job is running). For scenarios where the main job needs to interact with different services, Oozie must acquire those tokens in advance. For HDFS this is done by simply setting the MRJobConfig.JOB_NAMENODES property; the launcher job submission will then get those tokens. For Hbase or HCatalog, Oozie has a CredentialsProvider that obtains those tokens (the requirement here is that Oozie is configured as a proxy user in those services in order to get tokens for the user submitting the job). It seems you are after generalizing this. I think we should do it with a slight twist from what you are proposing: * DelegationTokens should always be requested by the client, security enabled or not, computing the splits on the client or not. * DelegationTokens fetching should be done regardless of the IF/OF implementation (take the case of talking with Hbase or HCatalog, job working dir service). * DelegationTokens fetching should not be tied to split computation. We could have a utility class that takes a UGI and a list of service URIs and returns a populated Credentials with tokens for all the specified services. 
The IF/OF/Job would have to be able to extract the required URIs for the job. Also, this mechanism could be used to obtain ALL tokens the AM needs. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
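The utility class proposed at the end of the comment above — pass a UGI and a list of service URIs, get back populated Credentials — could take roughly the following shape. All types here are simplified stand-ins for the real UserGroupInformation/Credentials classes, and the token-fetching body is a placeholder; this is a sketch of the proposed interface, not an implementation:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shape sketch for the proposed token-fetching utility. "Ugi" and
// "Credentials" are simplified stand-ins for the real Hadoop classes.
public class TokenFetchSketch {

    static class Ugi {
        final String user;
        Ugi(String user) { this.user = user; }
    }

    static class Credentials {
        final Map<String, String> tokens = new HashMap<>();
        void addToken(String service, String token) { tokens.put(service, token); }
    }

    // Proposed utility: one call that obtains tokens for every listed
    // service, independent of any InputFormat/OutputFormat implementation
    // and of split computation.
    static Credentials obtainTokensForServices(Ugi ugi, List<URI> services) {
        Credentials creds = new Credentials();
        for (URI service : services) {
            // In the real version this would contact each service
            // (HDFS, HBase, HCatalog, ...) on behalf of the given user.
            creds.addToken(service.toString(), "token-for-" + ugi.user);
        }
        return creds;
    }

    public static void main(String[] args) {
        Credentials c = obtainTokensForServices(new Ugi("tucu"),
            List.of(URI.create("hdfs://nn1:8020"), URI.create("hcat://hcat:9083")));
        System.out.println(c.tokens.size() + " tokens obtained");
    }
}
```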
[jira] [Created] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
Alejandro Abdelnur created MAPREDUCE-5724: - Summary: JobHistoryServer does not start if HDFS is not running Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Priority: Critical Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at 
org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1102) at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1514) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:561) at
[jira] [Commented] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871450#comment-13871450 ] Alejandro Abdelnur commented on MAPREDUCE-5724: --- YARN-24 fixed a similar issue for the NM, we should try doing something similar here. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Priority: Critical Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102) at
[jira] [Created] (MAPREDUCE-5722) client-app module failing to compile, missing jersey dependency
Alejandro Abdelnur created MAPREDUCE-5722: - Summary: client-app module failing to compile, missing jersey dependency Key: MAPREDUCE-5722 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5722 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.4.0 This seems a fallout of YARN-888, oddly enough it did not happen while doing a full build with the patch before committing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (MAPREDUCE-5722) client-app module failing to compile, missing jersey dependency
[ https://issues.apache.org/jira/browse/MAPREDUCE-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-5722. --- Resolution: Invalid False alarm; it seems I was picking up some stale POMs from my local cache. A full clean build went OK. client-app module failing to compile, missing jersey dependency --- Key: MAPREDUCE-5722 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5722 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.4.0 This seems to be a fallout of YARN-888; oddly enough it did not happen while doing a full build with the patch before committing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869968#comment-13869968 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- [~sseth], [~acmurthy], why this is needed, if the AM has the corresponding delegation tokens, things work just fine, Oozie has been doing this for years; the splits are computed in the launcher job which does not have kerberos credentials. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869968#comment-13869968 ] Alejandro Abdelnur edited comment on MAPREDUCE-5663 at 1/13/14 8:59 PM: [~sseth], [~acmurthy], why is this needed? if the AM has the corresponding delegation tokens, things work just fine, Oozie has been doing this for years; the splits are computed in the launcher job which does not have kerberos credentials. was (Author: tucu00): [~sseth], [~acmurthy], why this is needed, if the AM has the corresponding delegation tokens, things work just fine, Oozie has been doing this for years; the splits are computed in the launcher job which does not have kerberos credentials. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870200#comment-13870200 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- [~sseth], for MR this is fully cooked. It works something like this: * On the AM client side you collect all the tokens you need and write them to HDFS using the Credentials.writeTokenStorageFile() method. * The HADOOP_TOKEN_FILE_LOCATION env variable is set in the AM environment, pointing to that file. * Then, when UGI.getLoginUser() is called on the AM, the UGI credentials should be populated with the contents of the token file written by the AM client. Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
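The three-step hand-off described in the comment above can be simulated end-to-end without a cluster. This is a simplified sketch: the token file is plain text here rather than the real Credentials serialization, and the environment is a plain Map because the real flow relies on Credentials.writeTokenStorageFile() and UGI.getLoginUser():

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Simplified simulation of the token hand-off described above:
// (1) the AM client writes a token file, (2) the AM's environment
// points at it, (3) "login" on the AM reads the tokens back.
public class TokenHandoffSketch {

    static final String HADOOP_TOKEN_FILE_LOCATION = "HADOOP_TOKEN_FILE_LOCATION";

    // Step 1: the AM client writes the collected tokens to a file
    // (stand-in for Credentials.writeTokenStorageFile()).
    static Path writeTokenFile(String tokens) throws IOException {
        Path f = Files.createTempFile("tokens", ".bin");
        Files.writeString(f, tokens);
        return f;
    }

    // Step 3: "login" reads whatever the env variable points to
    // (stand-in for the UGI.getLoginUser() population step).
    static String loadTokens(Map<String, String> env) throws IOException {
        return Files.readString(Path.of(env.get(HADOOP_TOKEN_FILE_LOCATION)));
    }

    public static void main(String[] args) throws IOException {
        Path f = writeTokenFile("hdfs-token,hcat-token");
        // Step 2: the env variable is set in the AM's (simulated) environment.
        Map<String, String> amEnv = new HashMap<>();
        amEnv.put(HADOOP_TOKEN_FILE_LOCATION, f.toString());
        System.out.println(loadTokens(amEnv));
    }
}
```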
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870274#comment-13870274 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- This is done in the MR {{JobSubmitter.java}}, in the {{submitJobInternal(...)}} method: {code} // get delegation token for the dir TokenCache.obtainTokensForNamenodes(job.getCredentials(), new Path[] { submitJobDir }, conf); populateTokenCache(conf, job.getCredentials()); {code} Is this what you are after? Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
[ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870375#comment-13870375 ] Alejandro Abdelnur commented on MAPREDUCE-5663: --- This works out of the box for MR jobs because typically the same FileSystem where the IN/OUT data resides is the one used for the submission dir. If you need to use different FileSystems (i.e. distcp), this is achieved by setting the {{MRJobConfig.JOB_NAMENODES}} property in the job configuration; this is handled in {{JobSubmitter.java}} in the following code:
{code}
// get secret keys and tokens and store them into TokenCache
private void populateTokenCache(Configuration conf, Credentials credentials)
    throws IOException {
  readTokensFromFiles(conf, credentials);
  // add the delegation tokens from configuration
  String[] nameNodes = conf.getStrings(MRJobConfig.JOB_NAMENODES);
  LOG.debug("adding the following namenodes' delegation tokens: " +
      Arrays.toString(nameNodes));
  if (nameNodes != null) {
    Path[] ps = new Path[nameNodes.length];
    for (int i = 0; i < nameNodes.length; i++) {
      ps[i] = new Path(nameNodes[i]);
    }
    TokenCache.obtainTokensForNamenodes(credentials, ps, conf);
  }
}
{code}
Add an interface to Input/Ouput Formats to obtain delegation tokens --- Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Michael Weng Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3 Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
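The essential step in the snippet above — turning the comma-separated {{MRJobConfig.JOB_NAMENODES}} value into one filesystem reference per namenode — can be exercised standalone. This is an illustrative sketch only: it uses plain Strings where the real code builds org.apache.hadoop.fs.Path objects and then calls TokenCache.obtainTokensForNamenodes():

```java
// Standalone sketch of what populateTokenCache() does with the
// MRJobConfig.JOB_NAMENODES value: split the comma-separated list and
// produce one filesystem reference per namenode to fetch tokens for.
// Plain Strings here stand in for org.apache.hadoop.fs.Path.
public class JobNamenodesSketch {

    static String[] namenodesToPaths(String jobNamenodes) {
        if (jobNamenodes == null || jobNamenodes.isEmpty()) {
            return new String[0]; // no extra namenodes configured
        }
        String[] nameNodes = jobNamenodes.split("\\s*,\\s*");
        String[] ps = new String[nameNodes.length];
        for (int i = 0; i < nameNodes.length; i++) {
            ps[i] = nameNodes[i]; // new Path(nameNodes[i]) in the real code
        }
        return ps;
    }

    public static void main(String[] args) {
        // e.g. a distcp-style job reading from one cluster, writing to another:
        String[] ps = namenodesToPaths("hdfs://nn-src:8020, hdfs://nn-dst:8020");
        System.out.println(ps.length + " namenodes to obtain tokens from");
    }
}
```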
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Resolution: Fixed Fix Version/s: 2.4.0 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) committed to trunk, branch-1 and branch-2. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Fix For: 1.3.0, 2.4.0 Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
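The secondary-sort pattern the description refers to can be illustrated with a standalone sketch (plain Java, no Hadoop dependency; the composite-key format and helper names are hypothetical): with a composite key such as `naturalKey#secondaryKey`, the sort comparator orders on the full key while the grouping comparator compares only the natural part, which is what a combiner-specific grouping comparator would enable on the map side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CombinerGroupingSketch {
    // Grouping comparator: compare only the natural key before the '#'.
    // The sort comparator would order on the full composite key.
    static final Comparator<String> GROUPING =
            Comparator.comparing((String k) -> k.split("#", 2)[0]);

    // Partition an already-sorted key list into the groups a combiner
    // would see: a new group starts whenever the grouping comparator
    // reports a different natural key.
    static List<List<String>> group(List<String> sortedKeys) {
        List<List<String>> groups = new ArrayList<>();
        for (String k : sortedKeys) {
            if (groups.isEmpty()
                    || GROUPING.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList("a#1", "a#2", "b#1")));
    }
}
```

Without a combiner-side grouping comparator, each distinct composite key would form its own group, defeating the secondary-sort trick during the combine phase.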
[jira] [Commented] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863627#comment-13863627 ] Alejandro Abdelnur commented on MAPREDUCE-3310: --- just committed an addendum fixing javadoc warnings (apologies for the noise). Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Fix For: 1.3.0, 2.4.0 Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-branch-1.patch Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-trunk.patch new patches with the suggested method name changes. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856006#comment-13856006 ] Alejandro Abdelnur commented on MAPREDUCE-3310: --- test failure seems unrelated. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5632) TestRMContainerAllocator#testUpdatedNodes fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839226#comment-13839226 ] Alejandro Abdelnur commented on MAPREDUCE-5632: --- LGTM, +1 TestRMContainerAllocator#testUpdatedNodes fails --- Key: MAPREDUCE-5632 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5632 Project: Hadoop Map/Reduce Issue Type: Test Reporter: Ted Yu Assignee: Jonathan Eagles Attachments: YARN-1420.patch From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1607/console : {code} Running org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 65.78 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) Time elapsed: 3.125 sec FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertTrue(Assert.java:27) at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testUpdatedNodes(TestRMContainerAllocator.java:779) {code} This assertion fails: {code} Assert.assertTrue(allocator.getJobUpdatedNodeEvents().isEmpty()); {code} The List returned by allocator.getJobUpdatedNodeEvents() is: [EventType: JOB_UPDATED_NODES] -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5652) ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836088#comment-13836088 ] Alejandro Abdelnur commented on MAPREDUCE-5652: --- BTW, the {{ShuffleHandler}} is not aware of the cleanup. The cleanup is done in the {{ResourceLocalizationService.java}} {{serviceInit()}} method. ShuffleHandler should handle NM restarts Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Labels: shuffle ShuffleHandler should work across NM restarts and not require re-running map tasks. On NM restart, the map outputs are cleaned up, requiring re-execution of map tasks; this should be avoided. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Status: Patch Available (was: Open) Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-trunk.patch Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-trunk.patch Test failure seems unrelated. Uploading a patch that fixes the javac warning (it was in a testcase). Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-branch-1.patch patch for branch-1 Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-3310: -- Attachment: MAPREDUCE-3310-trunk.patch Reuploading the patch for trunk so Jenkins does not pick up the branch-1 patch. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5481) Uber job reducers hang waiting to shuffle map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821683#comment-13821683 ] Alejandro Abdelnur commented on MAPREDUCE-5481: --- LGTM, just a couple of questions/comments: * LocalContainerLauncher.java (line:352 with patch), do we need to do something about this: {{ //relocalize(); // needed only if more than one reducer supported (is MAPREDUCE-434 fixed yet?)}} * LocalContainerLauncher.java, the introduced {{localMapFiles}} Map: from a cursory look it does not seem to be accessed from multiple threads; if so, it is fine. Otherwise we need to use a synchronized/concurrent map. Uber job reducers hang waiting to shuffle map outputs - Key: MAPREDUCE-5481 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5481 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 3.0.0 Reporter: Jason Lowe Assignee: Xuan Gong Priority: Blocker Attachments: MAPREDUCE-5481.patch, MAPREDUCE-5481.patch, syslog TestUberAM has been timing out on trunk for some time now and surefire then fails the build. I'm not able to reproduce it locally, but the Jenkins builds have been seeing it fairly consistently. See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1529/console -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5481) Uber job reducers hang waiting to shuffle map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821952#comment-13821952 ] Alejandro Abdelnur commented on MAPREDUCE-5481: --- LGTM +1 after jenkins. Uber job reducers hang waiting to shuffle map outputs - Key: MAPREDUCE-5481 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5481 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 3.0.0 Reporter: Jason Lowe Assignee: Xuan Gong Priority: Blocker Attachments: MAPREDUCE-5481-1.patch, MAPREDUCE-5481.patch, MAPREDUCE-5481.patch, syslog TestUberAM has been timing out on trunk for some time now and surefire then fails the build. I'm not able to reproduce it locally, but the Jenkins builds have been seeing it fairly consistently. See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1529/console -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5609) Add debug log message when sending job end notification
[ https://issues.apache.org/jira/browse/MAPREDUCE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815363#comment-13815363 ] Alejandro Abdelnur commented on MAPREDUCE-5609: --- +1 Add debug log message when sending job end notification --- Key: MAPREDUCE-5609 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5609 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.2.1 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-5609.patch Currently, it's hard to tell if the job end notification is working and if its backed up because you only see log messages if there was an error making the notification. It would be helpful to add a debug log message when the job end notification is sent. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5609) Add debug log message when sending job end notification
[ https://issues.apache.org/jira/browse/MAPREDUCE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5609: -- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) thanks Robert, committed to branch-1. Add debug log message when sending job end notification --- Key: MAPREDUCE-5609 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5609 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.2.1 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 1.3.0 Attachments: MAPREDUCE-5609.patch Currently, it's hard to tell if the job end notification is working and if its backed up because you only see log messages if there was an error making the notification. It would be helpful to add a debug log message when the job end notification is sent. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5457) Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators
[ https://issues.apache.org/jira/browse/MAPREDUCE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797086#comment-13797086 ] Alejandro Abdelnur commented on MAPREDUCE-5457: --- +1 LGTM Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators Key: MAPREDUCE-5457 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5457 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5457-1.patch, MAPREDUCE-5457-2.patch, MAPREDUCE-5457-3.patch, MAPREDUCE-5457-branch-1-1.patch, MAPREDUCE-5457-branch-1.patch, MAPREDUCE-5457.patch MR jobs sometimes want to just output lines of text, not key/value pairs. TextOutputFormat handles this by, if a null value is given, outputting only the key with no separator. Streaming jobs are unable to take advantage of this, because they can't output null values. A text output format reader that takes each line as a key and outputs NullWritables for values would allow streaming jobs to output lines of text. -- This message was sent by Atlassian JIRA (v6.1#6144)
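The TextOutputFormat behavior the description relies on can be sketched in plain Java (a simplified, hypothetical stand-in for the real write logic; names here are illustrative, not the actual Hadoop API): when the value is null (NullWritable in the real API), only the key is written and the separator is dropped, so the output becomes plain lines of text.

```java
public class KeyOnlyLineSketch {
    // Simplified sketch of TextOutputFormat-style record formatting:
    // a null value (NullWritable) suppresses both the value and the
    // key/value separator, yielding key-only lines.
    static String formatRecord(String key, String value, String separator) {
        if (key == null && value == null) {
            return "";
        }
        if (value == null) {
            return key;
        }
        if (key == null) {
            return value;
        }
        return key + separator + value;
    }

    public static void main(String[] args) {
        System.out.println(formatRecord("a line of text", null, "\t"));
    }
}
```

A KeyOnlyTextOutputReader, as proposed, supplies the null-value side of this contract on behalf of streaming jobs, which cannot emit null values themselves.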
[jira] [Assigned] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-3310: - Assignee: Alejandro Abdelnur Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5457) Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators
[ https://issues.apache.org/jira/browse/MAPREDUCE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792058#comment-13792058 ] Alejandro Abdelnur commented on MAPREDUCE-5457: --- LGTM, it would be good to have a testcase using streaming. Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators Key: MAPREDUCE-5457 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5457 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5457-branch-1.patch, MAPREDUCE-5457.patch MR jobs sometimes want to just output lines of text, not key/value pairs. TextOutputFormat handles this by, if a null value is given, outputting only the key with no separator. Streaming jobs are unable to take advantage of this, because they can't output null values. A text output format reader that takes each line as a key and outputs NullWritables for values would allow streaming jobs to output lines of text. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5088) MR Client gets a renewer token exception while Oozie is submitting a job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783222#comment-13783222 ] Alejandro Abdelnur commented on MAPREDUCE-5088: --- This means that we need to have JHS HA, correct? MR Client gets a renewer token exception while Oozie is submitting a job - Key: MAPREDUCE-5088 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5088 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Roman Shaposhnik Assignee: Daryn Sharp Priority: Blocker Fix For: 2.0.4-alpha Attachments: HADOOP-9409.patch, HADOOP-9409.patch, MAPREDUCE-5088.patch, MAPREDUCE-5088.patch, MAPREDUCE-5088.txt After the fix for HADOOP-9299 I'm now getting the following bizarre exception in Oozie while trying to submit a job. This also seems to be KRB related: {noformat} 2013-03-15 13:34:16,555 WARN ActionStartXCommand:542 - USER[hue] GROUP[-] TOKEN[] APP[MapReduce] JOB[001-130315123130987-oozie-oozi-W] ACTION[001-130315123130987-oozie-oozi-W@Sleep] Error starting action [Sleep]. 
ErrorType [ERROR], ErrorCode [UninitializedMessageException], Message [UninitializedMessageException: Message missing required fields: renewer] org.apache.oozie.action.ActionExecutorException: UninitializedMessageException: Message missing required fields: renewer at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:401) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:738) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:889) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59) at org.apache.oozie.command.XCommand.call(XCommand.java:277) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: com.google.protobuf.UninitializedMessageException: Message missing required fields: renewer at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:605) at org.apache.hadoop.security.proto.SecurityProtos$GetDelegationTokenRequestProto$Builder.build(SecurityProtos.java:973) at org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetDelegationTokenRequestPBImpl.mergeLocalToProto(GetDelegationTokenRequestPBImpl.java:84) at org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetDelegationTokenRequestPBImpl.getProto(GetDelegationTokenRequestPBImpl.java:67) at 
org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getDelegationToken(MRClientProtocolPBClientImpl.java:200) at org.apache.hadoop.mapred.YARNRunner.getDelegationTokenFromHS(YARNRunner.java:194) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:273) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1439) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:581) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1439) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:576) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:723) ... 10 more 2013-03-15 13:34:16,555 WARN ActionStartXCommand:542 - USER[hue] GROUP[-] TOKEN[] APP[MapReduce] JOB[001-13031512313 {noformat} -- This message was sent by Atlassian
[jira] [Commented] (MAPREDUCE-5544) JobClient#getJob loads job conf twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783230#comment-13783230 ] Alejandro Abdelnur commented on MAPREDUCE-5544: --- +1, LGTM JobClient#getJob loads job conf twice - Key: MAPREDUCE-5544 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5544 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5544-1.patch, MAPREDUCE-5544.patch Calling JobClient#getJob causes the job conf file to be loaded twice, once in the constructor of JobClient.NetworkedJob and once in Cluster#getJob. We should remove the former. MAPREDUCE-5001 was meant to fix a race that was causing problems in Hive tests, but the problem persists because it only fixed one of the places where the job conf file is loaded. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769854#comment-13769854 ] Alejandro Abdelnur commented on MAPREDUCE-5487: --- On my first comment: my bad, I mistakenly thought JOB_CONF_FILE was mapred-site.xml; it is job.xml, the localized job conf. It is fine then. +1 In task processes, JobConf is unnecessarily loaded again in Limits -- Key: MAPREDUCE-5487 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance, task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch Limits statically loads a JobConf, which incurs costs of reading files from disk and parsing XML. The contents of this JobConf are identical to the one loaded by YarnChild (before adding job.xml as a resource). Allowing Limits to initialize with the JobConf loaded in YarnChild would reduce task startup time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5379) Include token tracking ids in jobconf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-5379. --- Resolution: Fixed Fix Version/s: 2.1.1-beta Hadoop Flags: Reviewed Thanks Karthik. Committed to trunk, branch-2 and branch-2.1-beta. Include token tracking ids in jobconf - Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Fix For: 2.1.1-beta Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, MAPREDUCE-5379.patch, mr-5379-3.patch, mr-5379-4.patch HDFS-4680 enables audit logging of delegation tokens. By storing the tracking ids in the job conf, we can enable tracking what files each job touches.
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768302#comment-13768302 ] Alejandro Abdelnur commented on MAPREDUCE-5487: --- Shouldn't {{Limits.init(job)}} be called after adding the mapred config as a resource? Personally, I don't like constants that are not 'constants', which seems to be the case with these limits. I know this is not being introduced by this patch. I would change all code to use the methods and deprecate the constants. I'm OK with doing that in another patch though. In task processes, JobConf is unnecessarily loaded again in Limits -- Key: MAPREDUCE-5487 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487
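The "constants that are not 'constants'" concern above can be sketched in a few lines: a static final captured at class-initialization time freezes the built-in default, while an accessor method reflects what init() actually loaded. The class, names and default below are illustrative, not the real Limits code; only the property name mapreduce.job.counters.max is a real Hadoop setting.

```java
import java.util.Properties;

// Illustrative sketch (not org.apache.hadoop.mapreduce.util.Limits):
// COUNTERS_MAX is captured when the class initializes, before init() runs,
// so it stays at the built-in default forever; the accessor does not.
public class LimitsSketch {
    private static int maxCounters = 120; // illustrative built-in default

    public static void init(Properties conf) {
        maxCounters = Integer.parseInt(
            conf.getProperty("mapreduce.job.counters.max", "120"));
    }

    // Problematic "constant": evaluated once at class-load time.
    public static final int COUNTERS_MAX = maxCounters;

    // Preferred accessor: sees the configured value.
    public static int getCountersMax() {
        return maxCounters;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapreduce.job.counters.max", "500");
        init(conf);
        // The frozen constant still reports the default; the accessor does not.
        System.out.println(COUNTERS_MAX + " " + getCountersMax()); // prints "120 500"
    }
}
```

With this shape, deprecating the frozen constants in favor of the accessor methods, as suggested in the comment, avoids the stale-value trap entirely.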
[jira] [Commented] (MAPREDUCE-5379) Include token tracking ids in jobconf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767170#comment-13767170 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- +1 LGTM Include token tracking ids in jobconf - Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379
[jira] [Updated] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5483: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Robert, thanks Chuan Liu. Committed to trunk, branch-2 and branch-2.1. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Robert Kanter Fix For: 2.1.1-beta Attachments: MAPREDUCE-5483.patch MAPREDUCE-5357 does a filesystem chown() operation. chown() is not valid unless you are the superuser; a chown() to yourself is a NOP, which is why this has not been detected in Hadoop testcases where the user is running as itself. However, the distcp testcases run by Oozie, which use test users/groups from UGI for the minicluster, fail because of this chown(), either because the test user does not exist or because the current user does not have privileges to do a chown(). We should revert MAPREDUCE-5357; Windows should handle this with conditional logic used only when running on Windows. Opening a new JIRA and not reverting directly because MAPREDUCE-5357 went into 2.1.0-beta.
[jira] [Commented] (MAPREDUCE-5484) YarnChild unnecessarily loads job conf twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754010#comment-13754010 ] Alejandro Abdelnur commented on MAPREDUCE-5484: --- +1 LGTM YarnChild unnecessarily loads job conf twice Key: MAPREDUCE-5484 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5484 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: perfomance Attachments: MAPREDUCE-5484-1.patch, MAPREDUCE-5484.patch In MR task processes, a JobConf is instantiated with the same job.xml twice, once at the beginning of main() and once in configureTask. IIUC, the second instantiation is not necessary. These take time reading from disk and parsing XML. Removing the second instantiation shaved a second off the average map task time in a 1,000-map sleep job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
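The shape of the fix in this issue and in MAPREDUCE-5487, parse the expensive XML config once and hand the same object to every later consumer, can be sketched as follows. This is a standalone illustration that uses Properties as a stand-in for JobConf; none of it is the actual YarnChild code, and the parse counter merely stands in for the disk-read and XML-parse cost.

```java
import java.util.Properties;

// Illustration of "load job.xml once": every consumer shares one parsed
// instance instead of re-reading and re-parsing the file per code path.
public class ConfOnce {
    public static int parseCount = 0; // stands in for disk + XML parse cost
    private static Properties cached;

    private static Properties parseJobXml() {
        parseCount++;
        Properties p = new Properties();
        p.setProperty("mapreduce.job.name", "example"); // illustrative content
        return p;
    }

    // All consumers get the same parsed object; parsing happens once.
    public static synchronized Properties getJobConf() {
        if (cached == null) {
            cached = parseJobXml();
        }
        return cached;
    }

    public static void main(String[] args) {
        Properties a = getJobConf(); // first call parses
        Properties b = getJobConf(); // second call reuses the instance
        System.out.println(parseCount + " " + (a == b)); // prints "1 true"
    }
}
```

In the reported sleep-job benchmark, eliminating the second parse saved about a second per map task, which is exactly the saving this pattern targets.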
[jira] [Assigned] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-5362: - Assignee: Roman Shaposhnik all yours, thx clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies.
[jira] [Created] (MAPREDUCE-5483) revert MAPREDUCE-5357
Alejandro Abdelnur created MAPREDUCE-5483: - Summary: revert MAPREDUCE-5357 Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta MAPREDUCE-5357 does a filesystem chown() operation. chown() is not valid unless you are the superuser; a chown() to yourself is a NOP, which is why this has not been detected in Hadoop testcases where the user is running as itself. However, the distcp testcases run by Oozie, which use test users/groups from UGI for the minicluster, fail because of this chown(), either because the test user does not exist or because the current user does not have privileges to do a chown(). We should revert MAPREDUCE-5357; Windows should handle this with conditional logic used only when running on Windows. Opening a new JIRA and not reverting directly because MAPREDUCE-5357 went into 2.1.0-beta.
[jira] [Commented] (MAPREDUCE-5357) Job staging directory owner checking could fail on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751547#comment-13751547 ] Alejandro Abdelnur commented on MAPREDUCE-5357: --- FYI, opened MAPREDUCE-5483 to revert this JIRA. Job staging directory owner checking could fail on Windows -- Key: MAPREDUCE-5357 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5357 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 3.0.0, 2.1.0-beta Attachments: MAPREDUCE-5357-trunk.patch In {{JobSubmissionFiles.getStagingDir()}}, we have the following code, which will throw an exception if the directory owner is not the current user.
{code:java}
String owner = fsStatus.getOwner();
if (!(owner.equals(currentUser) || owner.equals(realUser))) {
  throw new IOException("The ownership on the staging directory " + stagingArea +
      " is not as expected. " + "It is owned by " + owner + ". The directory must " +
      "be owned by the submitter " + currentUser + " or " + "by " + realUser);
}
{code}
This check will fail on Windows when the underlying file system is LocalFileSystem, because on Windows the default file or directory owner could be the Administrators group if the user belongs to it. Quite a few MR unit tests that run an MR mini cluster with localFs as the underlying file system fail because of this.
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751558#comment-13751558 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- I guess we could check if the platform is Windows before doing the chown(), but the fix was made because testcases were failing on Windows when running them as admin. It seems fishy to me that Windows will silently fail the chown(). Regardless, either we guard this code to run only on Windows or we revert it. I'd prefer reverting it. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751578#comment-13751578 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- If you run builds in the same directory as different users, you'll run into permission issues deleting files from the previous run unless the user running the second time is a superuser. That seems like the wrong thing to do. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751580#comment-13751580 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- If we revert this patch, you don't do a chown() on a dir you created. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751603#comment-13751603 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- +1 from my side. [~chuanliu], are you OK with the revert? revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Commented] (MAPREDUCE-5483) revert MAPREDUCE-5357
[ https://issues.apache.org/jira/browse/MAPREDUCE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751681#comment-13751681 ] Alejandro Abdelnur commented on MAPREDUCE-5483: --- UGI and the minicluster have support for adding test users which do not map to OS users. When using such test users, things blow up in the local file system. Before MAPREDUCE-5357 (without the chown) things were working fine in such scenarios; MAPREDUCE-5357 introduced a regression. I'm planning to commit the current patch tomorrow. If you want to do special handling for Windows (which I would not recommend), please upload a patch. The patch should have the effect of a 'revert' for non-Windows platforms. revert MAPREDUCE-5357 - Key: MAPREDUCE-5483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5483
[jira] [Created] (MAPREDUCE-5473) JT webservices use a static SimpleDateFormat, SimpleDateFormat is not threadsafe
Alejandro Abdelnur created MAPREDUCE-5473: - Summary: JT webservices use a static SimpleDateFormat, SimpleDateFormat is not threadsafe Key: MAPREDUCE-5473 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5473 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.0 Reporter: Alejandro Abdelnur MAPREDUCE-4837 is doing:
{code}
<%! static SimpleDateFormat dateFormat = new SimpleDateFormat("d-MMM- HH:mm:ss"); %>
{code}
But SimpleDateFormat is not thread safe.
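A common remedy for this class of bug, shown here only as a sketch and not as the patch that fixed the JSP, is to keep one SimpleDateFormat per thread: the format object keeps mutable internal state, so a single static instance shared across JSP/servlet threads can produce corrupted output under concurrency. The exact pattern string is an assumption.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// One SimpleDateFormat per thread: no shared mutable state, no locking.
public class PerThreadDateFormat {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("d-MMM-yyyy HH:mm:ss"));

    public static String format(Date date) {
        // Each thread lazily gets and reuses its own instance.
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        // The epoch renders differently per timezone/locale, so just show it runs.
        System.out.println(format(new Date(0L)));
    }
}
```

An alternative with the same effect on modern Java is java.time.format.DateTimeFormatter, which is immutable and therefore safe to hold in a single static field.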
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739970#comment-13739970 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], I'm glad you are glad, *smile*. Yes, we agree we need to leave SLOT_MILLIS because Hadoop 0.23 users are relying on it. We also agreed before that we should remove 'minimum resource capability' from the protocol and the API because it is a scheduler implementation internal that should not be exposed to the users. The current code relies on the 'minimum resource capability' configuration property, which is internal. We also agreed on doing that until this JIRA is fixed. The proposed solution is a tweak to the current code, just using a different configuration property, nothing else. Adding 'minimum resource capability' back to the protocol and API to support deprecated functionality that we all agree should go away does not seem right. I prefer [~jlowe]'s suggestion to have a new (deprecated) configuration property that users upgrading from 0.23 and wanting to preserve the SLOT_MILLIS counter information can use (and if they don't, the default setting, SLOT_MILLIS is always zero). Also, doing the constant replacement is much simpler than reintroducing the protocol and API minimum field. Let's move forward with this. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code, we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants.
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740074#comment-13740074 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~bikassaha], you are spot on. The only difference being proposed is that instead of using the scheduler MIN property directly, we define a new one for this particular use case in the MR AM namespace (and deprecate it). The reason for doing this is that the existing scheduler MIN property would go away per YARN-1004. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740397#comment-13740397 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], there are four other committers that are OK with the idea of a separate config to enable SLOT_MILLIS because it is a special usecase for 0.23 users. The changes are much less disruptive and (IMO) adequate given the usecase. Can you please reconsider your -1? Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740646#comment-13740646 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- Given that this is a legacy thing coming from Hadoop 1, I don't think we should use YARN constants/properties at all. Why don't we use the Hadoop 1 JT properties for the same purpose, in mapred-site.xml, documenting how they have to be set to the MIN of the scheduler for the SLOT_MILLIS counter to kick in? To me this seems a much more correct way of doing it. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738516#comment-13738516 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], it seems we are talking past each other here. * 1. SLOT_MILLIS does make sense in YARN. yes/no? * 2. We need to redefine what SLOT_MILLIS means/reports in YARN. yes/no? Are we in agreement that the answers to these questions are #1 NO and #2 YES? If we are in agreement, then we have to see how to address this in the least disruptive way. Sandy's latest proposal suggests we do the following: * Introduce the concept of CONTAINER_MILLIS (regardless of the container size) * Deprecate SLOT_MILLIS and map it to report CONTAINER_MILLIS And we could later augment this with additional counters: * Introduce the concept of MEMORY_MILLIS * Introduce the concept of CPU_MILLIS Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739043#comment-13739043 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~jlowe], thanks for jumping in, I was not aware you guys have in-house tools using this stuff with the currently reported SLOT_MILLIS. With that in mind, and along the lines of what Jason proposes, would the following satisfy all parties? This JIRA would then be repurposed to: * Introduce and deprecate a {{MRConf.SLOT_MILLIS_MINIMUM_ALLOCATION_MB}} with no default * If set, use the {{MRConf.SLOT_MILLIS_MINIMUM_ALLOCATION_MB}} value to compute, using today's logic, the SLOT_MILLIS counter values. * If not set, the SLOT_MILLIS counter should report 0. This means that anybody relying on the current SLOT_MILLIS reporting can continue getting it until we decide to trash it (a few versions down the road). A different JIRA would introduce MEM_MILLIS and CPU_MILLIS, which have an accurate meaning in Yarn's world. YARN-1004 is then unblocked. I believe this addresses the problem without breaking backwards compatibility, as [~acmurthy] asked. Arun, Jason, [~sandyr], are you OK with this approach? Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
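The repurposed behavior proposed above can be sketched as follows. The property name {{MRConf.SLOT_MILLIS_MINIMUM_ALLOCATION_MB}} comes from the comment itself, but the method, its signature, and the ceil-based scaling rule below are assumptions about "today's logic", not the committed code.

```java
// Sketch of the proposed SLOT_MILLIS fallback: report 0 when the deprecated
// slot-size property is unset; otherwise scale container time by the number
// of "slots" the container occupies. The scaling rule is an assumption.
public class SlotMillisSketch {
    /**
     * @param containerMillis how long the task's container ran
     * @param containerMb     the container's memory allocation in MB
     * @param slotSizeMb      value of the deprecated slot-size property
     *                        (MRConf.SLOT_MILLIS_MINIMUM_ALLOCATION_MB in the
     *                        proposal), or -1 when unset
     */
    public static long slotMillis(long containerMillis, long containerMb,
                                  long slotSizeMb) {
        if (slotSizeMb <= 0) {
            return 0L; // property not set: SLOT_MILLIS reports 0
        }
        long slots = (containerMb + slotSizeMb - 1) / slotSizeMb; // ceil division
        return containerMillis * Math.max(1, slots);
    }

    public static void main(String[] args) {
        System.out.println(slotMillis(60_000, 2048, 1024)); // two slots: 120000
        System.out.println(slotMillis(60_000, 2048, -1));   // unset: 0
    }
}
```

With this shape, 0.23 users who set the deprecated property keep their historical SLOT_MILLIS numbers, while everyone else sees a constant 0 until the counter is removed.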
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735011#comment-13735011 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], I don't see how this will break BC; it is not being proposed to remove the counter, but either to make it zero or, as [~sandyr] suggested, to introduce CONTAINER_MILLIS_MAP and map SLOT_MILLIS_MAP to it (an approach that will make more sense than the current value). I don't want to punt, because you are blocking YARN-1004 on this one. YARN-1004 should go in 2.1.0-beta. Please ping me if you want to chat offline over the phone, if you think that will be easier to discuss. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Updated] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5311: -- Fix Version/s: 2.1.0-beta [~acmurthy], I guess your removing it from 2.1.0-beta and my last comment had a race condition; making it a blocker for 2.1.0-beta again. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731067#comment-13731067 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], unless I got things wrong, we agreed to keep slot-millis around until we have memory-millis. And the latest patch here is doing that. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants.
[jira] [Updated] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5311: -- Priority: Blocker (was: Major) Fix Version/s: 2.1.0-beta We need to take care of this for 2.1.0, making it a blocker. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants.
[jira] [Commented] (MAPREDUCE-5311) Remove slot millis computation logic and deprecate counter constants
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728327#comment-13728327 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], per [~jlowe]'s comment above ("so all that's left is to clarify what will happen to slot-millis once mem-millis shows up"), he does not seem unhappy. I think the problem at hand is that slots-millis is a legacy metric that does not make sense in YARN, regardless of whether it returns 0 or mem-millis. Anybody relying on this value from Hadoop 1 will get something completely different from what they were getting when running on Hadoop 1. This means that whether we make it disappear or leave it around returning the wrong value (currently a contrived mem-millis based on the MIN config), users relying on it will have to adjust how they see/process this value. Because of that, I would say we bite the bullet: make the slot-millis counter return 0 (or better, -1), deprecate the constants, and print a warning when somebody uses the constant, pointing the user to memory-millis (and eventually cpu-millis). Remove slot millis computation logic and deprecate counter constants Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants.
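The approach proposed in the comment above (zero out the counter, deprecate the constant, warn on use) can be sketched roughly as follows. All names here are illustrative placeholders, not the actual Hadoop patch:

```java
public class CounterDeprecationSketch {
    // Illustrative stand-ins for the real MapReduce counter constants.
    enum JobCounter {
        /** @deprecated slot-based accounting is meaningless in YARN. */
        @Deprecated
        SLOTS_MILLIS_MAPS,
        MB_MILLIS_MAPS // memory-millis replacement (hypothetical name here)
    }

    // Return 0 for the deprecated counter and warn, instead of computing a
    // misleading slot-millis value from the minimum-allocation config.
    static long getCounterValue(JobCounter counter) {
        if (counter == JobCounter.SLOTS_MILLIS_MAPS) {
            System.err.println("WARN: SLOTS_MILLIS_MAPS is deprecated and no "
                + "longer computed; use the memory-millis counter instead");
            return 0L; // -1 was also floated as the sentinel in the discussion
        }
        return 42L; // stand-in for a real counter lookup
    }

    public static void main(String[] args) {
        System.out.println(getCounterValue(JobCounter.SLOTS_MILLIS_MAPS)); // prints 0
        System.out.println(getCounterValue(JobCounter.MB_MILLIS_MAPS));    // prints 42
    }
}
```

The point of the sentinel-plus-warning design is that legacy consumers keep compiling and running, but are nudged toward the replacement counter at both compile time (deprecation) and run time (the warning).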
[jira] [Commented] (MAPREDUCE-5379) Include FS delegation token ID in job conf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726830#comment-13726830 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- I've played around and got Daryn's patch to work. After running the patch by Andrew Wang (who is doing HDFS-4680), he brought up a concern with the client-driven tracking approach: a client can set a rogue trackingId. With the sequenceId approach, by contrast, what is in the HDFS audit log can be fully trusted and traced to a user. One concern Daryn mentioned above with the sequenceId approach (and also told me offline) is the MR client decoding the token identifier; this could break things when moving token encoding from Writable to protobuf. To address this, instead of decoding the token identifier, the MR client would simply hash its byte[] representation without decoding it. In addition, the MR client should have an option to switch ON/OFF (default OFF) the DT hash generation/injection into the jobconf. Include FS delegation token ID in job conf -- Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, MAPREDUCE-5379.patch Making a job's FS delegation token ID accessible will allow external services to associate it with the file system operations it performs.
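The hashing idea described above, fingerprinting the token identifier's raw bytes so the client never needs to decode them, could look roughly like this. The method name and the choice of MD5 are assumptions for illustration, not the committed implementation:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TokenIdentifierHash {
    // Hash the opaque byte[] form of a delegation token identifier without
    // decoding it, so a later move from Writable to protobuf encoding does
    // not break the client. (MD5 and the method name are illustrative.)
    static String hashIdentifier(byte[] identifierBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(identifierBytes);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // In the real flow the bytes would come from the token's raw
        // identifier, treated as an opaque blob.
        byte[] opaqueIdentifier = {0x01, 0x02, 0x03};
        System.out.println(hashIdentifier(opaqueIdentifier)); // 32 hex chars
    }
}
```

Because the hash is computed over the serialized bytes, the jobconf value stays stable for a given token regardless of the identifier's wire format, which is exactly the decoupling the comment is after.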
[jira] [Created] (MAPREDUCE-5426) MRAM fails to register to RM, AMRM token seems missing
Alejandro Abdelnur created MAPREDUCE-5426: - Summary: MRAM fails to register to RM, AMRM token seems missing Key: MAPREDUCE-5426 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5426 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.0-beta Trying to run the pi example in an unsecured pseudo-distributed cluster, the job fails. It seems the AMRM token is missing. The AM syslog has the following: {code} 2013-07-27 14:17:23,703 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while registering org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:176) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:95) at com.sun.proxy.$Proxy29.registerApplicationMaster(Unknown Source) at 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:147) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:107) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:789) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1323) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN] at org.apache.hadoop.ipc.Client.call(Client.java:1369) at org.apache.hadoop.ipc.Client.call(Client.java:1322) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) ... 22 more {code}
[jira] [Resolved] (MAPREDUCE-4366) mapred metrics shows negative count of waiting maps and reduces
[ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-4366. --- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Thanks Sandy. Committed to branch-1. mapred metrics shows negative count of waiting maps and reduces --- Key: MAPREDUCE-4366 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.0.2 Reporter: Thomas Graves Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-4366-branch-1-1.patch, MAPREDUCE-4366-branch-1.patch Negative waiting_maps and waiting_reduces counts are observed in the mapred metrics. MAPREDUCE-1238 partially fixed this, but it appears there are still issues, as we are still seeing it, though not as badly.
[jira] [Commented] (MAPREDUCE-5379) Include FS delegation token ID in job conf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718591#comment-13718591 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- +1 from my side. IMO [~daryn]'s concerns have been addressed; [~daryn]? Include FS delegation token ID in job conf -- Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379.patch Making a job's FS delegation token ID accessible will allow external services to associate it with the file system operations it performs.
[jira] [Commented] (MAPREDUCE-5379) Include FS delegation token ID in job conf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718592#comment-13718592 ] Alejandro Abdelnur commented on MAPREDUCE-5379: --- [~daryn], happy to have a call if you want to discuss this quickly; I'll then summarize the offline discussion here. Include FS delegation token ID in job conf -- Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379.patch Making a job's FS delegation token ID accessible will allow external services to associate it with the file system operations it performs.
[jira] [Updated] (MAPREDUCE-5288) ResourceEstimator#getEstimatedTotalMapOutputSize suffers from divide by zero issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5288: -- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Karthik. Thanks Harsh for reviewing it. Committed to branch-1. ResourceEstimator#getEstimatedTotalMapOutputSize suffers from divide by zero issues --- Key: MAPREDUCE-5288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5288 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.0 Reporter: Harsh J Assignee: Karthik Kambatla Fix For: 1.3.0 Attachments: mr-5288-1.patch The computation in the above-mentioned class method is: {code} long estimate = Math.round(((double)inputSize * completedMapsOutputSize * 2.0)/completedMapsInputSize); {code} Given http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double), it's possible that the returned estimate could be Long.MAX_VALUE if completedMapsInputSize is determined to be zero. This can be proven with a simple code snippet: {code} class Foo { public static void main(String... args) { long inputSize = 600L + 2; long estimate = Math.round(((double)inputSize * 1L * 2.0)/0L); System.out.println(estimate); } } {code} The above conveniently prints out: {{9223372036854775807}}, which is Long.MAX_VALUE (or 8 exbibytes per MapReduce job).
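A guard for the zero-denominator case described above could look like the following sketch. The method and parameter names mirror the issue description but are not the actual committed patch:

```java
public class SafeOutputEstimate {
    // Guarded version of the estimate: when no completed-map input has been
    // observed yet, return 0 instead of letting the double division produce
    // Infinity, which Math.round() turns into Long.MAX_VALUE.
    static long estimatedTotalMapOutputSize(long inputSize,
                                            long completedMapsOutputSize,
                                            long completedMapsInputSize) {
        if (completedMapsInputSize == 0) {
            return 0L;
        }
        return Math.round(((double) inputSize * completedMapsOutputSize * 2.0)
            / completedMapsInputSize);
    }

    public static void main(String[] args) {
        long inputSize = 600L + 2;
        // Unguarded, the first call was Math.round(Infinity) == Long.MAX_VALUE.
        System.out.println(estimatedTotalMapOutputSize(inputSize, 1L, 0L));        // prints 0
        System.out.println(estimatedTotalMapOutputSize(inputSize, 1L, inputSize)); // prints 2
    }
}
```

The root cause is that dividing a nonzero double by zero yields Infinity rather than throwing ArithmeticException (only integer division throws), and Math.round(Double.POSITIVE_INFINITY) is defined to return Long.MAX_VALUE, so the overflow is silent.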