[jira] [Updated] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Avner BenHanoch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avner BenHanoch updated MAPREDUCE-4049:
---

Description: 
Support a generic shuffle service as a set of two plugins: ShuffleProvider & 
ShuffleConsumer.
This will satisfy the following needs:
# Better shuffle and merge performance. For example, we are working on a 
shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, 
or InfiniBand) instead of using the current HTTP shuffle. Building on the fast 
RDMA shuffle, the plugin can also use a suitable merge approach during the 
intermediate merges, yielding much better performance.
# Satisfy MAPREDUCE-3060 - a generic shuffle service that avoids a hidden 
dependency of the NodeManager on a specific version of the mapreduce shuffle 
(currently targeted to 0.24.0).

References:
# Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
of Auburn University et al., 
[http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
# I am attaching 2 documents with the suggested top-level design for both plugins 
(currently based on the 1.0 branch).
# I am providing a link for downloading UDA - Mellanox's open source plugin that 
implements a generic shuffle service using RDMA and levitated merge. Note: at 
this phase, the code is in C++ through JNI and should be considered beta 
only. Still, it can serve anyone who wants to implement or contribute to 
levitated merge. (Please be advised that levitated merge is mostly suited to very 
fast networks.) - 
[http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]

  was:
Support a generic shuffle service as a set of two plugins: ShuffleProvider & 
ShuffleConsumer.
This will satisfy the following needs:
# Better shuffle and merge performance. For example, we are working on a 
shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, 
or InfiniBand) instead of using the current HTTP shuffle. Building on the fast 
RDMA shuffle, the plugin can also use a suitable merge approach during the 
intermediate merges, yielding much better performance.
# Satisfy MAPREDUCE-3060 - a generic shuffle service that avoids a hidden 
dependency of the NodeManager on a specific version of the mapreduce shuffle 
(currently targeted to 0.24.0).

References:
# Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
of Auburn University et al., 
[http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
# I am attaching 2 documents with the suggested top-level design for both plugins 
(currently based on the 1.0 branch).


 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support a generic shuffle service as a set of two plugins: ShuffleProvider & 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example, we are working on a 
 shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, 
 or InfiniBand) instead of using the current HTTP shuffle. Building on the fast 
 RDMA shuffle, the plugin can also use a suitable merge approach during the 
 intermediate merges, yielding much better performance.
 # Satisfy MAPREDUCE-3060 - a generic shuffle service that avoids a hidden 
 dependency of the NodeManager on a specific version of the mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 of Auburn University et al., 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with the suggested top-level design for both plugins 
 (currently based on the 1.0 branch).
 # I am providing a link for downloading UDA - Mellanox's open source plugin 
 that implements a generic shuffle service using RDMA and levitated merge. 
 Note: at this phase, the code is in C++ through JNI and should be considered 
 beta only. Still, it can serve anyone who wants to implement or 
 contribute to levitated merge. (Please be advised that levitated merge is 
 mostly suited to very fast networks.) - 
 [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
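The "intermediate merges" mentioned above combine many sorted map-output segments into fewer, larger sorted runs. The sketch below illustrates only that generic step. It is a heap-based k-way merge, not the levitated merge from the referenced paper and not Hadoop's own Merger class. The integer keys and in-memory segments are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {

    /** Cursor over one sorted segment: holds the current head and the rest of the segment. */
    private static final class Cursor {
        final Iterator<Integer> rest;
        int head;

        Cursor(Iterator<Integer> rest) {
            this.rest = rest;
            this.head = rest.next(); // only constructed for non-empty segments
        }
    }

    /** Merges already-sorted segments into one sorted list using a min-heap of segment heads. */
    static List<Integer> merge(List<List<Integer>> segments) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));
        for (List<Integer> segment : segments) {
            if (!segment.isEmpty()) {
                heap.add(new Cursor(segment.iterator()));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();      // segment with the smallest current key
            out.add(smallest.head);
            if (smallest.rest.hasNext()) {      // advance that segment and re-insert it
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> segments = Arrays.asList(
                Arrays.asList(1, 4, 9),
                Arrays.asList(2, 3, 10),
                Arrays.asList(5));
        System.out.println(merge(segments)); // [1, 2, 3, 4, 5, 9, 10]
    }
}
```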

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504461#comment-13504461
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:


Hi Laxman,

Thanks for your comment, and sorry for my late response.

I just posted a link for downloading the source code of the Mellanox plugin that 
implements a generic shuffle using RDMA and levitated merge.

You are warmly welcome to contribute to pushing the algorithms of this plugin into 
the core of vanilla Hadoop, as well as to help get my straightforward 
patch in this JIRA issue accepted.
Avner



[jira] [Updated] (MAPREDUCE-4762) repair test org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal

2012-11-27 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated MAPREDUCE-4762:
--

Attachment: MAPREDUCE-4762--b.patch
MAPREDUCE-4762-branch-0.23--b.patch

Hi Robert,
the attached patches MAPREDUCE-4762-branch-0.23--b.patch and 
MAPREDUCE-4762--b.patch implement your suggestion. 
Patch MAPREDUCE-4762--b.patch is targeted at trunk and branch-2. 

 repair test 
 org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal
 -

 Key: MAPREDUCE-4762
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4762
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ivan A. Veselovsky
 Attachments: MAPREDUCE-4762--b.patch, 
 MAPREDUCE-4762-branch-0.23--b.patch, MAPREDUCE-4762-trunk.patch


 The test 
 org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal is 
 annotated with @Ignore. 
 As a result, several classes in the package 
 org.apache.hadoop.mapreduce.security.token have zero unit-test coverage.
 The problem is that the test assumed that the class 
 org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal.Renewer 
 would be used as a custom implementation of the 
 org.apache.hadoop.security.token.TokenRenewer service, but that did not 
 happen, because this custom service implementation was not registered. 
 We solved this problem by using a special class loader that is invoked to find 
 the resource META-INF/services/org.apache.hadoop.security.token.TokenRenewer 
 and supplies custom content for it. This way the custom service 
 implementation gets instantiated.
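The fix relies on how java.util.ServiceLoader discovers providers. It asks the class loader for every META-INF/services/<interface name> resource and instantiates the classes listed in those resources. The sketch below shows the technique under stated assumptions, and the actual patch may supply the content differently. The Renewer interface and TestRenewer class are hypothetical stand-ins for TokenRenewer and the test's Renewer. The custom descriptor is written to a temporary file so that it can be returned as a URL.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceOverrideDemo {

    /** Hypothetical stand-in for a service interface such as TokenRenewer. */
    public interface Renewer {
        String name();
    }

    /** Hypothetical implementation that is NOT listed in any META-INF/services file. */
    public static class TestRenewer implements Renewer {
        public TestRenewer() {}

        @Override
        public String name() {
            return "test-renewer";
        }
    }

    /** Answers lookups of one service descriptor with custom content; delegates everything else. */
    static class ServiceOverridingClassLoader extends ClassLoader {
        private final String resourceName;
        private final URL customDescriptor;

        ServiceOverridingClassLoader(ClassLoader parent, Class<?> service, String content)
                throws IOException {
            super(parent); // class loading itself still goes to the parent
            this.resourceName = "META-INF/services/" + service.getName();
            // Write the descriptor to a temp file so it can be exposed as a URL.
            Path tmp = Files.createTempFile("service-descriptor", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            this.customDescriptor = tmp.toUri().toURL();
        }

        @Override
        public Enumeration<URL> getResources(String name) throws IOException {
            if (resourceName.equals(name)) {
                return Collections.enumeration(Collections.singletonList(customDescriptor));
            }
            return super.getResources(name);
        }
    }

    /** Returns the names of all Renewer providers that ServiceLoader instantiates. */
    static List<String> loadRenewerNames() {
        try {
            ClassLoader loader = new ServiceOverridingClassLoader(
                    ServiceOverrideDemo.class.getClassLoader(),
                    Renewer.class,
                    TestRenewer.class.getName() + "\n"); // binary name, e.g. ...$TestRenewer
            List<String> names = new ArrayList<>();
            for (Renewer renewer : ServiceLoader.load(Renewer.class, loader)) {
                names.add(renewer.name());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadRenewerNames()); // [test-renewer]
    }
}
```

Only the lookup of the descriptor resource is intercepted. The provider class itself is still loaded by the parent loader, so it is the same class that the rest of the test sees.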



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504502#comment-13504502
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:



_Alejandro,_

With all due respect, I think that something in your behavior is inappropriate:
 * You were never involved in this issue; still, you took the liberty 
of making it a sub-task of MAPREDUCE-2454, an issue you support, without 
consulting anyone.
 * This is especially inappropriate since MAPREDUCE-2454 is disputed and has 
acceptance problems of its own, regardless of my issue.  Hence, its acceptance 
problems will now affect my issue.
 * Your justification *As all this JIRAs are small, I think we'll be able to 
move fast with all of them.* is inappropriate, since you actually created a 
linkage that will surely postpone my issue instead of leaving each issue to 
progress at its own pace!
 * This is not the first time that the people behind MAPREDUCE-2454 have tried to 
disrupt this JIRA issue.

Apparently, I don't have the privileges to break this sub-task linkage; 
hence, I am asking you or someone else to do it.

I welcome any comment that comes from a professional place with the simple 
goal of making Hadoop better. That said, I feel that the way you 
blitzed my patch with every possible petty comment, sometimes with disputable 
claims, just as the patch was about to be accepted, is unfair, 
unprofessional and unfriendly, especially considering your complete silence 
since this JIRA issue started.

I am not sure that commenting in such a blitz will improve the quality of 
Hadoop.  For example:

{quote}
Checking for shuffleConsumerPlugin != null before closing it seems redundant, 
you would have never got there if shufflePlugin is NULL.
{quote}
This is your mistake (the code does reach that point when isLocal == true).  *There is no 
option to remove the null check!*
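The control flow behind this point can be sketched as follows. This is a minimal, hypothetical model and not the actual ReduceTask code. It assumes the plugin is instantiated only for non-local runs, while cleanup runs on every path. That is exactly the situation in which the null check is required.

```java
import java.util.ArrayList;
import java.util.List;

public class ShuffleCloseSketch {

    /** Hypothetical minimal shuffle consumer contract; the log records what happened. */
    interface ShuffleConsumerPlugin {
        void fetchOutputs(List<String> log);

        void close(List<String> log);
    }

    /** Hypothetical remote shuffle used only for non-local runs. */
    static final class RemoteShuffle implements ShuffleConsumerPlugin {
        @Override
        public void fetchOutputs(List<String> log) {
            log.add("fetch");
        }

        @Override
        public void close(List<String> log) {
            log.add("close");
        }
    }

    /** Models a reduce task whose plugin is created only when the job is not local. */
    static List<String> runReduce(boolean isLocal) {
        List<String> log = new ArrayList<>();
        ShuffleConsumerPlugin shuffleConsumerPlugin = null;
        try {
            if (!isLocal) {
                shuffleConsumerPlugin = new RemoteShuffle();
                shuffleConsumerPlugin.fetchOutputs(log);
            }
            log.add("reduce");
        } finally {
            // With isLocal == true execution still reaches this point, but no plugin exists.
            if (shuffleConsumerPlugin != null) {
                shuffleConsumerPlugin.close(log);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runReduce(false)); // [fetch, reduce, close]
        System.out.println(runReduce(true));  // [reduce]
    }
}
```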

{quote}
Visibility annotations for the ShuffleConsumerPlugin, ShuffleContext, should 
be Unstable
{quote}
I think it is inappropriate to declare a plugin interface as Unstable, since it 
must stay stable for 3rd-party vendors.

--- --- --- ---

Personally, I have no problem implementing all the rest of your comments. It 
should be very easy for me.  Still, I am raising a few points for consideration 
regarding the following comments:

{quote}
The Shuffle class should be renamed to DefaultShuffle.
The ShuffleConsumerPlugin should be renamed to Shuffle.
{quote}
I chose the term 'ShuffleConsumerPlugin' and not something like 'Shuffle', 
because it makes clear that we are in a *plugin* of the *ShuffleConsumer*, rather 
than in a *built-in* *ShuffleProvider/ShuffleHandler*.   Also, I didn't take the 
liberty of renaming core classes of Hadoop.  

{quote}
ShuffleConsumerPlugin, getShuffleConsumerPlugin() method is not required, 
instead use the ReflectionUtils directly in the ReducerTask class.
{quote}
Here, I only followed an existing Hadoop convention, as seen in 
ResourceCalculatorPlugin.getResourceCalculatorPlugin().  Personally, I'll be 
glad to follow your advice, and even to go one step further and make 
ShuffleConsumerPlugin an interface instead of an abstract class.

{quote}
use 'mapreduce.job.reduce.shuffle.class' to be consistent with MAPREDUCE-2454.
{quote}
Here I chose 'mapreduce.shuffle…' because I think it is consistent with the 
current convention in the hadoop-3 configuration.
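To make the naming debate concrete, here is a minimal sketch of configuration-driven plugin loading. Everything in it is hypothetical:
 * The ShuffleConsumerPlugin interface is a stand-in for the one in the patch.
 * java.util.Properties stands in for Hadoop's Configuration.
 * Plain JDK reflection replaces ReflectionUtils.newInstance so that the example is self-contained.
 * The key 'mapreduce.job.reduce.shuffle.class' is the one suggested in the review and is used only for illustration.

```java
import java.util.Properties;

public class ShufflePluginLoader {

    /** Hypothetical stand-in for the reduce-side shuffle plugin contract. */
    public interface ShuffleConsumerPlugin {
        String run(String reduceTaskId);
    }

    /** Hypothetical built-in implementation, used when nothing is configured. */
    public static class DefaultShuffle implements ShuffleConsumerPlugin {
        public DefaultShuffle() {}

        @Override
        public String run(String reduceTaskId) {
            return "http-shuffle:" + reduceTaskId;
        }
    }

    /** Hypothetical third-party implementation, e.g. an RDMA-based shuffle. */
    public static class RdmaShuffle implements ShuffleConsumerPlugin {
        public RdmaShuffle() {}

        @Override
        public String run(String reduceTaskId) {
            return "rdma-shuffle:" + reduceTaskId;
        }
    }

    /** Key name as suggested in the review comment; illustrative only. */
    static final String SHUFFLE_CLASS_KEY = "mapreduce.job.reduce.shuffle.class";

    /** Instantiates the configured plugin class, falling back to the built-in one. */
    static ShuffleConsumerPlugin load(Properties conf) {
        String className = conf.getProperty(SHUFFLE_CLASS_KEY, DefaultShuffle.class.getName());
        try {
            // asSubclass throws ClassCastException if the class does not implement the contract.
            Class<? extends ShuffleConsumerPlugin> pluginClass =
                    Class.forName(className).asSubclass(ShuffleConsumerPlugin.class);
            return pluginClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate shuffle plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(load(conf).run("r_000000")); // http-shuffle:r_000000
        conf.setProperty(SHUFFLE_CLASS_KEY, RdmaShuffle.class.getName());
        System.out.println(load(conf).run("r_000000")); // rdma-shuffle:r_000000
    }
}
```

Whether this lookup lives in a static factory such as getShuffleConsumerPlugin() or is inlined in the task class only changes where the code sits, not what it does.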

--- --- --- ---

I can tell you that Arun & Todd didn't make it easy for me with their requests 
on this patch so far. Still, I understand, respect and accept all their 
comments.  I am sure that everyone involved only wants the best for Hadoop.  
I suggest we hear Arun's view and move forward with the patch in the 
best professional way.

_*Arun,*_
I think you have been very familiar with both Hadoop/MapReduce and this JIRA issue 
since its inception. You are also familiar and involved with 
MAPREDUCE-2454.  It is also safe to say you know Alejandro and Asokan better 
than you know me.  I believe everyone involved will agree that your sole 
interest is Hadoop's quality.  *I am asking you and everyone else to help 
make progress here.*

Avner



[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

2012-11-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504521#comment-13504521
 ] 

Hudson commented on MAPREDUCE-4764:
---

Integrated in Hadoop-Yarn-trunk #49 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/49/])
MAPREDUCE-4764. repair TestBinaryTokenFile (Ivan A. Veselovsky via bobby) 
(Revision 1413739)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1413739
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java


 repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
 

 Key: MAPREDUCE-4764
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4764
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ivan A. Veselovsky
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.6

 Attachments: MAPREDUCE-4764.patch, MAPREDUCE-4764-trunk.patch


 The test is annotated with @Ignore, and fails when enabled.
 We suggest repairing it to fill the coverage gap.
 Problems fixed in the test: 
 (1) The MRConfig.FRAMEWORK_NAME and YarnConfiguration.RM_PRINCIPAL properties 
 must be set correctly in the configuration to enable security 
 in the way this test expects. 
 (2) The property MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY is no longer 
 passed into the Job configuration (it is intentionally deleted from there), 
 so we pass the binary file name in another, dedicated property. 
 (3) The test was using deprecated cluster classes. All of them have been updated to 
 their modern equivalents.
 (4) The delegation token found in the job context is now correctly compared 
 to the one deserialized from the binary file.
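Item (4) comes down to comparing two tokens by value after a serialization round trip. Below is a minimal sketch of that check under stated assumptions. SimpleToken is a hypothetical class whose fields mirror Hadoop's Token (identifier, password, kind, service), and an in-memory byte array stands in for the binary credentials file. The sketch does not use Hadoop's Credentials or Token APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

public class TokenRoundTrip {

    /** Hypothetical simplified token; field names mirror Hadoop's Token, but this is not that class. */
    static final class SimpleToken {
        final byte[] identifier;
        final byte[] password;
        final String kind;
        final String service;

        SimpleToken(byte[] identifier, byte[] password, String kind, String service) {
            this.identifier = identifier;
            this.password = password;
            this.kind = kind;
            this.service = service;
        }

        void write(DataOutputStream out) throws IOException {
            out.writeInt(identifier.length);
            out.write(identifier);
            out.writeInt(password.length);
            out.write(password);
            out.writeUTF(kind);
            out.writeUTF(service);
        }

        static SimpleToken read(DataInputStream in) throws IOException {
            byte[] id = new byte[in.readInt()];
            in.readFully(id);
            byte[] pw = new byte[in.readInt()];
            in.readFully(pw);
            String kind = in.readUTF();
            String service = in.readUTF();
            return new SimpleToken(id, pw, kind, service);
        }

        /** Value equality: byte arrays are compared by content, not by reference. */
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SimpleToken)) return false;
            SimpleToken t = (SimpleToken) o;
            return Arrays.equals(identifier, t.identifier)
                    && Arrays.equals(password, t.password)
                    && kind.equals(t.kind)
                    && service.equals(t.service);
        }

        @Override
        public int hashCode() {
            return Objects.hash(Arrays.hashCode(identifier), Arrays.hashCode(password), kind, service);
        }
    }

    /** Serializes a token to bytes (stand-in for writing the binary file). */
    static byte[] serialize(SimpleToken token) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            token.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes a token from bytes (stand-in for reading the binary file). */
    static SimpleToken deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return SimpleToken.read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleToken inContext = new SimpleToken(new byte[] {1, 2, 3}, new byte[] {9, 9},
                "example-kind", "example-host:8020");
        SimpleToken fromFile = deserialize(serialize(inContext));
        System.out.println(fromFile != inContext);      // true: distinct objects
        System.out.println(fromFile.equals(inContext)); // true: same content
    }
}
```

The key point is that equals compares the byte arrays with Arrays.equals. Comparing them with == would report the deserialized copy as different even when nothing changed.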



[jira] [Commented] (MAPREDUCE-4762) repair test org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal

2012-11-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504537#comment-13504537
 ] 

Hadoop QA commented on MAPREDUCE-4762:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12554983/MAPREDUCE-4762--b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3071//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3071//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504558#comment-13504558
 ] 

Laxman commented on MAPREDUCE-4049:
---

bq. You are warmly welcome to contribute to pushing the algorithms of this plugin 
into the core of vanilla Hadoop

Thank you, Avner. I would like to see this become part of Hadoop.
I'm not able to build the UDA you provided by following the BUILD.README in the 
downloaded bundle: the SVN repository it refers to is not accessible/resolvable.

https://sirius.voltaire.com/repos/enterprise/uda/trunk

bq. as well as to help get my straightforward patch in this JIRA issue accepted.
I will personally ask a few of my friends (Hadoop contributors) to review 
this JIRA.



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504560#comment-13504560
 ] 

Laxman commented on MAPREDUCE-4049:
---

I'm trying to build as per the README available here 
(http://mellanox.com/downloads/UDA/UDA3.0_Release.tar.gz).



[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

2012-11-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504581#comment-13504581
 ] 

Hudson commented on MAPREDUCE-4764:
---

Integrated in Hadoop-Hdfs-0.23-Build #448 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/448/])
svn merge -c 1413739 FIXES: MAPREDUCE-4764. repair TestBinaryTokenFile 
(Ivan A. Veselovsky via bobby) (Revision 1413742)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1413742
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java




[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

2012-11-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504589#comment-13504589
 ] 

Hudson commented on MAPREDUCE-4764:
---

Integrated in Hadoop-Hdfs-trunk #1239 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1239/])
MAPREDUCE-4764. repair TestBinaryTokenFile (Ivan A. Veselovsky via bobby) 
(Revision 1413739)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1413739
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java




[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

2012-11-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504606#comment-13504606
 ] 

Hudson commented on MAPREDUCE-4764:
---

Integrated in Hadoop-Mapreduce-trunk #1270 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1270/])
MAPREDUCE-4764. repair TestBinaryTokenFile (Ivan A. Veselovsky via bobby) 
(Revision 1413739)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1413739
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java


 repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
 

 Key: MAPREDUCE-4764
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4764
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ivan A. Veselovsky
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.6

 Attachments: MAPREDUCE-4764.patch, MAPREDUCE-4764-trunk.patch


 The test is @Ignore-ed, and fails when enabled.
 We suggest repairing it to fill the coverage gap.
 Problems fixed in the test:
 (1) The MRConfig.FRAMEWORK_NAME and YarnConfiguration.RM_PRINCIPAL properties
 must be set correctly in the configuration to enable security
 in the way this test implies.
 (2) The property MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY is no longer
 passed into the Job configuration; it is intentionally deleted from there.
 So, we pass the binary file name in another dedicated property.
 (3) The test was using deprecated cluster classes. All of them are updated to
 their modern equivalents.
 (4) The delegation token found in the job context is now correctly compared
 to the one deserialized from the binary file.



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504624#comment-13504624
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:


Hi Laxman,

You are referring to an internal document (there is no external document yet 
:)).
The svn is only for downloading internally clean sources for releasing a new
version. However, you already have the sources, so you don't need it.

In fact, I think you should use:
 # src/premake.sh
 # build/makerpm.sh

Also, please expect these compilation dependencies:
 * In the C++, on librdmacm-devel
 * In the Java, you'll need to copy the Hadoop jars that are used by the
plugin into the plugin's directory (see them according to CLASSPATH in the
makefile in the plugin's directory)

Before you go with the Java side, you may choose to edit makerpm.sh and comment
out the Hadoop flavors that you don't care about.

Please be aware that you are the first person to try building the sources
outside Mellanox.
Also, I am not sure this is the right place and way to get support for Mellanox
products.


 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support a generic shuffle service as a set of two plugins: ShuffleProvider &
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on a
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE,
 or InfiniBand) instead of using the current HTTP shuffle. Based on the fast
 RDMA shuffle, the plugin can also utilize a suitable merge approach during
 the intermediate merges, yielding much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding a hidden
 dependency of the NodeManager on a specific version of mapreduce shuffle
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu
 from Auburn University with others,
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with a suggested Top Level Design for both plugins
 (currently based on the 1.0 branch)
 # I am providing a link for downloading UDA - Mellanox's open source plugin
 that implements a generic shuffle service using RDMA and levitated merge.
 Note: At this phase, the code is in C++ through JNI and you should consider
 it beta only.  Still, it can serve anyone who wants to implement or
 contribute to levitated merge. (Please be advised that levitated merge is
 mostly suited to very fast networks) -
 [http://www.mellanox.com/content/pages.php?pg=products_dynproduct_family=144menu_section=69]
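To make the two-plugin idea concrete, here is a minimal Java sketch of the consumer side, assuming a hypothetical `ShuffleConsumerPlugin` interface that is selected by class name from configuration and instantiated reflectively. The interface methods, the `DefaultHttpShuffleConsumer` class, and the `example.shuffle.consumer.class` key are illustrative assumptions, not the API defined in the attached patches or design documents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer-side plugin contract (illustrative, not the patch's API).
interface ShuffleConsumerPlugin {
    void init(Map<String, String> conf);

    // Fetches and merges map outputs for one reduce task; returns a summary here.
    String fetchAndMerge(int reduceId);
}

// Hypothetical default implementation standing in for the built-in HTTP shuffle.
class DefaultHttpShuffleConsumer implements ShuffleConsumerPlugin {
    private String transport = "unset";

    @Override
    public void init(Map<String, String> conf) {
        transport = "http";
    }

    @Override
    public String fetchAndMerge(int reduceId) {
        return transport + "-shuffle-for-reduce-" + reduceId;
    }
}

final class ShufflePluginLoader {
    // Hypothetical configuration key naming the consumer implementation class.
    static final String CONSUMER_CLASS_KEY = "example.shuffle.consumer.class";

    private ShufflePluginLoader() {}

    // Loads the configured plugin by class name, falling back to the default.
    static ShuffleConsumerPlugin load(Map<String, String> conf) {
        String className = conf.getOrDefault(CONSUMER_CLASS_KEY,
                DefaultHttpShuffleConsumer.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            ShuffleConsumerPlugin plugin = (ShuffleConsumerPlugin)
                    clazz.getDeclaredConstructor().newInstance();
            plugin.init(conf);
            return plugin;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load shuffle consumer " + className, e);
        }
    }

    public static void main(String[] args) {
        ShuffleConsumerPlugin plugin = load(new HashMap<>());
        System.out.println(plugin.fetchAndMerge(3)); // http-shuffle-for-reduce-3
    }
}
```

Keeping the contract as an interface means an alternative transport (for instance an RDMA-based consumer) could be swapped in purely through configuration, without the reduce task depending on a concrete class.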



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504633#comment-13504633
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

Hi Avner,

I respectfully disagree with your opinion that my behavior is inappropriate.

First of all, it is not my intention to slow this JIRA down, but to make
sure it is consistent with the related work in MAPREDUCE-2454 (you can see that
in my comments). If that requires a couple of extra days, it is a small price
to pay.

As an Apache Hadoop developer, it is my responsibility to review and provide
feedback on work posted by other developers; my usual triggers are area of
knowledge, related work and area of interest.

This JIRA is tightly related to MAPREDUCE-2454; there is no dispute on that.
Thus it should stay as a subtask of it.

MAPREDUCE-2454 is not disputable; as has been commented in its JIRA, it is
almost ready, and it was a matter of breaking it up and doing a fast interactive
review of its parts. As far as I can tell, this is already happening there.

Now going to your comments on my review:

* Yes, on *shuffleConsumerPlugin != null* you are right; I noticed that
after I posted my comments, so you can disregard that one.

* On marking the ShuffleConsumerPlugin and ShuffleContext as *unstable*, it is
not appropriate; Hadoop wants to keep the right to modify these APIs in the
future if the need arises. You can also see this not only in MAPREDUCE-2454,
but in several places where Hadoop provides pluggability (e.g.
ResourceManagement, authentication).

* On making the ShuffleConsumerPlugin an interface, that is a good idea; it
will align things with the other sub-tasks.

Looking forward to seeing the updated patch.

Cheers.





[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-11-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504635#comment-13504635
 ] 

Jason Lowe commented on MAPREDUCE-4813:
---

MAPREDUCE-4815 only addresses FileOutputCommitter and friends, but the 
committer is arbitrary user code.  It could be doing all sorts of things 
including connecting to databases, etc.  So I still think we need this, 
although the priority of it is reduced given how many things are built from 
FileOutputCommitter.

 AM timing out during job commit
 ---

 Key: MAPREDUCE-4813
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4813
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: MAPREDUCE-4813.patch


 The AM calls the output committer's {{commitJob}} method synchronously during 
 JobImpl state transitions, which means the JobImpl write lock is held the 
 entire time the job is being committed.  Holding the write lock prevents the 
 RM allocator thread from heartbeating to the RM.  Therefore if committing the 
 job takes too long (e.g.: the job has tons of files to commit and/or the 
 namenode is bogged down) then the AM appears to be unresponsive to the RM and 
 the RM kills the AM attempt.
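One possible direction, sketched in Java under stated assumptions: mark the job as committing under the write lock, run the commit on a separate thread, and take the lock again only briefly to record the result, so a heartbeat that needs the lock is not blocked for the whole commit. `AsyncCommitJob` and its string states are hypothetical stand-ins, not JobImpl's actual structure, and this is not necessarily the approach taken in the attached patch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, heavily simplified job state holder; names are illustrative.
class AsyncCommitJob {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ExecutorService committerPool = Executors.newSingleThreadExecutor();
    private String state = "RUNNING"; // guarded by lock

    // Record COMMITTING under the write lock, then run the possibly slow
    // commit on another thread so the lock is not held while it executes.
    CompletableFuture<Void> startCommit(Runnable commitJob) {
        setState("COMMITTING");
        return CompletableFuture.runAsync(commitJob, committerPool)
                .handle((ignored, error) -> {
                    // Take the write lock again only briefly to record the outcome.
                    setState(error == null ? "SUCCEEDED" : "FAILED");
                    return null;
                });
    }

    private void setState(String newState) {
        lock.writeLock().lock();
        try {
            state = newState;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Stands in for the RM allocator heartbeat, which only needs a short read lock.
    String heartbeatState() {
        lock.readLock().lock();
        try {
            return state;
        } finally {
            lock.readLock().unlock();
        }
    }

    void shutdown() {
        committerPool.shutdown();
    }
}
```

The trade-off is that the job must now tolerate events arriving while it sits in a COMMITTING state, instead of relying on the lock to exclude them.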



[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504646#comment-13504646
 ] 

Jason Lowe commented on MAPREDUCE-4819:
---

bq. Maybe final client notification should be the last thing after all post 
processing is done.

No, moving the client notification later just creates a different set of 
problems, like the client never being notified *at all* because the AM crashes 
after unregistering with the RM but before it notifies the client.  The RM 
won't restart the app because it unregistered successfully, but the client is 
never notified.

bq. In general it seems like we need to come up with a set of markers that 
previous AM's leave behind that can tell the next retry if the previous one 
failed/succeeded and so the current AM should exit or continue to run.

Exactly, and the AM is already doing this in the job history file which is how 
it helps support recovery.  We should extend this so that even if the output
committer doesn't support recovery the AM will check for markers in the job 
history file and prevent the job from executing tasks and committing output if 
final job status has been determined by previous attempts.  That way we prevent 
the AM from re-committing job output or changing the final job status after 
notifying the client.  We just need to make sure those markers are flushed to 
persistent store and located properly by future AM attempts before attempting 
to notify the client.  If subsequent attempts see the final job status marker 
then they should skip straight to the client notification process instead of 
running tasks.
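The ordering described above (flush the final-status marker, then notify the client; on a later attempt, skip straight to notification if the marker exists) can be sketched in Java as follows. `JobHistoryLog` is an in-memory stand-in for the job history file, both classes are hypothetical, and a real implementation would need the marker durably flushed before notifying.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the job history file (hypothetical); a real AM would flush
// these events to a persistent store before acting on them.
class JobHistoryLog {
    private static final String FINAL_STATUS_PREFIX = "FINAL_STATUS=";
    private final List<String> events = new ArrayList<>();

    void flushFinalStatus(String status) {
        events.add(FINAL_STATUS_PREFIX + status);
    }

    // Returns the final job status recorded by an earlier attempt, or null.
    String finalStatusOrNull() {
        for (String event : events) {
            if (event.startsWith(FINAL_STATUS_PREFIX)) {
                return event.substring(FINAL_STATUS_PREFIX.length());
            }
        }
        return null;
    }
}

// Hypothetical AM attempt showing the ordering described above.
class AmAttempt {
    private final JobHistoryLog history;
    final List<String> clientNotifications = new ArrayList<>();
    boolean ranTasks = false;

    AmAttempt(JobHistoryLog history) {
        this.history = history;
    }

    void run() {
        String previous = history.finalStatusOrNull();
        if (previous != null) {
            // An earlier attempt already settled the outcome: do not run tasks or
            // commit output again, just go straight to client notification.
            clientNotifications.add(previous);
            return;
        }
        ranTasks = true;                   // running tasks and committing output elided
        String status = "SUCCEEDED";       // assumed outcome for this sketch
        history.flushFinalStatus(status);  // persist the marker first...
        clientNotifications.add(status);   // ...and only then notify the client
    }
}
```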


 AM can rerun job after reporting final job status to the client
 ---

 Key: MAPREDUCE-4819
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha
Priority: Critical

 If the AM reports final job status to the client but then crashes before 
 unregistering with the RM then the RM can run another AM attempt.  Currently 
 AM re-attempts assume that the previous attempts did not reach a final job 
 state, and that causes the job to rerun (from scratch, if the output format 
 doesn't support recovery).
 Re-running the job when we've already told the client the final status of the 
 job is bad for a number of reasons.  If the job failed, it's confusing at 
 best since the client was already told the job failed but the subsequent 
 attempt could succeed.  If the job succeeded there could be data loss, as a 
 subsequent job launched by the client tries to consume the job's output as 
 input just as the re-attempt starts removing output files in preparation for 
 the output commit.



[jira] [Created] (MAPREDUCE-4822) Unnessisary conversions in History Events

2012-11-27 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4822:
--

 Summary: Unnessisary conversions in History Events
 Key: MAPREDUCE-4822
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4822
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Priority: Trivial


There are a number of conversions in the Job History Event classes that are
totally unnecessary.  It appears that they were originally used to convert from
the internal avro format, but now many of them do not pull the values from the
avro; they store them internally.

For example:

{code:title=TaskAttemptFinishedEvent.java}
  /** Get the task type */
  public TaskType getTaskType() {
    return TaskType.valueOf(taskType.toString());
  }
{code}

The code currently takes an enum, converts it to a string, and then asks the
same enum type to convert it back into an enum.  If Java works properly, this
should be a no-op that returns a reference to the original taskType.

There are several places where toString is called on a String, and since
strings are immutable it just returns a reference to itself.

The various ids are not immutable and probably should not be changed at this 
point.
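A small self-contained Java illustration of the point, using a stand-in `TaskType` enum and a hypothetical `TaskAttemptFinishedEventSketch` class rather than the real Hadoop classes. The round trip yields the very same constant only because this enum does not override `toString()` (which then equals `name()`, the form `valueOf` expects); `String.toString()` likewise returns the string itself.

```java
// Minimal stand-in for the TaskType enum (illustrative; not the Hadoop class).
enum TaskType { MAP, REDUCE }

class TaskAttemptFinishedEventSketch {
    private final TaskType taskType;

    TaskAttemptFinishedEventSketch(TaskType taskType) {
        this.taskType = taskType;
    }

    // Current pattern: enum -> String -> the same enum constant again.
    TaskType getTaskTypeViaString() {
        return TaskType.valueOf(taskType.toString());
    }

    // Suggested simplification: return the stored reference directly.
    TaskType getTaskType() {
        return taskType;
    }

    // String.toString() returns the same object, so calling it is redundant.
    static boolean stringToStringIsIdentity(String s) {
        return s.toString() == s;
    }
}
```

If an enum ever did override `toString()` with a display name, the `valueOf(toString())` pattern would throw `IllegalArgumentException`, which is one more reason to return the field directly.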



[jira] [Created] (MAPREDUCE-4823) NPE in jobhistory.jsp

2012-11-27 Thread Steve Loughran (JIRA)
Steve Loughran created MAPREDUCE-4823:
-

 Summary: NPE in jobhistory.jsp
 Key: MAPREDUCE-4823
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4823
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.3
 Environment: Running on a JT which had a bit of confusion w.r.t its 
hostname (two IP addresses in /etc/hosts for the same hostname)
Reporter: Steve Loughran
Priority: Minor


Asking for the job history page resulted in a stack trace instead of an (empty)
job history.



[jira] [Created] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-27 Thread Tom White (JIRA)
Tom White created MAPREDUCE-4824:


 Summary: Provide a mechanism for jobs to indicate they should not 
be recovered on restart
 Key: MAPREDUCE-4824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Tom White
Assignee: Tom White


Some jobs (like Sqoop or HBase jobs) are not idempotent, so they should not be
recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2;
however, the approach there is not applicable to MR1. Even if we only use
the job-level part of the patch and add an isRecoverySupported method to
OutputCommitter, there is no way to use that information from the JT (which
initiates recovery), since the JT does not instantiate OutputCommitters, and
it shouldn't, since they are user-level code. (In MR2 it's OK since the MR AM
calls the method.)

Instead, we can add an MR configuration property to say that a job is not
recoverable, and the JT could safely read this from the job conf.



[jira] [Commented] (MAPREDUCE-4823) NPE in jobhistory.jsp

2012-11-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504699#comment-13504699
 ] 

Steve Loughran commented on MAPREDUCE-4823:
---

Stack trace (which bears no relation to where in the JSP page the actual NPE
was triggered; the generated Java pages would show that):

{code}
java.lang.NullPointerException
	at org.apache.hadoop.mapred.jobhistoryhome_jsp._jspService(jobhistoryhome_jsp.java:151)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
	at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:814)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}



[jira] [Updated] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-27 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-4824:
-

Attachment: MAPREDUCE-4824.patch

Here's a patch that implements this idea. Jobs that shouldn't be recovered 
should set mapred.job.restart.recover to false.
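For illustration, a job opting out would set the property on its configuration, e.g. `conf.setBoolean("mapred.job.restart.recover", false)`, and the JT could then decide purely from the job conf, without touching user code. The `RecoveryCheck` helper below is a hypothetical sketch over a plain `Map`, not the code in MAPREDUCE-4824.patch, and it assumes recovery stays enabled when the property is absent.

```java
import java.util.Map;

// Hypothetical JT-side check that reads the flag from the job configuration
// without instantiating any user-level OutputCommitter.
final class RecoveryCheck {
    static final String RESTART_RECOVER_KEY = "mapred.job.restart.recover";

    private RecoveryCheck() {}

    static boolean shouldRecover(Map<String, String> jobConf) {
        String value = jobConf.get(RESTART_RECOVER_KEY);
        // Assumption: an absent property means the job may be recovered.
        // Boolean.parseBoolean treats anything other than "true" (any case) as false.
        return value == null || Boolean.parseBoolean(value.trim());
    }
}
```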

 Provide a mechanism for jobs to indicate they should not be recovered on 
 restart
 

 Key: MAPREDUCE-4824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-4824.patch




[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-27 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504758#comment-13504758
 ] 

Koji Noguchi commented on MAPREDUCE-4819:
-

bq. like the client never being notified at all because the AM crashes after 
unregistering with the RM but before it notifies the client.

As long as the client eventually fails, that's not a problem.

The critical problem we have here is a false positive from the client's perspective.
The client gets 'success', but the output is incomplete or corrupt (due to a retried
application/job (over)writing the same target path).

If we can have the AM and RM agree on the job status before telling the client,
I think that would work.  There could be a corner case where the AM and RM say the
job was successful but the client thinks it failed. This false negative is much
better than the false-positive issue we have now.  Even in 0.20, we had cases where
the JobTracker reported the job was successful but the client thought it failed due
to a communication failure with the JobTracker.  This is fine to happen, and we should
let the client handle the recovery-or-retry.


bq. In general it seems like we need to come up with a set of markers that 
previous AM's leave behind

I don't want the correctness of the job to depend on the marker on hdfs.






[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-27 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504767#comment-13504767
 ] 

Koji Noguchi commented on MAPREDUCE-4819:
-

bq. I don't want the correctness of the job to depend on the marker on hdfs.

I meant HDFS in user space, like the output path.  If this is stored elsewhere,
where the user cannot access it, I have no problem.



[jira] [Commented] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

2012-11-27 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504777#comment-13504777
 ] 

Thomas Graves commented on MAPREDUCE-4817:
--

When you say "knock off the ping thread", I assume you really mean just the ping
timeout check, since task progress is reported in the same thread?

So the ping serves multiple purposes.  First, it currently notifies the AM that the
task has pinged in and is still running.  This could be useful even with
taskTimeout, since the taskTimeout could be turned off (set to 0) and we would
never know if that task got hung.  Second, the task uses it to check whether
the AM is still alive.  If it doesn't return true, the task is supposed to
exit.  1.X also had the ping check, but it went to the TaskTracker, and the
TaskTracker validated that the parent Task of the ping checker thread was still
there.

Now with 0.23, the NodeManager watches the processes and reports back to the
RM when the AM dies, and if the AM died, the other tasks are killed; but if the
entire NodeManager goes down, then the task doesn't know the AM went away.  If
the task isn't sending progress, the task timeout is set to 0, and this is the
last AM retry, it could hang around forever.

The odds of that seem pretty small, and I guess if we aren't worried about the
first happening, the second probably isn't that interesting either. But we
could also just remove the ping timeout check in the TaskHeartbeatHandler.
What exactly are you proposing?

 Hardcoded task ping timeout kills tasks localizing large amounts of data
 

 Key: MAPREDUCE-4817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mr-am
Affects Versions: 0.23.3, 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Thomas Graves
Priority: Critical

 When a task is launched and spends more than 5 minutes localizing files, the 
 AM will kill the task due to ping timeout.  The AM's TaskHeartbeatHandler 
 currently tracks tasks via a progress timeout and a ping timeout.  The 
 progress timeout can be controlled via mapreduce.task.timeout and even 
 disabled by setting the property to 0.  The ping timeout, however, is 
 hardcoded to 5 minutes and cannot be configured.  Therefore if the task takes 
 too long localizing, it never gets running in order to ping back to the AM 
 and the AM kills it due to ping timeout.
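A minimal Java sketch of the two checks described above, assuming both timeouts were configurable and that 0 disables a check, as it already does for `mapreduce.task.timeout`. `TaskTimeoutChecker` and its parameters are hypothetical, not the actual TaskHeartbeatHandler code.

```java
// Hypothetical heartbeat checker: both timeouts configurable, 0 disables a check.
final class TaskTimeoutChecker {
    private final long progressTimeoutMs; // analogous to mapreduce.task.timeout
    private final long pingTimeoutMs;     // hardcoded to 5 minutes in the reported code

    TaskTimeoutChecker(long progressTimeoutMs, long pingTimeoutMs) {
        this.progressTimeoutMs = progressTimeoutMs;
        this.pingTimeoutMs = pingTimeoutMs;
    }

    // A check fires only if it is enabled and the last event is too old.
    private static boolean expired(long timeoutMs, long nowMs, long lastMs) {
        return timeoutMs > 0 && nowMs - lastMs > timeoutMs;
    }

    /** Returns true if the task should be killed at time nowMs. */
    boolean shouldKill(long nowMs, long lastProgressMs, long lastPingMs) {
        return expired(progressTimeoutMs, nowMs, lastProgressMs)
                || expired(pingTimeoutMs, nowMs, lastPingMs);
    }
}
```

With the ping timeout fixed at 5 minutes, a task that spends 10 minutes localizing without pinging is killed even when the progress timeout is disabled; a ping timeout of 0 (or a larger value) would let it survive.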



[jira] [Commented] (MAPREDUCE-4821) Unit Test: TestJobTrackerRestart fails when it is run with ant-1.8.4

2012-11-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504776#comment-13504776
 ] 

Steve Loughran commented on MAPREDUCE-4821:
---

Is there a JUnit 3 jar in your Ant classpath? There has to be a JUnit 4 one, or else
the test case won't compile; I suspect your Ant installation has a JUnit jar
that's being picked up first at test run time.

{{ant -diagnostics}} will show this. If it's there, delete it and see what 
happens when the original test is rerun.

 Unit Test: TestJobTrackerRestart fails when it is run with ant-1.8.4
 

 Key: MAPREDUCE-4821
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4821
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.0.3, 1.0.4
 Environment: RHEL 6.3 on x86
Reporter: Amir Sanjar
 Fix For: 1.0.3, 1.1.1

 Attachments: MAPREDUCE-4821-branch1.patch, 
 MAPREDUCE-4821-release-1.0.3.patch


 Problem:
 The JUnit annotation @Ignore is not recognized since the test case is JUnit 3 and
 not JUnit 4.
 Solution:
 Migrate the test case to JUnit 4, including:
 * Remove extends TestCase
 * Remove import junit.framework.TestCase;
 * Add import org.junit.*;
 * Use appropriate annotations such as @After, @Before, @Test.
 Uploading a patch shortly.



[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-27 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504784#comment-13504784
 ] 

Robert Joseph Evans commented on MAPREDUCE-4819:


We are informing several different actors of success/failure in many 
different ways.

# _SUCCESS file being written to HDFS by the output committer as part of 
commitJob()
# job end notification by hitting an http server
# client being informed through RPC
# history server being informed by placing the log in a directory it can see
# resource manager being informed that the application is done

Some of these are much more important to report than others, but either way we
still have, at a minimum, two different things that need to be tied together: the
commitJob and informing the RM not to run us again.  Rearranging their order
will not fix the fact that after commitJob() finishes there is the
possibility that something will fail but must not fail the job.  We really need
to have a two-phase commit in the job history file:

I am about to commit the job output.
commitJob()
I finished committing the job output successfully. 

Without this there will always be the possibility that commitJob will be called 
twice, which would result in changes to the output directory. I would also argue 
that some of these are important enough that we should consider reporting them 
twice and updating the listener to handle double reporting, such as informing the 
history server that the job finished.  For others, like job end notification or 
client RPC, it is not so critical.
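As a rough illustration of the two-phase idea, here is a minimal sketch of how a new AM attempt could decide what to do from the two markers, assuming the previous attempt's history file can be reduced to two flags (commit started, commit finished). The class, enum, and method names are hypothetical and are not Hadoop APIs.

```java
/**
 * Hypothetical sketch: decide what a new AM attempt should do based on the
 * two-phase commit markers found in the previous attempt's job history file.
 */
class CommitRecoverySketch {

    /** What the new attempt should do. */
    enum Action {
        RERUN_JOB,          // no commit was started, so re-running is safe
        REPORT_SUCCESS,     // commit finished; only re-report the final status
        FAIL_WITHOUT_COMMIT // commit started but never finished; never call commitJob() again
    }

    static Action decide(boolean commitStarted, boolean commitFinished) {
        if (commitFinished) {
            return Action.REPORT_SUCCESS;
        }
        if (commitStarted) {
            // Re-running would call commitJob() a second time and could
            // change the output directory, so fail instead.
            return Action.FAIL_WITHOUT_COMMIT;
        }
        return Action.RERUN_JOB;
    }
}
```

The design choice here is to treat "started but not finished" as unrecoverable, since the output directory may already be partially committed.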

Koji,

I get that we want to reduce the risk of users shooting themselves in the 
foot, but the file must be stored in a user-accessible location because the 
entire job runs as the user.  It is stored under the .staging directory; if the 
user deletes that directory, it will already cause many other problems and 
probably cause the job to fail.  We can try to set it up so that, on any app 
attempt except the first, we fail fast if the previous job history file does 
not exist.  That would prevent retries from messing up the output directory.
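The fail-fast rule described above could look roughly like the following sketch, assuming attempt numbers start at 1 and that the previous attempt's history file lives at a known path under .staging. The class and method names, and how the path is obtained, are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

class HistoryFailFastSketch {

    /**
     * Hypothetical start-up check for an AM attempt. Returns normally if it is
     * safe to continue; throws if a retry cannot find the previous history file.
     */
    static void checkPreviousHistory(int attemptNumber, Path previousHistoryFile) {
        // The first attempt has no predecessor, so there is nothing to check.
        if (attemptNumber <= 1) {
            return;
        }
        if (!Files.exists(previousHistoryFile)) {
            // Without the old history we cannot tell whether commitJob() already
            // ran, so fail fast rather than risk touching the output directory.
            throw new IllegalStateException("Previous job history file missing: "
                    + previousHistoryFile + " (attempt " + attemptNumber + ")");
        }
    }
}
```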

 AM can rerun job after reporting final job status to the client
 ---

 Key: MAPREDUCE-4819
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha
Priority: Critical

 If the AM reports final job status to the client but then crashes before 
 unregistering with the RM then the RM can run another AM attempt.  Currently 
 AM re-attempts assume that the previous attempts did not reach a final job 
 state, and that causes the job to rerun (from scratch, if the output format 
 doesn't support recovery).
 Re-running the job when we've already told the client the final status of the 
 job is bad for a number of reasons.  If the job failed, it's confusing at 
 best since the client was already told the job failed but the subsequent 
 attempt could succeed.  If the job succeeded there could be data loss, as a 
 subsequent job launched by the client tries to consume the job's output as 
 input just as the re-attempt starts removing output files in preparation for 
 the output commit.



[jira] [Updated] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

2012-11-27 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-4817:
-

Attachment: MAPREDUCE-4817.patch

Here is the patch that adds the config for the ping timeout.  Attaching it because 
it was already finished before the other comments, and in case we want to go that 
way.

 Hardcoded task ping timeout kills tasks localizing large amounts of data
 

 Key: MAPREDUCE-4817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mr-am
Affects Versions: 0.23.3, 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Thomas Graves
Priority: Critical
 Attachments: MAPREDUCE-4817.patch


 When a task is launched and spends more than 5 minutes localizing files, the 
 AM will kill the task due to ping timeout.  The AM's TaskHeartbeatHandler 
 currently tracks tasks via a progress timeout and a ping timeout.  The 
 progress timeout can be controlled via mapreduce.task.timeout and even 
 disabled by setting the property to 0.  The ping timeout, however, is 
 hardcoded to 5 minutes and cannot be configured.  Therefore, if a task takes 
 too long to localize, it never starts running and so cannot ping back to the 
 AM, and the AM kills it due to the ping timeout.
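To make the two timeouts concrete, here is a simplified sketch of a heartbeat checker in which both timeouts are configurable and 0 disables a check, mirroring how mapreduce.task.timeout already behaves. The class and its methods are hypothetical stand-ins for TaskHeartbeatHandler, and the timeouts are passed as constructor arguments because the property name chosen by the attached patch is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified heartbeat checker: both timeouts are configurable
 * and a value of 0 disables the corresponding check.
 */
class HeartbeatCheckSketch {
    private final long progressTimeoutMs; // analogous to mapreduce.task.timeout
    private final long pingTimeoutMs;     // the value described as hardcoded to 5 minutes

    // Last time (ms) each task reported progress or pinged; tasks must be registered first.
    private final Map<String, Long> lastProgress = new HashMap<>();
    private final Map<String, Long> lastPing = new HashMap<>();

    HeartbeatCheckSketch(long progressTimeoutMs, long pingTimeoutMs) {
        this.progressTimeoutMs = progressTimeoutMs;
        this.pingTimeoutMs = pingTimeoutMs;
    }

    void register(String taskId, long nowMs) {
        lastProgress.put(taskId, nowMs);
        lastPing.put(taskId, nowMs);
    }

    void ping(String taskId, long nowMs) {
        lastPing.put(taskId, nowMs);
    }

    void progress(String taskId, long nowMs) {
        lastProgress.put(taskId, nowMs);
    }

    /** Returns true if the task should be killed at time nowMs. */
    boolean isTimedOut(String taskId, long nowMs) {
        boolean progressExpired = progressTimeoutMs > 0
                && nowMs - lastProgress.get(taskId) > progressTimeoutMs;
        boolean pingExpired = pingTimeoutMs > 0
                && nowMs - lastPing.get(taskId) > pingTimeoutMs;
        return progressExpired || pingExpired;
    }
}
```

With a 5-minute ping timeout, a task that spends 10 minutes localizing without pinging is killed; with the ping timeout set to 0 it is not.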



[jira] [Created] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state

2012-11-27 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-4825:
-

 Summary: JobImpl.finished doesn't expect ERROR as a final job state
 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Reporter: Jason Lowe


TestMRApp.testJobError is causing AsyncDispatcher to exit with System.exit due 
to an exception being thrown.  From the console output from testJobError:

{noformat}
2012-11-27 18:46:15,240 ERROR [AsyncDispatcher event handler] impl.TaskImpl (TaskImpl.java:internalError(665)) - Invalid event T_SCHEDULE on Task task_0__m_00
2012-11-27 18:46:15,242 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(132)) - Error in dispatcher thread
java.lang.IllegalArgumentException: Illegal job state: ERROR
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.finished(JobImpl.java:838)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1622)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:359)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:287)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:723)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:974)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
	at java.lang.Thread.run(Thread.java:662)
2012-11-27 18:46:15,242 INFO  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(135)) - Exiting, bbye..
{noformat}
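The trace shows InternalErrorTransition passing ERROR to JobImpl.finished(), which rejects it. A minimal sketch of the kind of change the summary implies, using a simplified, hypothetical state enum; this is not the attached patch, whose contents are not reproduced here.

```java
class JobFinishSketch {
    // Hypothetical, simplified job states; the real JobImpl uses its own enums.
    enum JobState { NEW, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

    /** Validates that finalState is terminal and returns it. */
    static JobState finished(JobState finalState) {
        switch (finalState) {
            case SUCCEEDED:
            case FAILED:
            case KILLED:
            case ERROR: // accepting ERROR here avoids the IllegalArgumentException in the trace
                return finalState;
            default:
                throw new IllegalArgumentException("Illegal job state: " + finalState);
        }
    }
}
```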




[jira] [Updated] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state

2012-11-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4825:
--

Attachment: MAPREDUCE-4825.patch

Simple fix.  No additional unit tests since this is fixing an existing test.

 JobImpl.finished doesn't expect ERROR as a final job state
 --

 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Reporter: Jason Lowe
 Attachments: MAPREDUCE-4825.patch





[jira] [Updated] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state

2012-11-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4825:
--

 Assignee: Jason Lowe
 Target Version/s: 2.0.3-alpha, 0.23.6
Affects Version/s: 0.23.5
   2.0.3-alpha
   Status: Patch Available  (was: Open)

 JobImpl.finished doesn't expect ERROR as a final job state
 --

 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4825.patch





[jira] [Commented] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state

2012-11-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504861#comment-13504861
 ] 

Hadoop QA commented on MAPREDUCE-4825:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555049/MAPREDUCE-4825.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3072//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3072//console

This message is automatically generated.

 JobImpl.finished doesn't expect ERROR as a final job state
 --

 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4825.patch





[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-27 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504897#comment-13504897
 ] 

Harsh J commented on MAPREDUCE-4824:


Hi,

- The message below in the exception can be improved, I feel. I think it's better 
to say "Job ID was not recovered since it disabled recovery-upon-restart 
(mapred.job.restart.recover set to false)". Also, since this case is to be 
expected (non-default override), I think it ought to be a simple INFO log, but 
I understand we need to throw an Exception to halt the loading of the JIP.

{code}
+  if (recovered && !conf.getBoolean("mapred.job.restart.recover", true)) {
+    throw new IOException("Job " + jobId + " should not be recovered " +
+        "since mapred.job.restart.recover is set to false.");
+  }
{code}

- We could also add this property to mapred-default.xml and document it that 
way.

The test changes look good.

 Provide a mechanism for jobs to indicate they should not be recovered on 
 restart
 

 Key: MAPREDUCE-4824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-4824.patch


 Some jobs (like Sqoop or HBase jobs) are not idempotent, so they should not be 
 recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2; 
 however, the approach there is not applicable to MR1. Even if we only 
 use the job-level part of the patch and add an isRecoverySupported method to 
 OutputCommitter, there is no way to use that information from the JT (which 
 initiates recovery), since the JT does not instantiate OutputCommitters, and 
 it shouldn't, since they are user-level code. (In MR2 it's OK since the MR AM 
 calls the method.)
 Instead, we can add an MR configuration property to say that a job is not 
 recoverable, and the JT could safely read this from the job conf.
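A sketch of the JT-side check proposed above, using java.util.Properties as a stand-in for the job's JobConf. The property name mapred.job.restart.recover and its default of true are taken from the patch excerpt in Harsh's comment; the class and method names are hypothetical. Because only the job's own configuration is consulted, one job opting out does not affect the recovery of other jobs.

```java
import java.util.Properties;

class RestartRecoverySketch {
    static final String RECOVER_KEY = "mapred.job.restart.recover";

    /** Returns true if the JT may recover this job after a restart. */
    static boolean isRecoverable(Properties jobConf) {
        // An absent property means "recover", matching the default of true.
        return Boolean.parseBoolean(jobConf.getProperty(RECOVER_KEY, "true"));
    }
}
```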



[jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2012-11-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505012#comment-13505012
 ] 

Jason Lowe commented on MAPREDUCE-4815:
---

I think this will work well with a couple of caveats:

* Write permission on the parent directory of the output directory is a new 
implicit requirement over the original FileOutputFormat.  I think in the vast 
majority of cases it won't be a problem, but it is a potential 
backwards-compatibility issue.
* There are existing output formats that override checkOutputSpecs() and 
explicitly remove the check that outputDir doesn't exist (e.g. 
TeraOutputFormat).  If we only support this new scheme, those output formats 
could fail to commit, since the rename in commitJob() will fail for a non-empty 
destination directory.  I think we should add this as an optimized path to 
FileOutputFormat, but keep the original iterative rename scheme when the output 
directory isn't empty, for backwards compatibility.
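A local-filesystem sketch of the scheme discussed above: one directory rename when the final output directory is absent or empty, otherwise the original child-by-child merge. It uses java.nio.file rather than Hadoop's FileSystem, so it only illustrates the control flow; the class and method names are hypothetical. The fast path renames into the output directory's parent, which is where the extra write-permission requirement comes from.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class CommitRenameSketch {

    /** Moves everything under tmpDir into outputDir, then removes tmpDir. */
    static void commit(Path tmpDir, Path outputDir) throws IOException {
        if (!Files.exists(outputDir) || isEmptyDir(outputDir)) {
            // Fast path: a single rename. REPLACE_EXISTING lets an empty target
            // directory be replaced; it fails for a non-empty one.
            Files.move(tmpDir, outputDir, StandardCopyOption.REPLACE_EXISTING);
            return;
        }
        // Backwards-compatible path for output directories that already have content.
        mergeInto(tmpDir, outputDir);
    }

    private static void mergeInto(Path src, Path dst) throws IOException {
        // Snapshot the children first so we do not modify src while iterating it.
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(src)) {
            for (Path child : stream) {
                children.add(child);
            }
        }
        for (Path child : children) {
            Path target = dst.resolve(child.getFileName().toString());
            if (Files.isDirectory(child) && Files.isDirectory(target)) {
                mergeInto(child, target); // both sides have this directory: recurse
            } else {
                Files.move(child, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        Files.delete(src); // src is empty at this point
    }

    private static boolean isEmptyDir(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }
}
```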

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha

 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



[jira] [Updated] (MAPREDUCE-4820) MRApps distributed-cache duplicate checks are incorrect

2012-11-27 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated MAPREDUCE-4820:
--

Target Version/s: 2.0.3-alpha
   Fix Version/s: (was: 2.0.3-alpha)

 MRApps distributed-cache duplicate checks are incorrect
 ---

 Key: MAPREDUCE-4820
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4820
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.2-alpha
Reporter: Alejandro Abdelnur
Priority: Blocker

 This seems to be a combination of issues that are being exposed in 2.0.2-alpha 
 by MAPREDUCE-4549.
 MAPREDUCE-4549 introduces a check to ensure there are no duplicate JARs 
 in the distributed-cache (using the JAR name as identity).
 In Hadoop 2 (unlike Hadoop 1), all JARs in the distributed-cache are 
 symlinked into the current directory of the task.
 MRApps, when setting up the DistributedCache 
 (MRApps#setupDistributedCache-parseDistributedCacheArtifacts), assumes that 
 the local resources (this includes files in CURRENT_DIR/, 
 CURRENT_DIR/classes/ and CURRENT_DIR/lib/) are already part of the 
 distributed-cache.
 For systems like Oozie, which use a launcher job to submit the real job, this 
 poses a problem because MRApps runs inside the launcher job when it submits the 
 real job. The configuration of the real job has the correct distributed-cache 
 entries (no duplicates), but because the current dir has the same files, the 
 submission fails.
 It seems that MRApps should not be checking distributed-cache entries for dups 
 against JARs in CURRENT_DIR/ or CURRENT_DIR/lib/. The dup check should be 
 done among distributed-cache entries only.
 It seems YARNRunner is symlinking all files in the distributed cache into the 
 current directory. In Hadoop 1 this was done only for files added to the 
 distributed-cache using a fragment (i.e. #FOO) to trigger symlink creation. 
 Marking as a blocker because without a fix for this, Oozie cannot submit jobs 
 to Hadoop 2 (I've debugged Oozie in a live cluster being used by BigTop 
 (thanks, Roman) to test their release work, and I've verified that Oozie 3.3 
 does not create duplicated entries in the distributed-cache).
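A sketch of the dup check the description argues for: compare JAR names only among the distributed-cache entries themselves, without looking at what is already present in the task's current directory. Entries are given as java.net.URI values and identity is the last path component (the JAR name), as in the MAPREDUCE-4549 check described above. The class and method names are hypothetical and this is not MRApps' actual code; listing the same URI twice is treated as harmless here, which is a design assumption.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CacheDupCheckSketch {

    /** Returns one message per cache entry whose JAR name collides with a different earlier entry. */
    static List<String> findDuplicates(List<URI> cacheEntries) {
        Map<String, URI> byName = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (URI uri : cacheEntries) {
            String name = lastComponent(uri.getPath());
            URI previous = byName.putIfAbsent(name, uri);
            // Only entries within the cache are compared; CURRENT_DIR/ is never consulted.
            if (previous != null && !previous.equals(uri)) {
                problems.add(name + " is used by both " + previous + " and " + uri);
            }
        }
        return problems;
    }

    private static String lastComponent(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }
}
```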



[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events

2012-11-27 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated MAPREDUCE-4822:
--

Summary: Unnecessary conversions in History Events  (was: Unnessisary 
conversions in History Events)

 Unnecessary conversions in History Events
 -

 Key: MAPREDUCE-4822
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4822
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Priority: Trivial

 There are a number of conversions in the Job History Event classes that are 
 totally unnecessary.  It appears that they were originally used to convert 
 from the internal avro format, but many of them now do not pull the values 
 from the avro; they store them internally.
 For example:
 {code:title=TaskAttemptFinishedEvent.java}
   /** Get the task type */
   public TaskType getTaskType() {
     return TaskType.valueOf(taskType.toString());
   }
 {code}
 The code currently takes an enum, converts it to a string and then 
 asks the same enum to convert it back to an enum.  If Java works properly, 
 this should be a no-op and a reference to the original taskType should be 
 returned.
 There are several places where toString() is called on a String, and 
 since strings are immutable it returns a reference to itself.
 The various ids are not immutable and probably should not be changed at this 
 point.
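A small self-contained check of the two claims above, using a stand-in enum rather than Hadoop's TaskType: for an enum that does not override toString(), valueOf(toString()) returns the very same constant, and String.toString() returns the same reference. The class is hypothetical and only mirrors the shape of the quoted getter.

```java
class ConversionSketch {
    // Stand-in for org.apache.hadoop.mapreduce.TaskType, defined locally so the example is self-contained.
    enum TaskType { MAP, REDUCE }

    private final TaskType taskType;

    ConversionSketch(TaskType taskType) {
        this.taskType = taskType;
    }

    /** Round-trips through a String, like the getter quoted above. */
    TaskType getTaskTypeViaString() {
        return TaskType.valueOf(taskType.toString());
    }

    /** Same result without the conversion. */
    TaskType getTaskType() {
        return taskType;
    }
}
```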



[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505234#comment-13505234
 ] 

Bikas Saha commented on MAPREDUCE-4824:
---

Agree with Harsh.
I assume this config is job specific and cannot be inadvertently set to disable 
recovery of all jobs?

 Provide a mechanism for jobs to indicate they should not be recovered on 
 restart
 

 Key: MAPREDUCE-4824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-4824.patch





[jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2012-11-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505244#comment-13505244
 ] 

Bikas Saha commented on MAPREDUCE-4815:
---

Does this code use FileSystem or specifically DistributedFileSystem (HDFS)? If 
the former, then how does this relate to the comment [~eric14] made earlier 
about cloud stores?

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha




[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505252#comment-13505252
 ] 

Arun C Murthy commented on MAPREDUCE-4049:
--

Sorry, just caught up on this since I'm dealing with some health issues at home.

Frankly, worrying about whose work is a subset of whose is a pointless 
exercise. Having said that, making related tasks sub-tasks makes sense as long 
as there is a coherent community (one or more developers) working together; 
I don't see that for MAPREDUCE-4049 vis-a-vis MAPREDUCE-2454.

In any case, there is no need to debate this further; it's just a time sink.

Finally, MAPREDUCE-2454 is a bunch of large-scale changes. I'm happy to commit 
this as soon as it's ready, without tying it in.



Overall, I really don't like to see us egregiously rename core MR classes: at 
best it's pointless for private APIs, and at worst it hammers svn log. So please 
do not change the existing Shuffle etc.

Avner, please upload a patch with the other changes:
# Use @LimitedPrivate; that way it is clear that this is for implementers 
and not end-users.
# I'm ok with the suggested config names (again, I'm not religious about naming).

With that it's good to go.


 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support generic shuffle service as set of two plugins: ShuffleProvider  
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges. Hence, getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu
 of Auburn University et al.,
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching two documents with the suggested top-level design for both
 plugins (currently based on the 1.0 branch).
 # I am providing a link for downloading UDA, Mellanox's open-source plugin
 that implements the generic shuffle service using RDMA and levitated merge.
 Note: at this phase, the code is in C++ through JNI and should be considered
 beta only. Still, it can serve anyone who wants to implement or contribute to
 levitated merge. (Please be advised that levitated merge is mostly suited to
 very fast networks.)
 [http://www.mellanox.com/content/pages.php?pg=products_dynproduct_family=144menu_section=69]
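The provider/consumer split above comes down to loading an implementation class named in the configuration, with the existing HTTP shuffle as the default. A minimal, self-contained Java sketch of that pattern follows. `ShuffleConsumer`, `HttpShuffleConsumer`, `ShufflePluginLoader`, and the key `example.shuffle.consumer.class` are hypothetical names, not Hadoop's API, and `java.util.Properties` stands in for Hadoop's `Configuration`.

```java
import java.util.Properties;

// Hypothetical consumer-side plugin contract; not Hadoop's actual interface.
interface ShuffleConsumer {
    /** Fetches and merges map outputs for one reduce task; returns a tag naming the transport. */
    String fetchAndMerge(int reduceId);
}

// Hypothetical default that stands in for the existing HTTP shuffle.
class HttpShuffleConsumer implements ShuffleConsumer {
    public HttpShuffleConsumer() {
    }

    @Override
    public String fetchAndMerge(int reduceId) {
        return "http:" + reduceId;
    }
}

final class ShufflePluginLoader {
    // Hypothetical configuration key used only for this sketch.
    static final String CONSUMER_CLASS_KEY = "example.shuffle.consumer.class";

    private ShufflePluginLoader() {
    }

    /** Instantiates the configured consumer, falling back to the HTTP default when the key is unset. */
    static ShuffleConsumer loadConsumer(Properties conf) {
        String className = conf.getProperty(CONSUMER_CLASS_KEY, HttpShuffleConsumer.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            // asSubclass rejects classes that do not implement the plugin contract.
            return clazz.asSubclass(ShuffleConsumer.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load shuffle consumer " + className, e);
        }
    }
}
```

Under this design, an RDMA-based consumer would only need to implement the interface and be named under the key; because the HTTP implementation remains the default, jobs that set nothing keep the current behavior.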

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2012-11-27 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-4815:


Assignee: Arun C Murthy  (was: Bikas Saha)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Arun C Murthy

 If a job generates many files to commit, then the commitJob method call at the
 end of the job can take minutes. This is a performance regression from 1.x,
 in which tasks committed directly to the final output directory as they
 completed, leaving commitJob very little to do. The commit work was thus
 processed in parallel and overlapped with the processing of outstanding tasks.
 In 0.23/2.x, the commit is single-threaded and does not begin until all tasks
 have completed.
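The difference between the two commit styles is whether the per-file renames run one after another or concurrently. The following Java sketch moves files from a task output directory into the final directory using a fixed thread pool. It uses `java.nio.file` on a local file system as a stand-in for Hadoop's `FileSystem.rename`, and `ParallelCommitSketch`/`commitInParallel` are hypothetical names; it illustrates the idea, not the fix adopted for this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: local-file-system stand-in for a parallel commitJob.
final class ParallelCommitSketch {
    private ParallelCommitSketch() {
    }

    /** Moves each regular file directly under taskOutputDir into finalDir; returns the count moved. */
    static int commitInParallel(Path taskOutputDir, Path finalDir, int threads)
            throws IOException, InterruptedException {
        Files.createDirectories(finalDir);
        List<Path> files;
        try (Stream<Path> listing = Files.list(taskOutputDir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : files) {
                Path dst = finalDir.resolve(src.getFileName().toString());
                // Renames of distinct files are independent, so they can run concurrently.
                pending.add(pool.submit(() -> Files.move(src, dst)));
            }
            for (Future<Path> result : pending) {
                try {
                    result.get(); // surfaces the first failed rename, if any
                } catch (ExecutionException e) {
                    throw new IOException("rename failed", e.getCause());
                }
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

With `threads` set to 1, this reduces to the sequential behavior described above; any larger pool size would need tuning against the real file system.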



[jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2012-11-27 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505260#comment-13505260
 ] 

Arun C Murthy commented on MAPREDUCE-4815:
--

bq. Write permission on the parent directory of the output directory is a new
implicit requirement over the original FileOutputFormat. I think in the vast
majority of cases it won't be a problem, but it is a potential
backwards-compatibility issue.

That is already required, since FileOutputFormat itself creates the output dir
in the parent dir, so it isn't a new requirement.

bq. I think we should add this as an optimized path to FileOutputFormat, but
keep the original iterative rename scheme for backwards compatibility if the
output directory isn't empty.

Makes sense. Unfortunately it's much more code to maintain, and I'm not sure
it's worth it, but it's a good idea nevertheless.

I have a preliminary patch that I'm testing; I'll upload it ASAP.
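The compromise discussed above (an optimized path when it is safe, with the original iterative rename as the fallback) can be sketched as follows. This is a hedged illustration, not the patch mentioned in the comment: it uses `java.nio.file` instead of Hadoop's `FileSystem`, `CommitStrategySketch` and its methods are hypothetical, and it assumes the fast path applies when the output directory is missing or empty.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of "fast path if safe, iterative rename otherwise".
final class CommitStrategySketch {
    private CommitStrategySketch() {
    }

    /** Publishes jobAttemptDir as outputDir; returns true if the single-rename fast path was taken. */
    static boolean commitJob(Path jobAttemptDir, Path outputDir) throws IOException {
        boolean fastPath = Files.notExists(outputDir)
                || (Files.isDirectory(outputDir) && isEmptyDirectory(outputDir));
        if (fastPath) {
            Files.deleteIfExists(outputDir); // only ever removes an empty directory here
            // Requires write access to the output directory's parent.
            Files.createDirectories(outputDir.toAbsolutePath().getParent());
            Files.move(jobAttemptDir, outputDir); // one rename publishes every file
            return true;
        }
        mergeInto(jobAttemptDir, outputDir); // backwards-compatible per-file renames
        return false;
    }

    private static boolean isEmptyDirectory(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }

    /** Recursively moves src into dst; fails if a file already exists at the destination. */
    private static void mergeInto(Path src, Path dst) throws IOException {
        if (Files.isDirectory(src)) {
            Files.createDirectories(dst);
            try (DirectoryStream<Path> children = Files.newDirectoryStream(src)) {
                for (Path child : children) {
                    mergeInto(child, dst.resolve(child.getFileName().toString()));
                }
            }
            Files.delete(src); // src is empty once all children have been moved
        } else {
            Files.move(src, dst);
        }
    }
}
```

The fast path touches the file system a constant number of times regardless of how many output files there are, while the fallback preserves the old merge semantics for non-empty output directories.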




[jira] [Updated] (MAPREDUCE-4661) Add HTTPS for WebUIs on Branch-1

2012-11-27 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated MAPREDUCE-4661:


Attachment: (was: https.patch)

 Add HTTPS for WebUIs on Branch-1
 

 Key: MAPREDUCE-4661
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4661
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: security, webapps
Affects Versions: 1.0.3
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
 Attachments: MAPREDUCE-4461.patch, MAPREDUCE-4661.patch, 
 MAPREDUCE-4661.patch, MAPREDUCE-4661.patch


 After investigating the methodology used to add HTTPS support in branch-2, I
 feel that the same approach should be back-ported to branch-1. I have taken
 many of the patches used for branch-2 and merged them in.
 I was working on top of HDP 1 at the time; I will provide a patch for trunk
 soon, once I can confirm that I am adding only what is necessary to support
 HTTPS on the web UIs.
 As an added benefit, this patch also provides an HTTPS web UI to HBase by
 extension: take a hadoop-core jar compiled with this patch, put it into the
 hbase/lib directory, and apply the necessary configs to hbase/conf.
 == OLD IDEA(S) BEHIND ADDING HTTPS (see the Sept 17th patch) ==
 To provide full security around the cluster, the web UI should also be
 securable, if desired, to prevent cookie theft and user masquerading.
 Here is my proposed work. Currently I can only add HTTPS support; I do not
 know how to fully switch the HttpServer's reliance from HTTP to HTTPS.
 To facilitate this change, I propose the following configuration
 additions:
 CONFIG PROPERTY - DEFAULT VALUE
 mapred.https.enable - false
 mapred.https.need.client.auth - false
 mapred.https.server.keystore.resource - ssl-server.xml
 mapred.job.tracker.https.port - 50035
 mapred.job.tracker.https.address - IP_ADDR:50035
 mapred.task.tracker.https.port - 50065
 mapred.task.tracker.https.address - IP_ADDR:50065
 I tested this on my local box after using keytool to generate an SSL
 certificate. Afterwards, you will need to change ssl-server.xml to point to
 the .keystore file. A truststore may not be necessary; you can just point it
 to the keystore.
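To make the proposed keys and defaults concrete, here is a hedged Java sketch that resolves the JobTracker's HTTPS settings from a `java.util.Properties` object standing in for the Hadoop configuration. `JobTrackerHttpsSettings` is a hypothetical class, and the rule that `mapred.job.tracker.https.address` falls back to the node's IP address plus `mapred.job.tracker.https.port` is an assumption: the proposal lists the two keys without saying how they interact.

```java
import java.util.Properties;

// Hypothetical resolver for the JobTracker HTTPS keys proposed above.
final class JobTrackerHttpsSettings {
    final boolean enabled;
    final boolean needClientAuth;
    final String keystoreResource;
    final String host;
    final int port;

    private JobTrackerHttpsSettings(boolean enabled, boolean needClientAuth,
            String keystoreResource, String host, int port) {
        this.enabled = enabled;
        this.needClientAuth = needClientAuth;
        this.keystoreResource = keystoreResource;
        this.host = host;
        this.port = port;
    }

    /** Applies the proposed defaults; nodeIp stands in for IP_ADDR in the list above. */
    static JobTrackerHttpsSettings fromConf(Properties conf, String nodeIp) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty("mapred.https.enable", "false"));
        boolean needClientAuth =
                Boolean.parseBoolean(conf.getProperty("mapred.https.need.client.auth", "false"));
        String keystore = conf.getProperty("mapred.https.server.keystore.resource", "ssl-server.xml");
        String defaultPort = conf.getProperty("mapred.job.tracker.https.port", "50035");
        // Assumption: an explicit address wins; otherwise combine the node IP with the port key.
        String address = conf.getProperty("mapred.job.tracker.https.address", nodeIp + ":" + defaultPort);
        int colon = address.lastIndexOf(':');
        if (colon <= 0 || colon == address.length() - 1) {
            throw new IllegalArgumentException("Expected host:port but got " + address);
        }
        return new JobTrackerHttpsSettings(enabled, needClientAuth, keystore,
                address.substring(0, colon), Integer.parseInt(address.substring(colon + 1)));
    }
}
```

The TaskTracker keys (`mapred.task.tracker.https.port` and `mapred.task.tracker.https.address`, default port 50065) would resolve the same way under this assumption.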
