date:20121127

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Avner BenHanoch updated MAPREDUCE-4049:
---

Description:
Support a generic shuffle service as a set of two plugins: ShuffleProvider and
ShuffleConsumer.
This will satisfy the following needs:
# Better shuffle and merge performance. For example, we are working on a shuffle
plugin that performs the shuffle over RDMA in fast networks (10GbE, 40GbE, or
InfiniBand) instead of using the current HTTP shuffle. Building on the fast RDMA
shuffle, the plugin can also use a suitable merge approach during the
intermediate merges, yielding much better performance.
# Satisfy MAPREDUCE-3060: a generic shuffle service that avoids a hidden
dependency of the NodeManager on a specific version of the MapReduce shuffle
(currently targeted to 0.24.0).

References:
# Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu
of Auburn University and others,
[http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
# I am attaching two documents with a suggested top-level design for both plugins
(currently based on the 1.0 branch).
# I am providing a link for downloading UDA, Mellanox's open-source plugin that
implements a generic shuffle service using RDMA and levitated merge. Note: at
this stage the code is in C++ (through JNI) and should be considered beta
only. Still, it can serve anyone who wants to implement or contribute to
levitated merge. (Please be advised that levitated merge is mostly suited to very
fast networks.)
[http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]

was:
Support a generic shuffle service as a set of two plugins: ShuffleProvider and
ShuffleConsumer.
This will satisfy the following needs:
# Better shuffle and merge performance. For example, we are working on a shuffle
plugin that performs the shuffle over RDMA in fast networks (10GbE, 40GbE, or
InfiniBand) instead of using the current HTTP shuffle. Building on the fast RDMA
shuffle, the plugin can also use a suitable merge approach during the
intermediate merges, yielding much better performance.
# Satisfy MAPREDUCE-3060: a generic shuffle service that avoids a hidden
dependency of the NodeManager on a specific version of the MapReduce shuffle
(currently targeted to 0.24.0).

plugin for generic shuffle service
--

Key: MAPREDUCE-4049
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Labels: merge, plugin, rdma, shuffle
Fix For: trunk

Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf,
mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch,
mapreduce-4049.patch

Support a generic shuffle service as a set of two plugins: ShuffleProvider and
ShuffleConsumer.
This will satisfy the following needs:
# Better shuffle and merge performance. For example, we are working on a
shuffle plugin that performs the shuffle over RDMA in fast networks (10GbE,
40GbE, or InfiniBand) instead of using the current HTTP shuffle. Building on
the fast RDMA shuffle, the plugin can also use a suitable merge approach
during the intermediate merges, yielding much better performance.
# Satisfy MAPREDUCE-3060: a generic shuffle service that avoids a hidden
dependency of the NodeManager on a specific version of the MapReduce shuffle
(currently targeted to 0.24.0).
References:
# Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu
of Auburn University and others,
[http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
# I am attaching two documents with a suggested top-level design for both
plugins (currently based on the 1.0 branch).
# I am providing a link for downloading UDA, Mellanox's open-source plugin
that implements a generic shuffle service using RDMA and levitated merge.
Note: at this stage the code is in C++ (through JNI) and should be considered
beta only. Still, it can serve anyone who wants to implement or
contribute to levitated merge. (Please be advised that levitated merge is
mostly suited to very fast networks.)
[http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504461#comment-13504461
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:


Hi Laxman,

Thanks for your comment and sorry for my late response.

I just posted a link for downloading the source code of the Mellanox plugin that 
implements a generic shuffle using RDMA and levitated merge.

You are warmly welcomed to contribute to push the algorithms of this plugin to 
the core of vanilla Hadoop, as well as to help accepting my straight forward 
patch in this JIRA issue.
Avner


[jira] [Updated] (MAPREDUCE-4762) repair test org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal

2012-11-27 Thread Ivan A. Veselovsky (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated MAPREDUCE-4762:
--

Attachment: MAPREDUCE-4762--b.patch
MAPREDUCE-4762-branch-0.23--b.patch

Hi Robert,
the attached patches MAPREDUCE-4762-branch-0.23--b.patch and 
MAPREDUCE-4762--b.patch implement your suggestion. 
Patch MAPREDUCE-4762--b.patch targets trunk and branch-2. 

 repair test 
 org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal
 -

 Key: MAPREDUCE-4762
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4762
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ivan A. Veselovsky
 Attachments: MAPREDUCE-4762--b.patch, 
 MAPREDUCE-4762-branch-0.23--b.patch, MAPREDUCE-4762-trunk.patch


 The test 
 org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal is 
 annotated with @Ignore. 
 As a result, several classes in the package 
 org.apache.hadoop.mapreduce.security.token have zero unit-test coverage.
 The problem is that the test assumed that the class 
 org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal.Renewer 
 is used as a custom implementation of the 
 org.apache.hadoop.security.token.TokenRenewer service, but that did not 
 happen because this custom service implementation was not registered. 
 We solved this problem by using a special classloader that is invoked to find 
 the resource META-INF/services/org.apache.hadoop.security.token.TokenRenewer 
 and supplies custom content for it. This way the custom service 
 implementation gets instantiated.
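The fix described above relies on java.util.ServiceLoader, which discovers providers by asking a class loader for every META-INF/services/<interface name> resource. The following is a minimal, self-contained sketch of the idea, not the code from the MAPREDUCE-4762 patch. It assumes a hypothetical Greeter service in place of TokenRenewer and a hypothetical TestGreeter in place of the test's Renewer. The loader overrides getResources() for that one resource name and serves a temporary file listing the provider class; classes and all other resources come from the parent loader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceOverrideDemo {

    /** Hypothetical service interface standing in for TokenRenewer. */
    public interface Greeter {
        String greet();
    }

    /** Hypothetical provider that is NOT registered in any META-INF/services file. */
    public static class TestGreeter implements Greeter {
        public TestGreeter() {
        }

        @Override
        public String greet() {
            return "custom";
        }
    }

    /** Serves custom content for one services resource; delegates everything else. */
    public static class ServiceOverridingClassLoader extends ClassLoader {
        private final String resourceName;
        private final URL replacement;

        public ServiceOverridingClassLoader(ClassLoader parent, Class<?> service,
                                            String providerClassName) {
            super(parent);
            this.resourceName = "META-INF/services/" + service.getName();
            try {
                // Write the provider list to a temp file so it can be served as a URL.
                Path tmp = Files.createTempFile("services", ".txt");
                tmp.toFile().deleteOnExit();
                Files.write(tmp, (providerClassName + "\n").getBytes(StandardCharsets.UTF_8));
                this.replacement = tmp.toUri().toURL();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public Enumeration<URL> getResources(String name) throws IOException {
            if (resourceName.equals(name)) {
                return Collections.enumeration(Collections.singletonList(replacement));
            }
            return super.getResources(name);
        }
    }

    /** Instantiates every Greeter provider visible through the given loader. */
    public static List<String> greetings(ClassLoader loader) {
        List<String> out = new ArrayList<>();
        for (Greeter g : ServiceLoader.load(Greeter.class, loader)) {
            out.add(g.greet());
        }
        return out;
    }

    public static void main(String[] args) {
        ClassLoader app = ServiceOverrideDemo.class.getClassLoader();
        System.out.println(greetings(app));    // no registration file on the classpath
        ClassLoader custom = new ServiceOverridingClassLoader(
                app, Greeter.class, TestGreeter.class.getName());
        System.out.println(greetings(custom)); // TestGreeter is now discovered
    }
}
```

The temporary file is just one convenient way to obtain a URL with custom content; the essential part is that ServiceLoader.load(Class, ClassLoader) consults the supplied loader's getResources().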


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504502#comment-13504502
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:



_Alejandro,_

With all due respect, I think that something in your behavior is inappropriate:
 * You were never involved in this issue, yet you took the liberty of making
it a sub-task of MAPREDUCE-2454, an issue you support, without consulting
anyone.
 * This is especially inappropriate since MAPREDUCE-2454 is disputed and has
acceptance problems of its own, independent of my issue. Those acceptance
problems will now affect my issue.
 * Your justification *As all this JIRAs are small, I think we'll be able to
move fast with all of them.* is inappropriate, since you actually created a
linkage that will surely delay my issue instead of letting each issue
progress at its own pace!
 * This is not the first time that the people behind MAPREDUCE-2454 have tried
to disrupt this JIRA issue.

Apparently, I don't have the privileges to break this sub-task linkage, so I
am asking you or someone else to do it.

I welcome any comment that comes from a professional place, with the simple
goal of making Hadoop better. That said, I feel that the way you blitzed my
patch with every possible petty comment, sometimes with disputable claims,
just before the patch was about to be accepted, is unfair, unprofessional and
unfriendly, especially considering your complete silence since this JIRA
issue started.

I am not sure that commenting in a blitz way will increase the quality of
Hadoop. For example:

{quote}
Checking for shuffleConsumerPlugin != null before closing it seems redundant, 
you would have never got there if shufflePlugin is NULL.
{quote}
This is your mistake: that point is reached when isLocal == true. *The null
check cannot be removed!*
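The point about the null check can be illustrated with a small sketch. All names here (ReduceCloseSketch, ShuffleConsumer, runReduce) are hypothetical stand-ins, not the actual ReduceTask code: when the task runs in local mode the plugin is never created, so the cleanup path still sees a null reference and must guard the close() call.

```java
import java.util.function.Supplier;

public class ReduceCloseSketch {

    /** Hypothetical stand-in for the consumer-side shuffle plugin. */
    public interface ShuffleConsumer {
        void run();

        void close();
    }

    // Stays null when the task runs in local mode.
    private ShuffleConsumer shuffleConsumerPlugin;

    public void runReduce(boolean isLocal, Supplier<ShuffleConsumer> factory) {
        try {
            if (!isLocal) {
                // Only a distributed reduce needs to fetch map outputs.
                shuffleConsumerPlugin = factory.get();
                shuffleConsumerPlugin.run();
            }
            // ... sort/merge and reduce phases would follow here ...
        } finally {
            // In local mode the reference is still null, so the guard is required.
            if (shuffleConsumerPlugin != null) {
                shuffleConsumerPlugin.close();
            }
        }
    }
}
```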

{quote}
Visibility annotations for the ShuffleConsumerPlugin, ShuffleContext, should 
be Unstable
{quote}
I think it is inappropriate to declare a plugin interface as Unstable, since it
must stay stable for third-party vendors.

--- --- --- ---

Personally, I have no problem implementing all the rest of your comments; it
should be very easy for me. Still, I am raising a few points for consideration
regarding the following comments:

{quote}
The Shuffle class should be renamed to DefaultShuffle.
The ShuffleConsumerPlugin should be renamed to Shuffle.
{quote}
I chose the term 'ShuffleConsumerPlugin' rather than something like 'Shuffle'
because it makes clear that we are in a *plugin* of the *ShuffleConsumer*, rather
than in a *built-in* *ShuffleProvider/ShuffleHandler*. Also, I did not want to
take the liberty of renaming core Hadoop classes.

{quote}
ShuffleConsumerPlugin, getShuffleConsumerPlugin() method is not required, 
instead use the ReflectionUtils directly in the ReducerTask class.
{quote}
Here I only followed an existing Hadoop convention, as in
ResourceCalculatorPlugin.getResourceCalculatorPlugin(). Personally, I'll be
glad to follow your advice, and even go one step further and make
ShuffleConsumerPlugin an interface instead of an abstract class.
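The approach discussed here (a plugin interface instantiated reflectively from a configured class name, with the built-in shuffle as the default) might look roughly like the following sketch. Everything in it is an assumption for illustration: the interface shape, the DefaultShuffle and ExampleRdmaShuffle classes, and the configuration key, which is a placeholder rather than the name eventually agreed on. A Map stands in for Hadoop's Configuration, and plain reflection stands in for Hadoop's ReflectionUtils.newInstance, so the example runs without Hadoop on the classpath.

```java
import java.util.Map;

public class ShufflePluginLoader {

    /** Hypothetical consumer-side plugin contract, expressed as an interface. */
    public interface ShuffleConsumerPlugin {
        void init(Map<String, String> conf);

        String name();
    }

    /** Hypothetical built-in implementation used when nothing is configured. */
    public static class DefaultShuffle implements ShuffleConsumerPlugin {
        public DefaultShuffle() {
        }

        @Override
        public void init(Map<String, String> conf) {
            // The built-in shuffle needs no extra settings in this sketch.
        }

        @Override
        public String name() {
            return "default";
        }
    }

    /** Hypothetical third-party plugin, e.g. an RDMA-based shuffle. */
    public static class ExampleRdmaShuffle implements ShuffleConsumerPlugin {
        public ExampleRdmaShuffle() {
        }

        @Override
        public void init(Map<String, String> conf) {
            // A real plugin would read its transport settings here.
        }

        @Override
        public String name() {
            return "example-rdma";
        }
    }

    /** Placeholder configuration key, not an actual Hadoop property. */
    public static final String PLUGIN_KEY = "example.shuffle.consumer.plugin.class";

    /** Instantiates the configured plugin, falling back to the built-in one. */
    public static ShuffleConsumerPlugin load(Map<String, String> conf) {
        String className = conf.getOrDefault(PLUGIN_KEY, DefaultShuffle.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            if (!ShuffleConsumerPlugin.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(
                        className + " does not implement ShuffleConsumerPlugin");
            }
            ShuffleConsumerPlugin plugin =
                    (ShuffleConsumerPlugin) clazz.getDeclaredConstructor().newInstance();
            plugin.init(conf);
            return plugin;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }
}
```

Making the contract an interface rather than an abstract class leaves a vendor plugin free to extend its own base class.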

{quote}
use 'mapreduce.job.reduce.shuffle.class' to be consistent with MAPREDUCE-2454.
{quote}
Here I chose 'mapreduce.shuffle…', since I think it is consistent with the 
current convention in hadoop-3 configuration.

--- --- --- ---

I can tell you that Arun and Todd have not made it easy for me with their
requests on this patch so far. Still, I understand, respect and accept all their
comments. I am sure that everyone involved only wants the best for Hadoop.
I suggest we hear Arun's view and move forward with the patch in the
best professional way.

_*Arun,*_
You have been very familiar with both Hadoop/MapReduce and this JIRA issue
since its inception. You are also familiar with, and involved in,
MAPREDUCE-2454. It is also safe to say you know Alejandro and Asokan better
than you know me. I believe everyone involved will agree that your sole
interest is Hadoop's quality. *I am asking you and everyone else to help us
make progress here.*

Avner



[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

[
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504521#comment-13504521
]

Hudson commented on MAPREDUCE-4764:
---

Integrated in Hadoop-Yarn-trunk #49 (See
[https://builds.apache.org/job/Hadoop-Yarn-trunk/49/])
MAPREDUCE-4764. repair TestBinaryTokenFile (Ivan A. Veselovsky via bobby)
(Revision 1413739)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1413739
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java

repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

Key: MAPREDUCE-4764
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4764
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Ivan A. Veselovsky
Fix For: 3.0.0, 2.0.3-alpha, 0.23.6

Attachments: MAPREDUCE-4764.patch, MAPREDUCE-4764-trunk.patch

The test is annotated with @Ignore, and it fails when enabled.
We suggest repairing it to fill the coverage gap.
Problems fixed in the test:
(1) The MRConfig.FRAMEWORK_NAME and YarnConfiguration.RM_PRINCIPAL properties
must be set correctly in the configuration to enable security the way this
test expects.
(2) The property MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY is no longer
passed into the Job configuration; it is intentionally deleted from there.
So we pass the binary file name in a separate, dedicated property.
(3) The test was using deprecated cluster classes. All of them have been
replaced with their modern equivalents.
(4) The delegation token found in the job context is now correctly compared
to the one deserialized from the binary file.


[jira] [Commented] (MAPREDUCE-4762) repair test org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal

2012-11-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504537#comment-13504537
 ] 

Hadoop QA commented on MAPREDUCE-4762:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12554983/MAPREDUCE-4762--b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3071//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3071//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Laxman (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504558#comment-13504558
]

Laxman commented on MAPREDUCE-4049:
---

bq. You are warmly welcomed to contribute to push the algorithms of this plugin
to the core of vanilla Hadoop

Thank you, Avner. I would like to see this become part of Hadoop.
I'm not able to build the UDA you provided by following the BUILD.README in the
downloaded bundle: the SVN repository it refers to is not accessible/resolvable.

https://sirius.voltaire.com/repos/enterprise/uda/trunk

bq. as well as to help accepting my straight forward patch in this JIRA issue.
I will personally ask a few of my friends (Hadoop contributors) to review
this JIRA.


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Laxman (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504560#comment-13504560
 ] 

Laxman commented on MAPREDUCE-4049:
---

I'm trying to build as per the README available here 
(http://mellanox.com/downloads/UDA/UDA3.0_Release.tar.gz).


[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

[
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504581#comment-13504581
]

Hudson commented on MAPREDUCE-4764:
---

Integrated in Hadoop-Hdfs-0.23-Build #448 (See
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/448/])
svn merge -c 1413739 FIXES: MAPREDUCE-4764. repair TestBinaryTokenFile
(Ivan A. Veselovsky via bobby) (Revision 1413742)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1413742
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java


[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

[
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504589#comment-13504589
]

Hudson commented on MAPREDUCE-4764:
---

Integrated in Hadoop-Hdfs-trunk #1239 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1239/])
MAPREDUCE-4764. repair TestBinaryTokenFile (Ivan A. Veselovsky via bobby)
(Revision 1413739)


[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

[
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504606#comment-13504606
]

Hudson commented on MAPREDUCE-4764:
---

Integrated in Hadoop-Mapreduce-trunk #1270 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1270/])
MAPREDUCE-4764. repair TestBinaryTokenFile (Ivan A. Veselovsky via bobby)
(Revision 1413739)

Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1413739
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504624#comment-13504624
]

Avner BenHanoch commented on MAPREDUCE-4049:

Hi Laxman,

You are referring to an internal document (there is no external document yet
:)).
The SVN repository is only used internally to check out clean sources when
releasing a new version. You already have the sources, so you don't need it.

In fact, I think you should use:
# src/premake.sh
# build/makerpm.sh

Also, please expect the following compilation dependencies:
* In the C++ code: librdmacm-devel
* In the Java code: you'll need to copy the Hadoop jars used by the plugin
into the plugin's directory (see which ones in the CLASSPATH of the makefile
in the plugin's directory)

Before you go on to the Java side, you may choose to edit makerpm.sh and
comment out the Hadoop flavors that you don't care about.

Please be aware that you are the first one trying to build the sources
outside Mellanox.
Also, I am not sure this is the right place or way to get support for
Mellanox products.


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Alejandro Abdelnur (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504633#comment-13504633
]

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

Hi Avner,

I respectfully disagree with your opinion that my behavior is inappropriate.

First of all, it is not my intention to slow this JIRA down, but to make
sure it is consistent with the related work in MAPREDUCE-2454 (you can see that
in my comments). If that requires a couple of extra days, it is a small price
to pay.

As an Apache Hadoop developer, it is my responsibility to review and provide
feedback on work posted by other developers; my usual triggers are area of
knowledge, related work and area of interest.

This JIRA is tightly related to MAPREDUCE-2454; there is no dispute about that.
Thus it should stay a subtask of it.

MAPREDUCE-2454 is not disputed. As has been commented in its JIRA, it is
almost ready; it was a matter of breaking it up and doing a fast, interactive
review of its parts. As far as I can tell, this is already happening there.

Now, on to your comments on my review:

* Regarding *shuffleConsumerPlugin != null*: yes, you are right. I noticed that
after I posted my comments, so you can disregard that one.

* On marking ShuffleConsumerPlugin and ShuffleContext as *unstable*: I disagree
with your objection. Hadoop wants to keep the right to modify these APIs in the
future, if the need arises. You can see this not only in MAPREDUCE-2454,
but in several places where Hadoop provides pluggability (e.g.
ResourceManagement, authentication).

* On making ShuffleConsumerPlugin an interface: that is a good idea; it
will align things with the other sub-tasks.

Looking forward to seeing the updated patch.

Cheers.


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504635#comment-13504635
 ] 

Jason Lowe commented on MAPREDUCE-4813:
---

MAPREDUCE-4815 only addresses FileOutputCommitter and friends, but the 
committer is arbitrary user code.  It could be doing all sorts of things 
including connecting to databases, etc.  So I still think we need this, 
although the priority of it is reduced given how many things are built from 
FileOutputCommitter.

 AM timing out during job commit
 ---

 Key: MAPREDUCE-4813
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4813
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: MAPREDUCE-4813.patch


 The AM calls the output committer's {{commitJob}} method synchronously during 
 JobImpl state transitions, which means the JobImpl write lock is held the 
 entire time the job is being committed.  Holding the write lock prevents the 
 RM allocator thread from heartbeating to the RM.  Therefore if committing the 
 job takes too long (e.g.: the job has tons of files to commit and/or the 
 namenode is bogged down) then the AM appears to be unresponsive to the RM and 
 the RM kills the AM attempt.
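The failure mode described above can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the AM code or the MAPREDUCE-4813 patch: a ReentrantReadWriteLock stands in for the JobImpl lock, heartbeat() stands in for the RM allocator thread (assumed here to need the read lock), and a slow commit is simulated with a latch. Holding the write lock for the whole commit starves the heartbeat; taking the lock only for the state transitions and running the commit on another thread does not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CommitLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String state = "RUNNING"; // guarded by lock

    private void setState(String s) {
        lock.writeLock().lock();
        try {
            state = s;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Problematic pattern: the write lock is held for the entire commit. */
    public void commitUnderLock(Runnable commitJob) {
        lock.writeLock().lock();
        try {
            state = "COMMITTING";
            commitJob.run();
            state = "SUCCEEDED";
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Alternative: lock only around state changes; the commit runs elsewhere. */
    public Future<?> commitAsync(ExecutorService pool, Runnable commitJob) {
        setState("COMMITTING");
        return pool.submit(() -> {
            commitJob.run();
            setState("SUCCEEDED");
        });
    }

    /** Stand-in for the allocator heartbeat, which must read job state. */
    public boolean heartbeat(long timeoutMs) throws InterruptedException {
        if (!lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // the RM would see an unresponsive AM
        }
        try {
            return state != null; // a real heartbeat would report this state
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs a slow commit and reports whether a heartbeat got through meanwhile. */
    public static boolean heartbeatDuringSlowCommit(boolean async) {
        CommitLockSketch job = new CommitLockSketch();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowCommit = () -> {
            started.countDown();
            try {
                release.await(); // simulates committing many files on a slow namenode
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> done;
            if (async) {
                done = job.commitAsync(pool, slowCommit);
            } else {
                done = pool.submit(() -> job.commitUnderLock(slowCommit));
            }
            started.await();
            boolean gotThrough = job.heartbeat(100);
            release.countDown();
            done.get();
            return gotThrough;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the commit thread if we bailed out early
        }
    }
}
```

Because the committer is arbitrary user code, running it off the lock-holding path protects the heartbeat regardless of whether commitJob renames files or talks to a database.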


[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

[
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504646#comment-13504646
]

Jason Lowe commented on MAPREDUCE-4819:
---

bq. Maybe final client notification should be the last thing after all post
processing is done.

No, moving the client notification later just creates a different set of
problems, like the client never being notified *at all* because the AM crashes
after unregistering with the RM but before it notifies the client. The RM
won't restart the app because it unregistered successfully, but the client is
never notified.

bq. In general it seems like we need to come up with a set of markers that
previous AM's leave behind that can tell the next retry if the previous one
failed/succeeded and so the current AM should exit or continue to run.

Exactly, and the AM already does this in the job history file, which is how
it helps support recovery. We should extend this so that even if the output
committer doesn't support recovery the AM will check for markers in the job
history file and prevent the job from executing tasks and committing output if
final job status has been determined by previous attempts. That way we prevent
the AM from re-committing job output or changing the final job status after
notifying the client. We just need to make sure those markers are flushed to
persistent store and located properly by future AM attempts before attempting
to notify the client. If subsequent attempts see the final job status marker
then they should skip straight to the client notification process instead of
running tasks.
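Below is a minimal sketch of the decision a new AM attempt could make from the markers that earlier attempts left in the job history file. The marker names are assumptions for illustration, as is passing the markers in as a list. In the proposal, the final-status marker would also have to be flushed to persistent storage before the client is notified.

```java
import java.util.List;

// Hypothetical markers an AM attempt might append to the job history file.
enum HistoryMarker { JOB_STARTED, FINAL_STATUS_SUCCEEDED, FINAL_STATUS_FAILED }

// What a new AM attempt should do after reading earlier attempts' markers.
enum RestartAction { RUN_TASKS, NOTIFY_CLIENT_ONLY }

final class RestartDecision {
    static RestartAction decide(List<HistoryMarker> previousMarkers) {
        for (HistoryMarker marker : previousMarkers) {
            if (marker == HistoryMarker.FINAL_STATUS_SUCCEEDED
                    || marker == HistoryMarker.FINAL_STATUS_FAILED) {
                // A final status was persisted (and may already have been reported),
                // so do not run tasks or commit output again.
                return RestartAction.NOTIFY_CLIENT_ONLY;
            }
        }
        // No final status recorded: run (or recover) the job as usual.
        return RestartAction.RUN_TASKS;
    }
}
```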

AM can rerun job after reporting final job status to the client
---

Key: MAPREDUCE-4819
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mr-am
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha
Priority: Critical

If the AM reports final job status to the client but then crashes before
unregistering with the RM then the RM can run another AM attempt. Currently
AM re-attempts assume that the previous attempts did not reach a final job
state, and that causes the job to rerun (from scratch, if the output format
doesn't support recovery).
Re-running the job when we've already told the client the final status of the
job is bad for a number of reasons. If the job failed, it's confusing at
best since the client was already told the job failed but the subsequent
attempt could succeed. If the job succeeded there could be data loss, as a
subsequent job launched by the client tries to consume the job's output as
input just as the re-attempt starts removing output files in preparation for
the output commit.

[jira] [Created] (MAPREDUCE-4822) Unnecessary conversions in History Events

2012-11-27 Thread Robert Joseph Evans (JIRA)

Robert Joseph Evans created MAPREDUCE-4822:
--

 Summary: Unnecessary conversions in History Events
 Key: MAPREDUCE-4822
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4822
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Priority: Trivial


There are a number of conversions in the Job History Event classes that are 
totally unnecessary.  It appears that they were originally used to convert from 
the internal Avro format, but now many of them no longer pull the values from the 
Avro records; they store them internally.

For example:

{code:title=TaskAttemptFinishedEvent.java}
  /** Get the task type */
  public TaskType getTaskType() {
    return TaskType.valueOf(taskType.toString());
  }
{code}

The code currently takes an enum, converts it to a string, and then asks 
the same enum to convert it back to an enum.  If Java works properly, this is 
a no-op and simply yields a reference to the original taskType.

There are several places where toString() is called on a String, and 
since strings are immutable, it just returns a reference to itself.

The various ids are not immutable and probably should not be changed at this 
point.
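The sketch below shows why both conversions are redundant. `TaskType` and the event class are simplified, hypothetical stand-ins, and `hostname` is an illustrative String field. The sketch assumes `TaskType` does not override `toString()`: `Enum.valueOf` matches on `name()`, so an override could break the round trip. Under that assumption, the round trip returns the very same constant. `String.toString()` returns the string itself.

```java
// Hypothetical stand-ins; assumes TaskType does not override toString().
enum TaskType { MAP, REDUCE }

final class TaskAttemptFinishedEventSketch {
    private final TaskType taskType;
    private final String hostname;

    TaskAttemptFinishedEventSketch(TaskType taskType, String hostname) {
        this.taskType = taskType;
        this.hostname = hostname;
    }

    /** The current pattern: enum -> String -> the same enum constant. */
    TaskType getTaskTypeRoundTrip() {
        return TaskType.valueOf(taskType.toString());
    }

    /** Get the task type: the field already holds the enum, so return it. */
    TaskType getTaskType() {
        return taskType;
    }

    /** No toString() needed: String.toString() just returns this. */
    String getHostname() {
        return hostname;
    }
}
```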

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (MAPREDUCE-4823) NPE in jobhistory.jsp

2012-11-27 Thread Steve Loughran (JIRA)

Steve Loughran created MAPREDUCE-4823:
-

 Summary: NPE in jobhistory.jsp
 Key: MAPREDUCE-4823
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4823
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.3
 Environment: Running on a JT which had a bit of confusion w.r.t its 
hostname (two IP addresses in /etc/hosts for the same hostname)
Reporter: Steve Loughran
Priority: Minor


Asking for the job history page resulted in a stack trace instead of an (empty) 
job history.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-27 Thread Tom White (JIRA)

Tom White created MAPREDUCE-4824:


 Summary: Provide a mechanism for jobs to indicate they should not 
be recovered on restart
 Key: MAPREDUCE-4824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Tom White
Assignee: Tom White


Some jobs (like Sqoop or HBase jobs) are not idempotent, so they should not be 
recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2; 
however, the approach there is not applicable to MR1. Even if we only use 
the job-level part of the patch and add an isRecoverySupported method to 
OutputCommitter, there is no way to use that information from the JT (which 
initiates recovery), since the JT does not instantiate OutputCommitters, and 
it shouldn't, since they are user-level code. (In MR2 it's OK since the MR AM 
calls the method.)

Instead, we can add an MR configuration property to say that a job is not 
recoverable, and the JT could safely read this from the job conf.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4823) NPE in jobhistory.jsp

2012-11-27 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504699#comment-13504699
 ] 

Steve Loughran commented on MAPREDUCE-4823:
---

Stack trace, which bears no relation to where in the JSP page the actual NPE 
was triggered; the generated Java pages would show that.

{code}
java.lang.NullPointerException
    at org.apache.hadoop.mapred.jobhistoryhome_jsp._jspService(jobhistoryhome_jsp.java:151)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:814)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}

 NPE in jobhistory.jsp
 -

 Key: MAPREDUCE-4823
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4823
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.3
 Environment: Running on a JT which had a bit of confusion w.r.t its 
 hostname (two IP addresses in /etc/hosts for the same hostname)
Reporter: Steve Loughran
Priority: Minor

 asking for the job history page resulted in a stack trace instead of (an 
 empty) job history

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-27 Thread Tom White (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-4824:
-

Attachment: MAPREDUCE-4824.patch

Here's a patch that implements this idea. Jobs that shouldn't be recovered 
should set mapred.job.restart.recover to false.
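A minimal sketch of the JT-side check this enables, assuming `java.util.Properties` stands in for the job's JobConf. The class and method names are hypothetical; only the property name comes from the patch description. An absent value means the job is recoverable, so existing jobs keep today's behavior. A job that opts out makes the reload fail with an `IOException`, which halts loading that job.

```java
import java.io.IOException;
import java.util.Properties;

// Hypothetical JT-side check; java.util.Properties stands in for the job's JobConf.
final class RecoveryCheck {
    static final String RESTART_RECOVER_KEY = "mapred.job.restart.recover";

    // Absent (or anything other than "false") means the job may be recovered,
    // roughly like a getBoolean(key, true) lookup.
    static boolean isRecoverable(Properties jobConf) {
        String value = jobConf.getProperty(RESTART_RECOVER_KEY);
        return value == null || !value.trim().equalsIgnoreCase("false");
    }

    // Called while reloading a job after a JT restart; throwing halts the load.
    static void checkRecovery(String jobId, boolean recovered, Properties jobConf)
            throws IOException {
        if (recovered && !isRecoverable(jobConf)) {
            throw new IOException("Job " + jobId + " should not be recovered since "
                    + RESTART_RECOVER_KEY + " is set to false.");
        }
    }
}
```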

 Provide a mechanism for jobs to indicate they should not be recovered on 
 restart
 

 Key: MAPREDUCE-4824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-4824.patch


 Some jobs (like Sqoop or HBase jobs) are not idempotent, so should not be 
 recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2, 
 however the approach there is not applicable for MR1, since even if we only 
 use the job-level part of the patch and add a isRecoverySupported method to 
 OutputCommitter, there is no way to use that information from the JT (which 
 initiates recovery), since the JT does not instantiate OutputCommitters - and 
 it shouldn't since they are user-level code. (In MR2 it's OK since the MR AM 
 calls the method.)
 Instead, we can add a MR configuration property to say that a job is not 
 recoverable, and the JT could safely read this from the job conf.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-27 Thread Koji Noguchi (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504758#comment-13504758
]

Koji Noguchi commented on MAPREDUCE-4819:
-

bq. like the client never being notified at all because the AM crashes after
unregistering with the RM but before it notifies the client.

As long as the client eventually fails, that's not a problem.

The critical problem we have here is a false positive from the client's perspective.
The client gets 'success' but the output is incomplete or corrupt (due to a retried
application/job (over)writing to the same target path).

If we can have the AM and RM agree on the job status before telling the client,
I think that would work. There could be a corner case where the AM and RM say the
job was successful but the client thinks it failed. This false negative is much
better than the false-positive issue we have now. Even in 0.20, we had cases where the
JobTracker reported the job was successful but the client thought it failed due to
a communication failure with the JobTracker. This is fine to happen, and we should
let the client handle the recovery-or-retry.

bq. In general it seems like we need to come up with a set of markers that
previous AM's leave behind

I don't want the correctness of the job to depend on the marker on hdfs.

AM can rerun job after reporting final job status to the client
---

[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-27 Thread Koji Noguchi (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504767#comment-13504767
]

Koji Noguchi commented on MAPREDUCE-4819:
-

bq. I don't want the correctness of the job to depend on the marker on hdfs.

I meant HDFS in user space, like the output path. If this is stored somewhere the
user cannot access, I have no problem.

AM can rerun job after reporting final job status to the client
---

[jira] [Commented] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

2012-11-27 Thread Thomas Graves (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504777#comment-13504777
]

Thomas Graves commented on MAPREDUCE-4817:
--

When you say knock off the ping thread, I assume you really mean just the ping
timeout check, since the task progress happens in the same thread?

So the ping serves multiple purposes. Currently it notifies the AM that the
task has pinged in and is still running. This could be useful even with
taskTimeout, since taskTimeout could be turned off (set to 0) and we would
never know if that task got hung. Second, the task uses it to check whether
the AM is still alive. If it doesn't return true, the task is supposed to
exit. 1.X also had the ping check, but it went to the TaskTracker, and the
TaskTracker validated that the parent Task of the ping checker thread was still
there.

Now with 0.23, the NodeManager watches the processes and talks back to the
RM to let it know that the AM died, and if the AM died the other tasks are killed.
But if the entire NodeManager goes down, the task doesn't know the AM went
away. If the task isn't sending progress, the task timeout is set to 0,
and this is the last AM retry, it could hang around forever.

The odds of that seem pretty small, and I guess if we aren't worried about the
first happening, the second probably isn't that interesting either. But we
could also just remove the ping timeout check in the TaskHeartBeatHandler.
What exactly are you proposing?

Hardcoded task ping timeout kills tasks localizing large amounts of data

Key: MAPREDUCE-4817
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4817
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mr-am
Affects Versions: 0.23.3, 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Thomas Graves
Priority: Critical

When a task is launched and spends more than 5 minutes localizing files, the
AM will kill the task due to ping timeout. The AM's TaskHeartbeatHandler
currently tracks tasks via a progress timeout and a ping timeout. The
progress timeout can be controlled via mapreduce.task.timeout and even
disabled by setting the property to 0. The ping timeout, however, is
hardcoded to 5 minutes and cannot be configured. Therefore if the task takes
too long localizing, it never gets running in order to ping back to the AM
and the AM kills it due to ping timeout.
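A hypothetical model of the two per-task checks described above. The names are illustrative and not the TaskHeartbeatHandler's actual code. The ping timeout is passed in as a parameter, as it might be if it were made configurable. The issue says it is currently hardcoded to 5 minutes. Treating 0 as "disabled" for the ping timeout as well is an assumption.

```java
// Hypothetical model of the AM's two per-task checks; names are illustrative only.
final class TaskTimeoutCheck {
    private final long progressTimeoutMs; // like mapreduce.task.timeout; 0 disables it
    private final long pingTimeoutMs;     // the issue says this is a hardcoded 5 minutes

    TaskTimeoutCheck(long progressTimeoutMs, long pingTimeoutMs) {
        this.progressTimeoutMs = progressTimeoutMs;
        this.pingTimeoutMs = pingTimeoutMs;
    }

    // Returns why the task would be killed, or null if it is left alone.
    String check(long nowMs, long lastProgressMs, long lastPingMs) {
        if (progressTimeoutMs > 0 && nowMs - lastProgressMs > progressTimeoutMs) {
            return "progress timeout";
        }
        // A task still localizing has not pinged since launch, so this can fire
        // even when the progress timeout is disabled.
        if (pingTimeoutMs > 0 && nowMs - lastPingMs > pingTimeoutMs) {
            return "ping timeout";
        }
        return null;
    }
}
```

For example, suppose a task is launched at time 0 and spends 6 minutes localizing, with the progress timeout disabled. With a 5-minute ping timeout, the task is killed for "ping timeout". A larger ping timeout would leave it alone.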

[jira] [Commented] (MAPREDUCE-4821) Unit Test: TestJobTrackerRestart fails when it is run with ant-1.8.4

2012-11-27 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504776#comment-13504776
 ] 

Steve Loughran commented on MAPREDUCE-4821:
---

Is there a JUnit 3 JAR in your Ant classpath? There has to be a JUnit 4 one, or else 
the test case won't compile. I suspect your Ant installation has a JUnit JAR 
that's being picked up first at test run time.

{{ant -diagnostics}} will show this. If it's there, delete it and see what 
happens when the original test is rerun.

 Unit Test: TestJobTrackerRestart fails when it is run with ant-1.8.4
 

 Key: MAPREDUCE-4821
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4821
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.0.3, 1.0.4
 Environment: RHEL 6.3 on x86
Reporter: Amir Sanjar
 Fix For: 1.0.3, 1.1.1

 Attachments: MAPREDUCE-4821-branch1.patch, 
 MAPREDUCE-4821-release-1.0.3.patch


 Problem:
 The JUnit @Ignore annotation is not recognized since the test case is JUnit 3, not 
 JUnit 4.
 Solution:
 Migrate the testcase to JUnit4, including:
 * Remove extends TestCase
 * Remove import junit.framework.TestCase;
 * Add import org.junit.*; 
 * Use appropriate annotations such as @After, @Before, @Test.
 uploading a patch shortly 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-27 Thread Robert Joseph Evans (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504784#comment-13504784
]

Robert Joseph Evans commented on MAPREDUCE-4819:

We are informing several different actors of success/failure in many
different ways.

# _SUCCESS file being written to HDFS by the output committer as part of
commitJob()
# job end notification by hitting an http server
# client being informed through RPC
# history server being informed by placing the log in a directory it can see
# resource manager being informed that the application is done

Some of these are much more important to report than others, but either way we
still have, at a minimum, two different things that need to be tied together:
commitJob() and informing the RM not to run us again. Rearranging their order
will not fix the fact that after commitJob() finishes there is the
possibility that something will fail but must not fail the job. We really need
to have a two-phase commit in the job history file:

# I am about to commit the job output.
# commitJob()
# I finished committing the job output successfully.

Without this there will always be the possibility that commitJob() will be called
twice, which would result in changes to the output directory. I would also argue
that some of these are important enough that we should consider reporting them twice
and updating the listener to handle double reporting, like informing the
history server that the job finished. For others, such as job end
notification or client RPC, it is not so critical.

Koji,

I get that we want to reduce the risk of users shooting themselves in the
foot, but the file must be stored in a user-accessible location because the
entire job runs as the user. It is stored under the .staging directory; if the
user deletes that, it will already cause many other problems and probably cause
the job to fail. We can try to set it up so that, on any app attempt but the
first, we fail fast if the previous job history file does not exist. That
would prevent retries from messing up the output directory.
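The two-phase commit markers and the fail-fast rule described in this comment might be sketched as follows. A plain `List<String>` stands in for the job history file, with null meaning no file was found. The marker names are hypothetical. The comment only says commitJob() must not run twice. Treating "started but not finished" as a failure, rather than re-running the commit, is an assumption.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical two-phase commit around commitJob(), recorded in the job history.
final class TwoPhaseCommit {
    static final String COMMIT_STARTED = "COMMIT_STARTED";   // "about to commit"
    static final String COMMIT_FINISHED = "COMMIT_FINISHED"; // "committed successfully"

    enum Outcome { COMMITTED, ALREADY_COMMITTED, FAIL_PARTIAL_COMMIT, FAIL_NO_HISTORY }

    // history: markers persisted by earlier attempts, or null if no history file was found.
    static Outcome commit(int attemptNumber, List<String> history, Runnable commitJob) {
        if (attemptNumber > 1 && history == null) {
            // A retry with no previous history (e.g. .staging was deleted): fail fast
            // rather than risk rerunning the job over existing output.
            return Outcome.FAIL_NO_HISTORY;
        }
        Objects.requireNonNull(history, "the first attempt starts with an empty history");
        if (history.contains(COMMIT_FINISHED)) {
            return Outcome.ALREADY_COMMITTED;     // never call commitJob() twice
        }
        if (history.contains(COMMIT_STARTED)) {
            return Outcome.FAIL_PARTIAL_COMMIT;   // output may be half-committed
        }
        history.add(COMMIT_STARTED);   // phase 1: must be durable before committing
        commitJob.run();
        history.add(COMMIT_FINISHED);  // phase 2: record that the commit completed
        return Outcome.COMMITTED;
    }
}
```

In a real AM, each `history.add` would have to be flushed to persistent storage before the next step. With an in-memory list, that ordering is only implied.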

AM can rerun job after reporting final job status to the client
---

[jira] [Updated] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

2012-11-27 Thread Thomas Graves (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Thomas Graves updated MAPREDUCE-4817:
-

Attachment: MAPREDUCE-4817.patch

Here is the patch that adds the config for the ping timeout. I'm attaching it because
it was already finished before the other comments, in case we want to go that
way.

Hardcoded task ping timeout kills tasks localizing large amounts of data

[jira] [Created] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state

Jason Lowe created MAPREDUCE-4825:
-

 Summary: JobImpl.finished doesn't expect ERROR as a final job state
 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Reporter: Jason Lowe


TestMRApp.testJobError is causing AsyncDispatcher to exit with System.exit due 
to an exception being thrown.  From the console output from testJobError:

{noformat}
2012-11-27 18:46:15,240 ERROR [AsyncDispatcher event handler] impl.TaskImpl (TaskImpl.java:internalError(665)) - Invalid event T_SCHEDULE on Task task_0__m_00
2012-11-27 18:46:15,242 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(132)) - Error in dispatcher thread
java.lang.IllegalArgumentException: Illegal job state: ERROR
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.finished(JobImpl.java:838)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1622)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:359)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:287)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:723)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:974)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
    at java.lang.Thread.run(Thread.java:662)
2012-11-27 18:46:15,242 INFO  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(135)) - Exiting, bbye..
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4825:
--

Attachment: MAPREDUCE-4825.patch

Simple fix.  No additional unit tests since this is fixing an existing test.
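The patch itself is not shown here. As a rough, hypothetical sketch of what accepting ERROR might look like, the method below treats ERROR like the other terminal states instead of throwing for it. The enum and method are stand-ins, and whatever other bookkeeping the real `JobImpl.finished` does is omitted.

```java
// Hypothetical stand-in for the AM's internal job states.
enum SketchJobState { NEW, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

final class FinishedSketch {
    // Accept every terminal state, including ERROR, instead of throwing for it.
    static SketchJobState finished(SketchJobState finalState) {
        switch (finalState) {
            case SUCCEEDED:
            case FAILED:
            case KILLED:
            case ERROR:   // without this case, ERROR hits the exception below
                return finalState;
            default:
                throw new IllegalArgumentException("Illegal job state: " + finalState);
        }
    }
}
```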

 JobImpl.finished doesn't expect ERROR as a final job state
 --

 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Reporter: Jason Lowe
 Attachments: MAPREDUCE-4825.patch


 TestMRApp.testJobError is causing AsyncDispatcher to exit with System.exit 
 due to an exception being thrown.  From the console output from testJobError:
 {noformat}
 2012-11-27 18:46:15,240 ERROR [AsyncDispatcher event handler] impl.TaskImpl 
 (TaskImpl.java:internalError(665)) - Invalid event T_SCHEDULE on Task 
 task_0__m_00
 2012-11-27 18:46:15,242 FATAL [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(132)) - Error in 
 dispatcher thread
 java.lang.IllegalArgumentException: Illegal job state: ERROR
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.finished(JobImpl.java:838)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1622)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:359)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:287)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:723)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:974)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
   at java.lang.Thread.run(Thread.java:662)
 2012-11-27 18:46:15,242 INFO  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(135)) - Exiting, bbye..
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4825:
--

 Assignee: Jason Lowe
 Target Version/s: 2.0.3-alpha, 0.23.6
Affects Version/s: 0.23.5
   2.0.3-alpha
   Status: Patch Available  (was: Open)

 JobImpl.finished doesn't expect ERROR as a final job state
 --

 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4825.patch


 TestMRApp.testJobError is causing AsyncDispatcher to exit with System.exit 
 due to an exception being thrown.  From the console output from testJobError:
 {noformat}
 2012-11-27 18:46:15,240 ERROR [AsyncDispatcher event handler] impl.TaskImpl 
 (TaskImpl.java:internalError(665)) - Invalid event T_SCHEDULE on Task 
 task_0__m_00
 2012-11-27 18:46:15,242 FATAL [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(132)) - Error in 
 dispatcher thread
 java.lang.IllegalArgumentException: Illegal job state: ERROR
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.finished(JobImpl.java:838)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1622)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:359)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:287)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:723)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:974)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
   at java.lang.Thread.run(Thread.java:662)
 2012-11-27 18:46:15,242 INFO  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(135)) - Exiting, bbye..
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state

2012-11-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504861#comment-13504861
 ] 

Hadoop QA commented on MAPREDUCE-4825:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555049/MAPREDUCE-4825.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3072//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3072//console

This message is automatically generated.

 JobImpl.finished doesn't expect ERROR as a final job state
 --

 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4825.patch


 TestMRApp.testJobError is causing AsyncDispatcher to exit with System.exit 
 due to an exception being thrown.  From the console output from testJobError:
 {noformat}
 2012-11-27 18:46:15,240 ERROR [AsyncDispatcher event handler] impl.TaskImpl (TaskImpl.java:internalError(665)) - Invalid event T_SCHEDULE on Task task_0__m_00
 2012-11-27 18:46:15,242 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(132)) - Error in dispatcher thread
 java.lang.IllegalArgumentException: Illegal job state: ERROR
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.finished(JobImpl.java:838)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1622)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1)
   at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:359)
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:287)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:723)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:974)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
   at java.lang.Thread.run(Thread.java:662)
 2012-11-27 18:46:15,242 INFO  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(135)) - Exiting, bbye..
 {noformat}
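 A minimal sketch of the kind of change the summary points to, assuming a simplified, hypothetical JobState enum and plain counters standing in for the real JobImpl internals and job metrics: finished() accepts ERROR as a terminal state instead of falling through to the IllegalArgumentException seen in the trace. Counting ERROR alongside FAILED is an assumption of this sketch, not something stated in the issue.

 ```java
 // Hypothetical, simplified stand-in for the MR AM job state;
 // not the real org.apache.hadoop.mapreduce.v2 enum.
 enum JobState { NEW, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

 class JobFinishSketch {
   // Illustrative counters standing in for job metrics.
   int completed, failed, killed;

   JobState finished(JobState finalState) {
     switch (finalState) {
       case SUCCEEDED:
         completed++;
         break;
       case KILLED:
         killed++;
         break;
       case FAILED:
       case ERROR: // accept ERROR as terminal (assumption: counted as a failure)
         failed++;
         break;
       default:
         // Non-terminal states are still rejected, with the message from the trace.
         throw new IllegalArgumentException("Illegal job state: " + finalState);
     }
     return finalState;
   }
 }
 ```

 In this sketch, an internal-error transition that reaches finished(ERROR) simply records a failure, while a genuinely non-terminal state such as RUNNING still raises the exception.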

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-27 Thread Harsh J (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504897#comment-13504897
 ] 

Harsh J commented on MAPREDUCE-4824:


Hi,

- The message in the exception below could be improved, I feel. I think it's
better to say "Job ID was not recovered since it disabled recovery-upon-restart
(mapred.job.restart.recover set to false)." Also, since this case is expected
(a non-default override), I think it ought to be a simple INFO log, but I
understand we need to throw an Exception to halt the loading of the JIP.

{code}
+  if (recovered && !conf.getBoolean("mapred.job.restart.recover", true)) {
+    throw new IOException("Job " + jobId + " should not be recovered " +
+        "since mapred.job.restart.recover is set to false.");
+  }
{code}

- We could also add this property to mapred-default.xml and document it
there.

The test changes look good.
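A minimal sketch of how the suggestions above could fit together, assuming a
plain Map<String, String> in place of Hadoop's JobConf and a hypothetical
loadJob caller: the check still throws an IOException to halt loading of the
job, and the caller logs that expected case at INFO rather than as an error.
Only the property name mapred.job.restart.recover and its default of true come
from the patch excerpt; the helper names are illustrative.

```java
import java.io.IOException;
import java.util.Map;
import java.util.logging.Logger;

class RecoveryCheckSketch {
  static final String RECOVER_KEY = "mapred.job.restart.recover";
  private static final Logger LOG =
      Logger.getLogger(RecoveryCheckSketch.class.getName());

  // Simplified stand-in for conf.getBoolean(key, dflt):
  // a missing key falls back to the default.
  static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
    String v = conf.get(key);
    return v == null ? dflt : Boolean.parseBoolean(v.trim());
  }

  // Hypothetical equivalent of the patched check, with the reworded message.
  static void checkRecoverable(String jobId, boolean recovered,
      Map<String, String> conf) throws IOException {
    if (recovered && !getBoolean(conf, RECOVER_KEY, true)) {
      throw new IOException("Job " + jobId + " was not recovered since it "
          + "disabled recovery-upon-restart (" + RECOVER_KEY + " set to false).");
    }
  }

  // Hypothetical caller: the exception halts loading but is logged at INFO.
  static boolean loadJob(String jobId, boolean recovered, Map<String, String> conf) {
    try {
      checkRecoverable(jobId, recovered, conf);
      return true;
    } catch (IOException e) {
      LOG.info(e.getMessage());
      return false;
    }
  }
}
```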

 Provide a mechanism for jobs to indicate they should not be recovered on 
 restart
 

 Key: MAPREDUCE-4824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-4824.patch


 Some jobs (like Sqoop or HBase jobs) are not idempotent, so they should not be 
 recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2; 
 however, that approach is not applicable to MR1. Even if we used only the 
 job-level part of the patch and added an isRecoverySupported method to 
 OutputCommitter, the JT (which initiates recovery) would have no way to use 
 that information, since the JT does not instantiate OutputCommitters, and it 
 shouldn't, since they are user-level code. (In MR2 this is OK since the MR AM 
 calls the method.)
 Instead, we can add an MR configuration property to say that a job is not 
 recoverable, and the JT could safely read this from the job conf.
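 The proposal relies on the JT deciding recoverability from job configuration 
 alone, without loading user-level OutputCommitter classes. A minimal sketch 
 under that assumption, with each job's configuration modelled as a 
 Map<String, String> and selectRecoverable a hypothetical helper (not a 
 JobTracker method):

 ```java
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
 import java.util.TreeMap;

 class RecoveryFilterSketch {
   static final String RECOVER_KEY = "mapred.job.restart.recover";

   // Returns the IDs of jobs the JT may recover, reading only configuration
   // values (no user OutputCommitter classes are loaded). Jobs without the
   // property default to recoverable, preserving existing behaviour.
   static List<String> selectRecoverable(Map<String, Map<String, String>> jobConfs) {
     List<String> result = new ArrayList<>();
     // TreeMap gives a deterministic, sorted iteration order.
     for (Map.Entry<String, Map<String, String>> e : new TreeMap<>(jobConfs).entrySet()) {
       String v = e.getValue().get(RECOVER_KEY);
       boolean recoverable = v == null || Boolean.parseBoolean(v.trim());
       if (recoverable) {
         result.add(e.getKey());
       }
     }
     return result;
   }
 }
 ```

 Under these assumptions, a non-idempotent job (for example a Sqoop import) 
 submitted with mapred.job.restart.recover=false is skipped, while jobs that 
 never set the property are recovered as before.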

[jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files