date:20121128


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505518#comment-13505518
 ] 

Marc Reichman commented on MAPREDUCE-2374:
--

Please consider backporting this to the stable branch, as I am seeing this 
regularly in 1.0.3/1.0.4. I believe this is the true fix for the original 
condition (not what was fixed, see the last few comments) of MAPREDUCE-4003. At 
the very least, if someone could provide a patched hadoop 1.0.3 jar with this 
bash fix I would try it out.



 Text File Busy errors launching MR tasks
 --

 Key: MAPREDUCE-2374
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Andy Isaacson
 Fix For: 0.23.3, 2.0.2-alpha

 Attachments: failed_taskjvmsh.strace, mapreduce-2374-2.txt, 
 mapreduce-2374-branch-1.patch, mapreduce-2374-on-20sec.txt, 
 mapreduce-2374.txt, mapreduce-2374.txt, mapreduce-2374.txt, 
 successfull_taskjvmsh.strace


 Some very small percentage of tasks fail with a "Text file busy" error.
 The following was the original diagnosis:
 {quote}
 Our use of PrintWriter in TaskController.writeCommand is unsafe, since that 
 class swallows all IO exceptions. We're not currently checking for errors, 
 which I'm seeing result in occasional task failures with the message "Text 
 file busy" - presumably because the close() call is failing silently for some 
 reason.
 {quote}
 .. but turned out to be another issue as well (see below)
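
 To illustrate the original diagnosis only (not the later bash-related fix), here
 is a minimal sketch of how PrintWriter hides write failures and how
 checkError() surfaces them. SafeCommandWriter and writeCommand are hypothetical
 names for this example and are not the actual TaskController code.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

/** Hypothetical helper for illustration; not the real TaskController.writeCommand. */
final class SafeCommandWriter {
    /** Writes each line followed by '\n' and fails loudly if any write or the close failed. */
    static void writeCommand(Writer sink, String... lines) throws IOException {
        PrintWriter out = new PrintWriter(sink);
        for (String line : lines) {
            out.print(line);
            out.print('\n');
        }
        out.close();             // PrintWriter catches IOExceptions here and only sets a flag
        if (out.checkError()) {  // so the flag has to be inspected explicitly
            throw new IOException("failed to write command file");
        }
    }
}
```

 Without the checkError() call, a failed write or close would go unnoticed and
 the caller would proceed as if the script had been written completely.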

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (MAPREDUCE-4826) backport MAPREDUCE-2374 fix to 1.0.x stable

Marc Reichman created MAPREDUCE-4826:


 Summary: backport MAPREDUCE-2374 fix to 1.0.x stable
 Key: MAPREDUCE-4826
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4826
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task-controller
Affects Versions: 1.0.3
 Environment: Linux CentOS 6.3 amd64
Reporter: Marc Reichman


Please consider backporting this fix to 1.0.x. I am running into it frequently, 
and it seems to be the original situation of MAPREDUCE-4003, which was marked 
fixed for a different item (see the last few comments).


[jira] [Updated] (MAPREDUCE-4826) backport MAPREDUCE-2374 fix to 1.0.x stable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Reichman updated MAPREDUCE-4826:
-

Description: Please consider backporting the fix for MAPREDUCE-2374 to 
1.0.x. I am running into it frequently, and it seems to be the original 
situation of MAPREDUCE-4003, which was marked fixed for a different item (see 
the last few comments).  (was: Please consider backporting this fix to 1.0.x. I 
am running into it frequently, and it seems to be the original situation of 
MAPREDUCE-4003, which was marked fixed for a different item (see the last few 
comments).)

 backport MAPREDUCE-2374 fix to 1.0.x stable
 ---

 Key: MAPREDUCE-4826
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4826
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task-controller
Affects Versions: 1.0.3
 Environment: Linux CentOS 6.3 amd64
Reporter: Marc Reichman
   Original Estimate: 48h
  Remaining Estimate: 48h

 Please consider backporting the fix for MAPREDUCE-2374 to 1.0.x. I am running 
 into it frequently, and it seems to be the original situation of 
 MAPREDUCE-4003, which was marked fixed for a different item (see the last few 
 comments).


[jira] [Updated] (MAPREDUCE-4809) Make classes required for MAPREDUCE-2454 to be java public (with LimitedPrivate)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-4809:
-

   Resolution: Fixed
Fix Version/s: (was: 2.0.3-alpha)
   MR-2454
   Status: Resolved  (was: Patch Available)

+1.

I've just committed this to MR-2454 branch, thanks Asokan!

 Make classes required for MAPREDUCE-2454 to be java public (with 
 LimitedPrivate)
 

 Key: MAPREDUCE-4809
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4809
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: MR-2454

 Attachments: MAPREDUCE-4809-1.patch, mapreduce-4809.patch, 
 mapreduce-4809.patch, mapreduce-4809.patch


 Make classes required for MAPREDUCE-2454 to be java public (with 
 LimitedPrivate)


[jira] [Updated] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-28 Thread Tom White (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-4824:
-

Attachment: MAPREDUCE-4824.patch

Thanks for the feedback. Here's an updated patch with the improved message.

I didn't add the property to mapred-default.xml, since it is a job-specific 
property and these are generally not added there. There's no way to have true 
job-specific properties, since if someone adds the property to the jobtracker's 
mapred-site.xml file then it will be picked up. I'm not sure there's an easy 
way around this. 

 Provide a mechanism for jobs to indicate they should not be recovered on 
 restart
 

 Key: MAPREDUCE-4824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-4824.patch, MAPREDUCE-4824.patch


 Some jobs (like Sqoop or HBase jobs) are not idempotent, so should not be 
 recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2, 
 however the approach there is not applicable for MR1, since even if we only 
 use the job-level part of the patch and add a isRecoverySupported method to 
 OutputCommitter, there is no way to use that information from the JT (which 
 initiates recovery), since the JT does not instantiate OutputCommitters - and 
 it shouldn't since they are user-level code. (In MR2 it's OK since the MR AM 
 calls the method.)
 Instead, we can add a MR configuration property to say that a job is not 
 recoverable, and the JT could safely read this from the job conf.
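
 A minimal sketch of the proposed approach, assuming the JT only needs a boolean
 read from the submitted job configuration. java.util.Properties stands in for
 the job conf here, and the property name is hypothetical (the name used in the
 attached patch is not shown in this message).

```java
import java.util.Properties;

/** Illustration only: Properties stands in for the job conf seen by the JT. */
final class RecoveryPolicy {
    /** Hypothetical property name, chosen for this example. */
    static final String JOB_RECOVERABLE_KEY = "example.job.restart.recover";

    /** A job is treated as recoverable unless its conf explicitly opts out. */
    static boolean shouldRecover(Properties jobConf) {
        return Boolean.parseBoolean(jobConf.getProperty(JOB_RECOVERABLE_KEY, "true"));
    }
}
```

 Because the decision depends only on configuration, the JT never has to load or
 instantiate user-level classes such as an OutputCommitter.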


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-4807:
-

Status: Open  (was: Patch Available)

The patch looks reasonable, some comments:

# The Context should just be passed into the ctor rather than ctor/init pairs - 
they don't buy us much.
# Please keep the member fields in MapOutputBuffer/DirectMapOutputCollector, 
this way your patch is *much* smaller.

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-11-28 Thread Harsh J (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505596#comment-13505596
 ] 

Harsh J commented on MAPREDUCE-4824:


bq. I didn't add the property to mapred-default.xml, since it is a job-specific 
property and these are generally not added there.

We do have several job-specific properties with proper defaults listed in that 
file. Unless someone overrides them manually, what is the harm in doing 
this, and must we remove the ones already present?

The file serves as good documentation for the config feature, because 
otherwise there's no doc reference to this in the patch.


[jira] [Updated] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

2012-11-28 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-4817:
-

Status: Patch Available  (was: Open)

 Hardcoded task ping timeout kills tasks localizing large amounts of data
 

 Key: MAPREDUCE-4817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mr-am
Affects Versions: 0.23.3, 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Thomas Graves
Priority: Critical
 Attachments: MAPREDUCE-4817.patch, MAPREDUCE-4817.patch


 When a task is launched and spends more than 5 minutes localizing files, the 
 AM will kill the task due to ping timeout.  The AM's TaskHeartbeatHandler 
 currently tracks tasks via a progress timeout and a ping timeout.  The 
 progress timeout can be controlled via mapreduce.task.timeout and even 
 disabled by setting the property to 0.  The ping timeout, however, is 
 hardcoded to 5 minutes and cannot be configured.  Therefore if the task takes 
 too long localizing, it never gets running in order to ping back to the AM 
 and the AM kills it due to ping timeout.
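
 The interaction of the two timeouts can be sketched as follows. This is a
 simplified model and not the actual TaskHeartbeatHandler; the class and method
 names are hypothetical, and treating a ping timeout of 0 as "disabled" is an
 assumption borrowed from how mapreduce.task.timeout is described above.

```java
/** Simplified model of the two liveness checks; not the real TaskHeartbeatHandler. */
final class HeartbeatCheck {
    /** A timeout of 0 or less disables the check, mirroring mapreduce.task.timeout = 0. */
    static boolean timedOut(long nowMs, long lastSeenMs, long timeoutMs) {
        return timeoutMs > 0 && nowMs - lastSeenMs > timeoutMs;
    }

    /** The task is killed if either the ping check or the progress check has expired. */
    static boolean shouldKill(long nowMs, long lastPingMs, long lastProgressMs,
                              long pingTimeoutMs, long progressTimeoutMs) {
        return timedOut(nowMs, lastPingMs, pingTimeoutMs)
            || timedOut(nowMs, lastProgressMs, progressTimeoutMs);
    }
}
```

 With a fixed 5 minute ping timeout, a task that spends 6 minutes localizing is
 killed even when the progress timeout is longer or disabled; making the ping
 timeout configurable (or removing the check) avoids that.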


[jira] [Updated] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

2012-11-28 Thread Thomas Graves (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Thomas Graves updated MAPREDUCE-4817:
-

Attachment: MAPREDUCE-4817.patch

This patch removes the ping timeout check from the AM task heartbeat handler.
If we want to remove the other side from each Task we can do that in a separate
JIRA.


[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505617#comment-13505617
 ] 

Jason Lowe commented on MAPREDUCE-4819:
---

We have to be careful about the fact that the job history log is moved to the 
done intermediate directory during shutdown after notifying the client.  
Therefore there's a window of opportunity where we can fail after notifying the 
client and moving the job history file but before unregistering from the RM.  
When the app attempt restarts in that case, the job history file won't be found 
and we'll end up re-running the job from scratch.  We either need to unregister 
from the RM first (and rely on the FINISHING grace period to buy us enough time 
to move the file) or explicitly *not* delete the file when we copy it to done 
intermediate and instead wait for the staging directory to be removed later to 
clean it up.
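
The second option (copy the history file and leave the original in staging) can
be sketched like this. It is only an analogy on the local filesystem using
java.nio.file; the real code would go through the Hadoop FileSystem API, and
HistoryPublisher and publish are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Local-filesystem analogy; the names and layout are assumptions for this example. */
final class HistoryPublisher {
    /**
     * Copies the history file into the done-intermediate directory and leaves the
     * staging copy in place, so a restarted attempt can still find it.
     */
    static void publish(Path stagingFile, Path doneIntermediateDir) throws IOException {
        Files.createDirectories(doneIntermediateDir);
        Files.copy(stagingFile,
                   doneIntermediateDir.resolve(stagingFile.getFileName()),
                   StandardCopyOption.REPLACE_EXISTING);
        // No delete here: the staging copy goes away later with the staging directory.
    }
}
```

If the AM dies after publish() but before unregistering, the next attempt still
sees the history file in staging instead of starting the job from scratch.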

 AM can rerun job after reporting final job status to the client
 ---

 Key: MAPREDUCE-4819
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha
Priority: Critical

 If the AM reports final job status to the client but then crashes before 
 unregistering with the RM then the RM can run another AM attempt.  Currently 
 AM re-attempts assume that the previous attempts did not reach a final job 
 state, and that causes the job to rerun (from scratch, if the output format 
 doesn't support recovery).
 Re-running the job when we've already told the client the final status of the 
 job is bad for a number of reasons.  If the job failed, it's confusing at 
 best since the client was already told the job failed but the subsequent 
 attempt could succeed.  If the job succeeded there could be data loss, as a 
 subsequent job launched by the client tries to consume the job's output as 
 input just as the re-attempt starts removing output files in preparation for 
 the output commit.


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505618#comment-13505618
]

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

(Arun, hopefully your family health issues are on the right track)

Avner,

* I'm ok with leaving *Shuffle* as it is, though I don't like the *Consumer* in
*ShuffleConsumerPlugin* interface, I'd be OK with *ShufflePlugin*.
* The property name should reflect the final name of the *ShuffleConsumerPlugin*
interface.
* Please make ShuffleContext a static inner class of the *ShuffleConsumerPlugin*
interface called *Context*.

While I'm not religious about names, I do care. In this case, we have the
opportunity to have a consistent set of names and APIs (ie inner Context) for a
set of related plugins (all the ones affected by MAPREDUCE-2454).

plugin for generic shuffle service
--

Key: MAPREDUCE-4049
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Labels: merge, plugin, rdma, shuffle
Fix For: trunk

Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf,
mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch,
mapreduce-4049.patch

Support generic shuffle service as set of two plugins: ShuffleProvider and
ShuffleConsumer.
This will satisfy the following needs:
# Better shuffle and merge performance. For example: we are working on
shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE,
or Infiniband) instead of using the current HTTP shuffle. Based on the fast
RDMA shuffle, the plugin can also utilize a suitable merge approach during
the intermediate merges. Hence, getting much better performance.
# Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden
dependency of NodeManager with a specific version of mapreduce shuffle
(currently targeted to 0.24.0).
References:
# Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu
from Auburn University with others,
[http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
# I am attaching 2 documents with suggested Top Level Design for both plugins
(currently, based on 1.0 branch)
# I am providing link for downloading UDA - Mellanox's open source plugin
that implements generic shuffle service using RDMA and levitated merge.
Note: At this phase, the code is in C++ through JNI and you should consider
it as beta only. Still, it can serve anyone that wants to implement or
contribute to levitated merge. (Please be advised that levitated merge is
mostly suited to very fast networks) -
[http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]

[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505625#comment-13505625
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4807:
---

Arun,

Regarding your #1 comment, I don't think it is a good idea given that the MOC is 
instantiated using ReflectionUtils.newInstance(). Thus you cannot pass the 
context, you need the init(). It is the same pattern used in MAPREDUCE-4049.

{code}
  private <KEY, VALUE> MapOutputCollector<KEY, VALUE>
      createMapOutputCollector(JobConf job, TaskReporter reporter)
      throws IOException, ClassNotFoundException {
    MapOutputCollector<KEY, VALUE> collector
      = (MapOutputCollector<KEY, VALUE>)
       ReflectionUtils.newInstance(
        job.getClass(JobContext.MAP_OUTPUT_COLLECTOR_CLASS_ATTR,
        MapOutputBuffer.class, MapOutputCollector.class), job);
    LOG.info("Map output collector class = " + collector.getClass().getName());
    MapOutputCollector.Context context =
       new MapOutputCollector.Context(this, job, reporter);
    collector.init(context);
    return collector;
  }
{code}
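
The constraint behind this pattern can be shown in a self-contained sketch,
assuming the plugin is created through its no-argument constructor and only
afterwards handed its context. All names below (PluggableCollector,
BufferingCollector, CollectorFactory) are hypothetical, and a plain String
stands in for the Context object.

```java
/** Hypothetical plugin contract: no-arg construction followed by init(context). */
interface PluggableCollector {
    void init(String context);
    String describe();
}

/** Example implementation; it cannot receive its context through a constructor. */
final class BufferingCollector implements PluggableCollector {
    private String context;

    public BufferingCollector() {}

    @Override public void init(String context) { this.context = context; }
    @Override public String describe() { return "BufferingCollector[" + context + "]"; }
}

final class CollectorFactory {
    /** Instantiates the class reflectively, then supplies the context via init(). */
    static <T extends PluggableCollector> T create(Class<T> cls, String context)
            throws ReflectiveOperationException {
        T collector = cls.getDeclaredConstructor().newInstance();
        collector.init(context);
        return collector;
    }
}
```

A factory like this only knows the class at runtime, so it cannot rely on any
particular constructor signature beyond the no-argument one.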


[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505628#comment-13505628
 ] 

Robert Joseph Evans commented on MAPREDUCE-4819:


My vote would be to leave it around until we are done done and staging is 
removed.  It seems simpler.


[jira] [Commented] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

[
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505636#comment-13505636
]

Robert Joseph Evans commented on MAPREDUCE-4817:

The patch is simple and straightforward. I am +1 assuming that Jenkins is OK
with it. I am not sure that we need to update the task. The ping is used to
check if the task can still reach the AM. If you want to remove it go ahead
and file a JIRA but it may have further ramifications.

Hardcoded task ping timeout kills tasks localizing large amounts of data

[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

[
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505644#comment-13505644
]

Jason Lowe commented on MAPREDUCE-4819:
---

bq. My vote would be to leave it around until we are done done and staging is
removed. It seems simpler.

Agreed, although we would also need to make sure we only delete the staging
directory after unregistering from the RM. Something we need to do anyway, see
YARN-244.


[jira] [Created] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

Radim Kolar created MAPREDUCE-4827:
--

 Summary: Increase hash quality of HashPartitioner
 Key: MAPREDUCE-4827
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Radim Kolar


hash partitioner is using object.hashCode() for splitting keys into partitions. 
This results in bad distributions because hashCode() quality is poor. 

These hashCode() functions are sometimes written by hand (very poor quality) 
and sometimes generated by commons lang code (poor quality). Applying some 
transformation on top of hashCode() provides better distribution.
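
As an illustration of "some transformation on top of hashCode()", the sketch
below runs the key's hash through the 32-bit finalizer step of MurmurHash3
before taking the modulus. The choice of mixer is an assumption made for this
example and is not necessarily what the attached patch does;
MixedHashPartitioner is a hypothetical name. The final
(hash & Integer.MAX_VALUE) % numPartitions step follows the existing
HashPartitioner.

```java
/** Sketch only: the mixing function is this example's choice, not the attached patch. */
final class MixedHashPartitioner {
    /** MurmurHash3 32-bit finalizer: spreads differences in any input bit over all bits. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Same shape as the stock partitioner, but applied to the mixed hash. */
    static int getPartition(Object key, int numPartitions) {
        return (mix(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Without mixing, keys whose hashCode() values are all multiples of the partition
count (for example Integer keys 0, 16, 32, ... with 16 partitions) land in a
single partition, because only the low bits survive the modulus.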


[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated MAPREDUCE-4827:
---

Attachment: betterhash1.txt

 Increase hash quality of HashPartitioner
 

 Key: MAPREDUCE-4827
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Radim Kolar
 Attachments: betterhash1.txt


 hash partitioner is using object.hashCode() for splitting keys into 
 partitions. This results in bad distributions because hashCode() quality is 
 poor. 
 These hashCode() functions are sometimes written by hand (very poor quality) 
 and sometimes generated by commons lang code (poor quality). Applying 
 some transformation on top of hashCode() provides better distribution.


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-28 Thread Avner BenHanoch (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505659#comment-13505659
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:


Thanks Alejandro,

I'll submit the patch next week, based on ALL your (and Arun's) comments:

* Shuffle - leave the class name as is
* ShufflePlugin - instead of ShuffleConsumerPlugin
* ShufflePlugin will be an interface
* property name will be: *mapreduce.job.reduce.shuffle.plugin.class* (Kindly 
let me know ASAP if you prefer other name, or in case you consulted 
mapred-default.xml and preferred names like mapreduce.reduce.shuffle... OR 
mapreduce.shuffle... )
* ShuffleContext -> ShufflePlugin.Context - a static inner class
* ShufflePlugin will be @LimitedPrivate (without @unstable)

Cheers,
 Avner


[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505673#comment-13505673
 ] 

Robert Joseph Evans commented on MAPREDUCE-4827:


I can see that there may be a need to improve the hashing of some poor quality 
implementations and the patch looks OK.  I am not an expert on hash functions 
but from what I know it looks good.  Do you have some concrete numbers showing 
how it improved the distribution in some specific cases?


[jira] [Commented] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505677#comment-13505677
 ] 

Hadoop QA commented on MAPREDUCE-4817:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555186/MAPREDUCE-4817.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3073//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3073//console

This message is automatically generated.

 Hardcoded task ping timeout kills tasks localizing large amounts of data
 

 Key: MAPREDUCE-4817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mr-am
Affects Versions: 0.23.3, 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Thomas Graves
Priority: Critical
 Attachments: MAPREDUCE-4817.patch, MAPREDUCE-4817.patch


 When a task is launched and spends more than 5 minutes localizing files, the 
 AM will kill the task due to ping timeout.  The AM's TaskHeartbeatHandler 
 currently tracks tasks via a progress timeout and a ping timeout.  The 
 progress timeout can be controlled via mapreduce.task.timeout and even 
 disabled by setting the property to 0.  The ping timeout, however, is 
 hardcoded to 5 minutes and cannot be configured.  Therefore if the task takes 
 too long localizing, it never gets running in order to ping back to the AM 
 and the AM kills it due to ping timeout.
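
The interaction of the two timeouts can be sketched as follows. This is not the actual TaskHeartbeatHandler code; the class, its fields, and the idea of passing the ping timeout in as a parameter are assumptions made for illustration. Only the behaviour "a progress timeout of 0 disables the progress check" comes from the description above.

```java
final class HeartbeatSketch {
    private final long progressTimeoutMs; // 0 disables the progress check
    private final long pingTimeoutMs;     // hypothetical: configurable rather than hardcoded

    HeartbeatSketch(long progressTimeoutMs, long pingTimeoutMs) {
        this.progressTimeoutMs = progressTimeoutMs;
        this.pingTimeoutMs = pingTimeoutMs;
    }

    // Returns true when the ping timeout, or an enabled progress timeout,
    // has elapsed since the corresponding last event.
    boolean shouldKill(long lastPingMs, long lastProgressMs, long nowMs) {
        boolean pingExpired = nowMs - lastPingMs > pingTimeoutMs;
        boolean progressExpired =
                progressTimeoutMs > 0 && nowMs - lastProgressMs > progressTimeoutMs;
        return pingExpired || progressExpired;
    }
}
```

A task that spends six minutes localizing and has not pinged since launch is killed under a five-minute ping timeout even with the progress timeout disabled; with a ten-minute ping timeout it survives.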


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505679#comment-13505679
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

Avner, everything looks good except your last bullet: ShufflePlugin and Context 
must be marked as @LimitedPrivate for MapReduce and as @Unstable.

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support generic shuffle service as a set of two plugins: ShuffleProvider and 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on a 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges, hence getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)
 # I am providing a link for downloading UDA - Mellanox's open source plugin 
 that implements generic shuffle service using RDMA and levitated merge.  
 Note: At this phase, the code is in C++ through JNI and you should consider 
 it as beta only.  Still, it can serve anyone who wants to implement or 
 contribute to levitated merge. (Please be advised that levitated merge is 
 mostly suited to very fast networks) - 
 [http://www.mellanox.com/content/pages.php?pg=products_dynproduct_family=144menu_section=69]


[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505683#comment-13505683
 ] 

Robert Joseph Evans commented on MAPREDUCE-4819:


Yes, but going off of Koji's comments, we also want to be sure that if the 
previous attempt's edit log does not exist, we don't know what state we were in, 
and we should just assume we need to unregister and exit.

 AM can rerun job after reporting final job status to the client
 ---

 Key: MAPREDUCE-4819
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha
Priority: Critical

 If the AM reports final job status to the client but then crashes before 
 unregistering with the RM then the RM can run another AM attempt.  Currently 
 AM re-attempts assume that the previous attempts did not reach a final job 
 state, and that causes the job to rerun (from scratch, if the output format 
 doesn't support recovery).
 Re-running the job when we've already told the client the final status of the 
 job is bad for a number of reasons.  If the job failed, it's confusing at 
 best since the client was already told the job failed but the subsequent 
 attempt could succeed.  If the job succeeded there could be data loss, as a 
 subsequent job launched by the client tries to consume the job's output as 
 input just as the re-attempt starts removing output files in preparation for 
 the output commit.
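
The decision a re-attempted AM has to make, as discussed in this thread, can be sketched like this. The enum and method names are hypothetical, and the rule for a missing log follows the proposal in the comment above rather than any committed patch.

```java
final class AmRestartSketch {
    // What a new AM attempt can learn about the previous attempt (hypothetical model).
    enum PreviousAttempt { LOG_MISSING, IN_PROGRESS, FINAL_STATE_REPORTED }

    enum Action { RECOVER_OR_RERUN, UNREGISTER_AND_EXIT }

    // Intended for re-attempts only; a first attempt has no previous state to inspect.
    static Action decide(PreviousAttempt previous) {
        switch (previous) {
            case IN_PROGRESS:
                // The client has not been told a final status, so continuing is safe.
                return Action.RECOVER_OR_RERUN;
            case FINAL_STATE_REPORTED:
                // The client already saw a final status; rerunning could contradict it
                // or delete output that a downstream job is reading.
                return Action.UNREGISTER_AND_EXIT;
            case LOG_MISSING:
            default:
                // State unknown: assume the conservative path proposed in the thread.
                return Action.UNREGISTER_AND_EXIT;
        }
    }
}
```

The conservative default trades a possibly unnecessary exit for never contradicting a status the client has already seen.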


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505687#comment-13505687
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

Avner, one more thing, please make sure the patch applies to branch MR-2454 
(https://svn.apache.org/repos/asf/hadoop/common/branches/MR-2454). thx



[jira] [Commented] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505701#comment-13505701
 ] 

Robert Joseph Evans commented on MAPREDUCE-4825:


The patch looks fine to me. +1

I'll check it in.

 JobImpl.finished doesn't expect ERROR as a final job state
 --

 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4825.patch


 TestMRApp.testJobError is causing AsyncDispatcher to exit with System.exit 
 due to an exception being thrown.  From the console output from testJobError:
 {noformat}
 2012-11-27 18:46:15,240 ERROR [AsyncDispatcher event handler] impl.TaskImpl (TaskImpl.java:internalError(665)) - Invalid event T_SCHEDULE on Task task_0__m_00
 2012-11-27 18:46:15,242 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(132)) - Error in dispatcher thread
 java.lang.IllegalArgumentException: Illegal job state: ERROR
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.finished(JobImpl.java:838)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1622)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InternalErrorTransition.transition(JobImpl.java:1)
   at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:359)
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:287)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:723)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:974)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
   at java.lang.Thread.run(Thread.java:662)
 2012-11-27 18:46:15,242 INFO  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(135)) - Exiting, bbye..
 {noformat}
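
The failure mode in the trace above, a "finished" hook that rejects a legitimate final state, can be sketched as follows. This is not the JobImpl source and not the attached patch; the enum, the return values, and the choice to treat ERROR like FAILED are assumptions made for illustration.

```java
final class JobFinishSketch {
    enum JobState { RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

    // Hypothetical stand-in for the per-state bookkeeping done when a job ends.
    static String finished(JobState finalState) {
        switch (finalState) {
            case SUCCEEDED:
                return "completed";
            case FAILED:
            case ERROR: // assumption of this sketch: account for ERROR like FAILED
                return "failed";
            case KILLED:
                return "killed";
            default:
                // Non-final states are still rejected, as in the trace above.
                throw new IllegalArgumentException("Illegal job state: " + finalState);
        }
    }
}
```

Without the ERROR case, the call falls through to the default branch and the exception propagates into the dispatcher thread, which matches the "Illegal job state: ERROR" message in the log.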


[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505707#comment-13505707
 ] 

Radim Kolar commented on MAPREDUCE-4827:


It's Knuth's formula, commonly used in hash tables to improve hashing. Java's 
hashtable uses it too.


[jira] [Updated] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4825:
---

   Resolution: Fixed
Fix Version/s: 0.23.6
   2.0.3-alpha
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks Jason,

I put this in trunk, branch-2, and branch-0.23

 JobImpl.finished doesn't expect ERROR as a final job state
 --

 Key: MAPREDUCE-4825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.6

 Attachments: MAPREDUCE-4825.patch



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505728#comment-13505728
 ] 

Arun C Murthy commented on MAPREDUCE-4049:
--

Looks good.

Some more points I've noted previously:

# Context should have get/set apis
# I don't see a need to replace all member fields in Shuffle.java, just init 
them from the passed-in context.
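
The two review points can be sketched together: a context object exposing accessors, and a consumer that keeps its own member fields and simply initializes them from the context. The field names here are hypothetical and much simpler than the real shuffle context; only getters are shown.

```java
final class ShuffleContextSketch {
    // Hypothetical context; the real Context carries different, richer fields.
    static final class Context {
        private final int reduceId;
        private final String jobName;

        Context(int reduceId, String jobName) {
            this.reduceId = reduceId;
            this.jobName = jobName;
        }

        int getReduceId() { return reduceId; }
        String getJobName() { return jobName; }
    }

    // The consumer keeps its existing member fields and copies them from the context once.
    static final class Shuffle {
        private int reduceId;
        private String jobName;

        void init(Context context) {
            this.reduceId = context.getReduceId();
            this.jobName = context.getJobName();
        }

        String describe() { return jobName + "#" + reduceId; }
    }
}
```

Copying in `init` leaves the rest of the class reading its own fields, which keeps the diff against the existing code small.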


[jira] [Commented] (MAPREDUCE-4825) JobImpl.finished doesn't expect ERROR as a final job state

2012-11-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505730#comment-13505730
 ] 

Hudson commented on MAPREDUCE-4825:
---

Integrated in Hadoop-trunk-Commit #3069 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3069/])
MAPREDUCE-4825. JobImpl.finished doesn't expect ERROR as a final job state 
(jlowe via bobby) (Revision 1414840)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1414840
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java



[jira] [Updated] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-28 Thread Bikas Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated MAPREDUCE-4819:
--

Attachment: MAPREDUCE-4819.1.patch

 AM can rerun job after reporting final job status to the client
 ---

 Key: MAPREDUCE-4819
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha
Priority: Critical
 Attachments: MAPREDUCE-4819.1.patch



[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505740#comment-13505740
 ] 

Arun C Murthy commented on MAPREDUCE-4807:
--

Good point, agreed.

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505743#comment-13505743
 ] 

Arun C Murthy commented on MAPREDUCE-4807:
--

Also, the function needs to be renamed to 'createSortingCollector' or some such 
since it isn't creating the DirectMapOutputCollector - equivalently, we can 
move the creation of DirectMapOutputCollector there too.
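
The naming point can be sketched as a factory split in two: one method that only creates the pluggable sorting collector, and a caller that decides between it and the direct collector. All types below are hypothetical stand-ins, and selecting the direct collector when there are no reduce tasks is stated here as background on how map-only jobs behave, not as something taken from the patch.

```java
final class CollectorFactorySketch {
    interface Collector {
        String name();
    }

    // Stand-in for the direct (no sort, no spill) collector.
    static final class DirectCollector implements Collector {
        public String name() { return "direct"; }
    }

    // Stand-in for the default sorting buffer; a plugin would replace this class.
    static final class DefaultSortingCollector implements Collector {
        public String name() { return "default-sorting"; }
    }

    // Creates only the pluggable sorting collector, loaded by class name.
    static Collector createSortingCollector(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(Collector.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load collector " + className, e);
        }
    }

    // Alternative from the comment: one place that creates either kind of collector.
    static Collector createCollector(int numReduceTasks, String sortingClassName) {
        if (numReduceTasks == 0) {
            return new DirectCollector();
        }
        return createSortingCollector(sortingClassName);
    }
}
```

Either layout works; what matters is that the method name matches what the method actually constructs.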


[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505742#comment-13505742
 ] 

Mariappan Asokan commented on MAPREDUCE-4807:
-

Hi Arun,
  Thanks for your comments.  I agree with Alejandro on #1.  On #2, I agree with 
you.  The patch will definitely get smaller.  I will go ahead and make the 
changes.

-- Asokan


[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-28 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505747#comment-13505747
 ] 

Bikas Saha commented on MAPREDUCE-4819:
---

Attaching a patch based on discussions with Vinod and implementing what is in 
his comment above. I was testing it by making the AM die during 
MRAppMaster.shutdownJob() after successful job completion, but the second 
attempt could not find the history file during recoveryService.parse():

File does not exist: 
/tmp/hadoop-yarn/staging/bikas/.staging/job_1354125268052_0001_1.jhist

bq. the job history log is moved to the done intermediate dir
Can this explain why I am seeing the above error? Any pointers?


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505792#comment-13505792
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

Hey Arun, changing the Context methods to get*() makes sense. Adding set*() 
methods is not needed at this point, right?

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support a generic shuffle service as a set of two plugins: ShuffleProvider and 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example, we are working on a 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or InfiniBand) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also use a suitable merge approach during 
 the intermediate merges, and hence get much better performance.
 # Satisfy MAPREDUCE-3060 - a generic shuffle service that avoids the hidden 
 dependency of the NodeManager on a specific version of the mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with a suggested Top Level Design for both plugins 
 (currently based on the 1.0 branch)
 # I am providing a link for downloading UDA - Mellanox's open source plugin 
 that implements a generic shuffle service using RDMA and levitated merge.  
 Note: At this phase, the code is in C++ through JNI and you should consider 
 it as beta only.  Still, it can serve anyone who wants to implement or 
 contribute to levitated merge. (Please be advised that levitated merge is 
 mostly suited to very fast networks) - 
 [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-28 Thread Avner BenHanoch (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505801#comment-13505801
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:


Arun/Alejandro, can you pls delink it?


[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505806#comment-13505806
 ] 

Jason Lowe commented on MAPREDUCE-4819:
---

See JobHistoryEventHandler.closeEventWriter and moveToDoneNow.  That's what's 
moving the job history file from the staging directory to the done intermediate 
directory so the history server picks it up.  We need to not delete the file 
after we move it.

 AM can rerun job after reporting final job status to the client
 ---

 Key: MAPREDUCE-4819
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Bikas Saha
Priority: Critical
 Attachments: MAPREDUCE-4819.1.patch


 If the AM reports final job status to the client but then crashes before 
 unregistering with the RM then the RM can run another AM attempt.  Currently 
 AM re-attempts assume that the previous attempts did not reach a final job 
 state, and that causes the job to rerun (from scratch, if the output format 
 doesn't support recovery).
 Re-running the job when we've already told the client the final status of the 
 job is bad for a number of reasons.  If the job failed, it's confusing at 
 best since the client was already told the job failed but the subsequent 
 attempt could succeed.  If the job succeeded there could be data loss, as a 
 subsequent job launched by the client tries to consume the job's output as 
 input just as the re-attempt starts removing output files in preparation for 
 the output commit.


[jira] [Updated] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

2012-11-28 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-4817:
-

  Resolution: Fixed
   Fix Version/s: 0.23.6
  2.0.3-alpha
  3.0.0
Target Version/s: 3.0.0, 2.0.3-alpha, 0.23.6
  Status: Resolved  (was: Patch Available)

Thanks Bobby, I've committed this.

 Hardcoded task ping timeout kills tasks localizing large amounts of data
 

 Key: MAPREDUCE-4817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mr-am
Affects Versions: 0.23.3, 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Thomas Graves
Priority: Critical
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.6

 Attachments: MAPREDUCE-4817.patch, MAPREDUCE-4817.patch


 When a task is launched and spends more than 5 minutes localizing files, the 
 AM will kill the task due to ping timeout.  The AM's TaskHeartbeatHandler 
 currently tracks tasks via a progress timeout and a ping timeout.  The 
 progress timeout can be controlled via mapreduce.task.timeout and even 
 disabled by setting the property to 0.  The ping timeout, however, is 
 hardcoded to 5 minutes and cannot be configured.  Therefore if the task takes 
 too long localizing, it never gets running in order to ping back to the AM 
 and the AM kills it due to ping timeout.
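
 To make the interaction of the two timeouts concrete, here is a minimal 
 sketch. It is not the real TaskHeartbeatHandler: the class name 
 HeartbeatSketch, its fields, and the idea of passing the ping timeout in as a 
 parameter are assumptions made for illustration only.

```java
public class HeartbeatSketch {
    // Hypothetical model of the two checks described above.
    private final long progressTimeoutMs; // 0 disables the progress check
    private final long pingTimeoutMs;     // hardcoded to 5 minutes in the bug report
    private long lastPingMs;
    private long lastProgressMs;

    public HeartbeatSketch(long progressTimeoutMs, long pingTimeoutMs, long registeredAtMs) {
        this.progressTimeoutMs = progressTimeoutMs;
        this.pingTimeoutMs = pingTimeoutMs;
        this.lastPingMs = registeredAtMs;
        this.lastProgressMs = registeredAtMs;
    }

    /** The task contacted the AM without reporting progress. */
    public void ping(long nowMs) {
        lastPingMs = nowMs;
    }

    /** The task reported progress, which also counts as a ping. */
    public void progress(long nowMs) {
        lastProgressMs = nowMs;
        lastPingMs = nowMs;
    }

    /** True if either timeout has expired at time nowMs. */
    public boolean shouldKill(long nowMs) {
        boolean pingExpired = nowMs - lastPingMs > pingTimeoutMs;
        boolean progressExpired =
            progressTimeoutMs > 0 && nowMs - lastProgressMs > progressTimeoutMs;
        return pingExpired || progressExpired;
    }
}
```

 In this model a task that spends 6 minutes localizing is killed when the ping 
 timeout is 5 minutes, even with the progress timeout disabled, because it 
 never gets the chance to call ping().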


[jira] [Commented] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

2012-11-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505817#comment-13505817
 ] 

Hudson commented on MAPREDUCE-4817:
---

Integrated in Hadoop-trunk-Commit #3070 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3070/])
MAPREDUCE-4817. Hardcoded task ping timeout kills tasks localizing large 
amounts of data (tgraves) (Revision 1414873)

 Result = FAILURE
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1414873
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java



[jira] [Updated] (MAPREDUCE-4813) AM timing out during job commit


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4813:
--

Attachment: MAPREDUCE-4813.patch

Patch that fixes the unit test failures and adds some testing of the new 
COMMITTING state.  As a bonus, most of the tests in TestJobImpl actually test a 
JobImpl object rather than a mock of it.


 AM timing out during job commit
 ---

 Key: MAPREDUCE-4813
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4813
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: MAPREDUCE-4813.patch, MAPREDUCE-4813.patch


 The AM calls the output committer's {{commitJob}} method synchronously during 
 JobImpl state transitions, which means the JobImpl write lock is held the 
 entire time the job is being committed.  Holding the write lock prevents the 
 RM allocator thread from heartbeating to the RM.  Therefore if committing the 
 job takes too long (e.g.: the job has tons of files to commit and/or the 
 namenode is bogged down) then the AM appears to be unresponsive to the RM and 
 the RM kills the AM attempt.
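
 The blocking behaviour can be reproduced in isolation with a plain 
 read/write lock. The sketch below is not JobImpl: the class 
 CommitLockSketch is hypothetical, and it assumes the heartbeat path needs 
 the job's read lock, which is what the description implies.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CommitLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** True if a separate "heartbeat" thread can take the read lock right now. */
    public boolean heartbeatCanRead() {
        AtomicBoolean acquired = new AtomicBoolean(false);
        Thread heartbeat = new Thread(() -> {
            // tryLock() fails immediately while another thread holds the write lock.
            if (lock.readLock().tryLock()) {
                acquired.set(true);
                lock.readLock().unlock();
            }
        });
        heartbeat.start();
        try {
            heartbeat.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired.get();
    }

    /** Simulates a synchronous commit: the write lock is held for its whole duration. */
    public boolean heartbeatWorksDuringSynchronousCommit() {
        lock.writeLock().lock();
        try {
            // Stands in for a heartbeat attempted while commitJob is still running.
            return heartbeatCanRead();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

 Running the commit on a separate thread without holding the write lock would 
 let heartbeatCanRead() succeed during the commit, which is the general 
 direction a fix has to take.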


[jira] [Updated] (MAPREDUCE-4813) AM timing out during job commit


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4813:
--

Target Version/s: 2.0.3-alpha, 0.23.6
  Status: Patch Available  (was: Open)


[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

2012-11-28 Thread Bikas Saha (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505831#comment-13505831
]

Bikas Saha commented on MAPREDUCE-4819:
---

Yeah. Got the same info from Vinod in an offline conversation.

Looks like the patch solves half the problem: making sure that history is fully
saved before changing to the succeeded state.
The other half is to make sure the recovery data is available to the restarted
app.
Since the RM can restart FAILED/KILLED/SUCCEEDED apps, it looks like we will need
to wait for state data to be saved for all of them, not just the succeeded state
(which is what the patch does). Otherwise the RM could restart a failed app,
which would run again and fail again.

The solutions to the second half could be:
1) Don't delete the original in the staging dirs. But this suffers from the
problem that the final staging dir cleanup would end up removing it for a
successful app, and then the AM could crash.
2) Have the recovery service look at both the temp and done locations. But this
suffers from race conditions when the AM does a partial move to the done dir and
then dies, so part of the data is in temp and part in done.
3) Before moving from temp to done, create a marker file in done. Upon restart,
check if the marker file exists. If it does, then don't do anything, because the
job was done (failed/killed/successful) and the AM died sometime after that.
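
Option 3 can be sketched as follows. This is only an illustration on the local
filesystem with java.nio, not HDFS or the JobHistoryEventHandler code; the class
DoneMarkerSketch and the marker name _JOB_DONE_MARKER are made up for the
example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DoneMarkerSketch {
    static final String MARKER = "_JOB_DONE_MARKER"; // hypothetical file name

    /** Create the marker in the done dir first, then move the history file. */
    public static void finishJob(Path tempFile, Path doneDir) {
        try {
            Files.createDirectories(doneDir);
            Path marker = doneDir.resolve(MARKER);
            if (!Files.exists(marker)) {
                Files.createFile(marker);
            }
            Files.move(tempFile, doneDir.resolve(tempFile.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** On restart: rerun only if no marker was written by a previous attempt. */
    public static boolean shouldRerun(Path doneDir) {
        return !Files.exists(doneDir.resolve(MARKER));
    }

    /** Self-check in a throwaway directory: rerun allowed before, not after. */
    public static boolean demo() {
        try {
            Path root = Files.createTempDirectory("marker-sketch");
            Path temp = Files.createFile(root.resolve("job_1.jhist"));
            Path done = root.resolve("done");
            boolean before = shouldRerun(done);
            finishJob(temp, done);
            boolean after = shouldRerun(done);
            return before && !after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the marker is written before the move starts, a crash in the middle of
the move still leaves the marker behind for the next attempt to find.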

AM can rerun job after reporting final job status to the client
---

[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505863#comment-13505863
]

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

Delink? You mean remove it as a sub-task? If so, I'd like it to stay as a
sub-task, as they are related. Thx


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505864#comment-13505864
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

And don't worry about being a subtask delaying it. I'll review it as soon as 
you post a patch and commit it when ready. The same is happening with the 
other subtasks, so things should be in quite quickly. Thx


[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505890#comment-13505890
 ] 

Radim Kolar commented on MAPREDUCE-4827:


I have no numbers available.

 Increase hash quality of HashPartitioner
 

 Key: MAPREDUCE-4827
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Radim Kolar
 Attachments: betterhash1.txt


 HashPartitioner uses Object.hashCode() to split keys into 
 partitions. This results in bad distributions because hashCode() quality is 
 poor. 
 These hashCode() functions are sometimes written by hand (very poor quality) 
 and sometimes generated by commons-lang code (poor quality). Applying 
 some transformation on top of hashCode() provides a better distribution.


[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

[
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505900#comment-13505900
]

Jason Lowe commented on MAPREDUCE-4819:
---

We can't have the AM looking for the file in done_intermediate. The history
server could have moved it out of there in the interim. And I don't think we
want the AM to know how to find its file in the final done location the
history server puts it in either. Too much coupling between those systems,
IMHO.

I think leaving it in the staging directory is the correct solution. As I
mentioned, we need to make sure we don't delete the staging directory before
unregistering with the RM. That prevents subsequent AM re-attempts right off
the bat. And deleting the staging directory before unregistering is happening
today as discussed in YARN-244, so that problem is not specific to this fix.

Leaving it in staging is straightforward. No need for extra markers, racing
with the history server, etc. And if the staging directory is gone, well the
AM can't relaunch in the first place, so no issues of re-running and
re-committing there. We could still have a discrepancy between the client
thinking the job succeeded (which it basically did re: its output data) but the
RM saying it failed, but this is fixable by moving the removal of the staging
directory to after we unregister from the RM when we fix YARN-244.

AM can rerun job after reporting final job status to the client
---

[jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client

[
https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505935#comment-13505935
]

Jason Lowe commented on MAPREDUCE-4819:
---

Took a look at the patch, and I think we are missing some critical corner
cases. For example, if we finish committing the job and the committer is using
a marker of sorts (e.g.: _SUCCESS), then we could trigger downstream jobs to
run *before* the job history is completely closed. I believe Oozie is polling
for the _SUCCESS marker, for example. If we crash after committing but before
writing the job finished record then we could end up re-committing again while
another job is attempting to consume our output, leading to potential data loss
even though both jobs would have SUCCEEDED. That's a Bad Thing.

I think the crux of the issue is that we must not commit twice. The act of
committing is what could trigger downstream jobs or in itself not be
repeatable/recoverable, so we should treat AM crashes during job commit much
like we treat non-crashing failures during job commit today, i.e.: it should
fail the job without re-running and re-committing. Worst-case we have a false
negative where the output did commit successfully but we thought the job
failed, and I agree with Koji that a false negative beats a false positive in
this case.

This means we need a marker noting when we start and stop committing sync'd to
the job history file. If the AM relaunches and finds we crashed during commit,
we should treat it as we do a committer failure and fail the job. If the
re-attempt finds we finished committing then we simply need to unregister from
the RM without re-running.
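
The relaunch rule described above can be written as a small decision function.
This is a sketch only: the event and action names are hypothetical, not actual
job history event types, and the history is modelled as an in-memory list
rather than a job history file.

```java
import java.util.List;

public class CommitRecoverySketch {
    /** Hypothetical markers a previous attempt may have synced to its history. */
    public enum Event { JOB_STARTED, COMMIT_STARTED, COMMIT_FINISHED }

    /** What a relaunched AM should do. */
    public enum Action { RERUN_OR_RECOVER, FAIL_JOB, UNREGISTER_ONLY }

    public static Action onRelaunch(List<Event> previousAttemptHistory) {
        boolean commitStarted = previousAttemptHistory.contains(Event.COMMIT_STARTED);
        boolean commitFinished = previousAttemptHistory.contains(Event.COMMIT_FINISHED);
        if (commitFinished) {
            // Output is already committed: never commit twice.
            return Action.UNREGISTER_ONLY;
        }
        if (commitStarted) {
            // Crashed mid-commit: treat it like a committer failure.
            return Action.FAIL_JOB;
        }
        // Commit never began, so rerunning or recovering is still safe.
        return Action.RERUN_OR_RECOVER;
    }
}
```

The middle branch is where the false negative can arise: the commit may in
fact have completed just before the crash, but the job is still reported as
failed rather than risking a second commit.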

AM can rerun job after reporting final job status to the client
---

[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505940#comment-13505940
 ] 

Robert Joseph Evans commented on MAPREDUCE-4827:


That is very interesting. I can see it in java.util.HashMap but it looks like 
java.util.Hashtable does not.  Assuming that Jenkins comes back with a +1 I am 
OK with putting this in.  I would like to have some numbers, because this is a 
performance improvement, but the citation of the code in HashMap.java, which 
is almost identical to this patch, is good enough for me. +1
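
For readers without the JDK source at hand, the supplemental hash in 
java.util.HashMap (JDK 6/7 era) looks roughly like spread() below, as far as I 
recall it. The contents of betterhash1.txt are not shown in this thread, so 
applying it inside the partition computation is an assumption, and the class 
SpreadHashSketch is hypothetical.

```java
public class SpreadHashSketch {
    /** Bit-spreading step in the style of the older java.util.HashMap.hash(int). */
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** The plain computation: mask the sign bit, then take the remainder. */
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Same computation with the spreading step applied first. */
    static int spreadPartition(Object key, int numPartitions) {
        return (spread(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With 16 partitions, Integer keys that are multiples of 16 all land in 
partition 0 under plainPartition, while spreadPartition sends 16 to partition 
1 and 32 to partition 2, because the higher bits now influence the result.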


[jira] [Commented] (MAPREDUCE-2374) Text File Busy errors launching MR tasks

2012-11-28 Thread Andy Isaacson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505955#comment-13505955
 ] 

Andy Isaacson commented on MAPREDUCE-2374:
--

The fix has been merged to branch-1, but unfortunately not to branch-1.1, so 
it's not included in the 1.1.1 release which is currently being voted on.

 Text File Busy errors launching MR tasks
 --

 Key: MAPREDUCE-2374
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Andy Isaacson
 Fix For: 0.23.3, 2.0.2-alpha

 Attachments: failed_taskjvmsh.strace, mapreduce-2374-2.txt, 
 mapreduce-2374-branch-1.patch, mapreduce-2374-on-20sec.txt, 
 mapreduce-2374.txt, mapreduce-2374.txt, mapreduce-2374.txt, 
 successfull_taskjvmsh.strace


 Some very small percentage of tasks fail with a Text file busy error.
 The following was the original diagnosis:
 {quote}
 Our use of PrintWriter in TaskController.writeCommand is unsafe, since that 
 class swallows all IO exceptions. We're not currently checking for errors, 
 which I'm seeing result in occasional task failures with the message Text 
 file busy - assumedly because the close() call is failing silently for some 
 reason.
 {quote}
 .. but turned out to be another issue as well (see below)
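
 The hazard named in the original diagnosis is easy to demonstrate on its own. 
 The sketch below is not TaskController.writeCommand; the class 
 PrintWriterErrorSketch and its FailingStream helper are hypothetical. It only 
 shows that PrintWriter reports failures through checkError() rather than by 
 throwing, and it says nothing about the other issue mentioned above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

public class PrintWriterErrorSketch {
    /** A stream that always fails, standing in for a disk error. */
    public static class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    /** Writes one command line and returns true if PrintWriter recorded an error. */
    public static boolean writeAndCheck(OutputStream out, String command) {
        PrintWriter pw = new PrintWriter(out);
        pw.println(command);             // does not throw, even if the write fails later
        boolean failed = pw.checkError(); // flushes, then reports any swallowed IOException
        pw.close();
        return failed;
    }
}
```

 A caller that skips checkError() has no way to learn that the command file was 
 not fully written.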


[jira] [Commented] (MAPREDUCE-2374) Text File Busy errors launching MR tasks


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505957#comment-13505957
 ] 

Marc Reichman commented on MAPREDUCE-2374:
--

Andy,

Thank you for your comment. I apologize for my lack of understanding of the 
hadoop release process, but does this mean the fix will be included in a future 
1.0.5 release?


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Attachment: COMBO-mapreduce-4809-4807.patch
mapreduce-4807.patch

Hi Arun,
  I addressed the following issues:

* Copied fields from {{Context}} to local copies to reduce the size of the patch.
* Opted to change the method name to {{createSortingCollector()}}.  I cannot use 
this to create {{DirectMapOutputCollector()}} (based on whether it is a 
map-only job) since the call to this method from {{NewOutputCollector}} always 
expects a sorting collector.
* Prefixed *get* in the method signatures of the {{Context}} class.

Please review the uploaded patch.

Thanks.

-- Asokan


 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, 
 COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-11-28 Thread Doug Cutting (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505979#comment-13505979
 ] 

Doug Cutting commented on MAPREDUCE-4827:
-

This is an incompatible change; it will change the output of jobs.  In most 
cases this shouldn't matter, but there might be applications which expect, 
e.g., the key '1' to go to the output file numbered '1'.  This could be avoided 
by, instead of modifying HashPartitioner, adding a new partitioner.


[jira] [Commented] (MAPREDUCE-4809) Make classes required for MAPREDUCE-2454 to be java public (with LimitedPrivate)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505980#comment-13505980
 ] 

Mariappan Asokan commented on MAPREDUCE-4809:
-

Hi Arun and Alejandro,
  Thanks for all your help in making this happen.

-- Asokan


 Make classes required for MAPREDUCE-2454 to be java public (with 
 LimitedPrivate)
 

 Key: MAPREDUCE-4809
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4809
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: MR-2454

 Attachments: MAPREDUCE-4809-1.patch, mapreduce-4809.patch, 
 mapreduce-4809.patch, mapreduce-4809.patch


 Make classes required for MAPREDUCE-2454 to be java public (with 
 LimitedPrivate)


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Status: Patch Available  (was: Open)


[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505991#comment-13505991
 ] 

Radim Kolar commented on MAPREDUCE-4827:


If an application requires stable partitioning, then it needs to provide its own 
partitioner, because hashCode() for Object is not the same across JVMs. There is no 
need to push backward compatibility that hard. I have never seen such an app, and we 
have about 2 million lines of mapred stuff.

 Increase hash quality of HashPartitioner
 

 Key: MAPREDUCE-4827
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Radim Kolar
 Attachments: betterhash1.txt


 The hash partitioner uses object.hashCode() to split keys into 
 partitions. This results in bad distributions because hashCode() quality is 
 poor. 
 These hashCode() functions are sometimes written by hand (very poor quality) 
 and sometimes generated by commons lang code (poor quality). Applying 
 some transformation on top of hashCode() provides a better distribution.
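
 A minimal sketch of the idea in the description above, under stated assumptions: 
 SpreadHashDemo and its methods are hypothetical names, and the bit-mixing step is 
 only an illustration, not the transformation from betterhash1.txt. plainPartition 
 follows the usual (hashCode & Integer.MAX_VALUE) % numPartitions form; mixedPartition 
 folds the high bits into the low bits first.

```java
// Hypothetical sketch: SpreadHashDemo is an illustrative name, not a Hadoop class,
// and the mixing step is an assumption, not the transformation in betterhash1.txt.
final class SpreadHashDemo {

    // Map hashCode() straight onto a partition; the sign bit is masked off
    // so the result of % is never negative.
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Fold the higher bits of hashCode() into the lower bits before the modulus.
    static int mixedPartition(Object key, int numPartitions) {
        int h = key.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        h ^= (h >>> 7) ^ (h >>> 4);
        return (h & Integer.MAX_VALUE) % numPartitions;
    }

    // Count how many distinct partitions receive at least one of the keys 0, 16, 32, ...
    static int partitionsUsed(boolean mixed, int numPartitions) {
        boolean[] used = new boolean[numPartitions];
        for (int i = 0; i < 1000; i++) {
            Integer key = i * 16; // low four bits are always zero
            int p = mixed ? mixedPartition(key, numPartitions)
                          : plainPartition(key, numPartitions);
            used[p] = true;
        }
        int count = 0;
        for (boolean u : used) {
            if (u) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("plain: " + partitionsUsed(false, 16) + " of 16 partitions used");
        System.out.println("mixed: " + partitionsUsed(true, 16) + " of 16 partitions used");
    }
}
```

 With keys that are all multiples of 16 and 16 partitions, the plain variant sends 
 every key to partition 0, while the mixed variant uses more than one partition.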


[jira] [Created] (MAPREDUCE-4828) Unit Test: TestTaskTrackerLocalization fails when ran with ant-1.8.4 and not 1.7.x

Amir Sanjar created MAPREDUCE-4828:
--

 Summary: Unit Test: TestTaskTrackerLocalization fails when ran 
with ant-1.8.4 and not 1.7.x
 Key: MAPREDUCE-4828
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4828
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.0
 Environment: Fedora 17 and RHEL 6.3, x86_64, IBM JAVA 7
Reporter: Amir Sanjar
Priority: Critical
 Fix For: 1.1.0


The problem is caused by JUnit3-based test cases being run in a JUnit4 environment 
configured by ant 1.8.4.
In this case the @Ignore tag is not honored, so the test is not skipped. 
This test case has been removed from trunk.
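
One plausible reading of the report, sketched below as an assumption rather than a 
diagnosis: JUnit 3 selects tests by the "test" method-name prefix, and a runner that 
works that way never looks at an ignore annotation. The sketch uses only reflection; 
SkipMarker, SampleLegacyTests and DiscoveryDemo are hypothetical stand-ins, not JUnit 
or Hadoop classes.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an ignore marker; this is not org.junit.Ignore.
@Retention(RetentionPolicy.RUNTIME)
@interface SkipMarker {}

// Hypothetical test class written in the JUnit 3 naming style.
class SampleLegacyTests {
    public void testRuns() {}

    @SkipMarker
    public void testShouldBeSkipped() {}
}

final class DiscoveryDemo {

    // Name-based discovery: every method whose name starts with "test" is selected,
    // and annotations are never consulted.
    static List<String> discoverByName(Class<?> testClass) {
        List<String> selected = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            if (m.getName().startsWith("test")) {
                selected.add(m.getName());
            }
        }
        Collections.sort(selected);
        return selected;
    }

    // Annotation-aware discovery: methods carrying the marker are left out.
    static List<String> discoverHonoringMarker(Class<?> testClass) {
        List<String> selected = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            if (m.getName().startsWith("test") && !m.isAnnotationPresent(SkipMarker.class)) {
                selected.add(m.getName());
            }
        }
        Collections.sort(selected);
        return selected;
    }

    public static void main(String[] args) {
        System.out.println("by name:         " + discoverByName(SampleLegacyTests.class));
        System.out.println("honoring marker: " + discoverHonoringMarker(SampleLegacyTests.class));
    }
}
```

The name-based variant selects both methods, which is the behaviour the report 
describes as the annotation not taking effect.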


[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505996#comment-13505996
 ] 

Mariappan Asokan commented on MAPREDUCE-4807:
-

Sorry for the botched formatting:(  Here we go.

Hi Arun,
I addressed the following issues:

* Copied fields from {{Context}} to local copies to reduce the size of the 
patch.
* Opted to change the method name to {{createSortingCollector().}} I cannot use 
this to create {{DirectMapOutputCollector()}} (based on whether it is a 
map-only job) since the call to this method from {{NewOutputCollector}} always 
expects a sorting collector.
* Prefixed *get* in the method signatures of {{Context}} class.
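
For readers following along, the general shape of a pluggable sorting collector can be 
sketched as below. This is not the code in the uploaded patch: SortingCollector, 
DefaultSortingCollector and CollectorFactory are hypothetical names, and loading the 
implementation by class name is an assumption about how such a plug-in point might be 
wired.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical plug-in contract; not the interface from the patch.
interface SortingCollector {
    void collect(String key, String value);

    List<String> flush();
}

// Hypothetical default implementation that buffers records and sorts them on flush.
final class DefaultSortingCollector implements SortingCollector {
    private final List<String> records = new ArrayList<>();

    @Override
    public void collect(String key, String value) {
        records.add(key + "=" + value);
    }

    @Override
    public List<String> flush() {
        Collections.sort(records);
        return new ArrayList<>(records);
    }
}

final class CollectorFactory {

    // Instantiate the collector named by a (hypothetical) configuration value.
    static SortingCollector createSortingCollector(String className) {
        try {
            Class<? extends SortingCollector> cls =
                    Class.forName(className).asSubclass(SortingCollector.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot create collector " + className, e);
        }
    }

    public static void main(String[] args) {
        SortingCollector collector =
                createSortingCollector(DefaultSortingCollector.class.getName());
        collector.collect("b", "2");
        collector.collect("a", "1");
        System.out.println(collector.flush());
    }
}
```

Because the factory always returns a sorting implementation, a caller that needs a 
non-sorting, direct collector for a map-only job would have to be handled elsewhere, 
which matches the constraint described in the second bullet.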

Please review the uploaded patch.

Thanks.

– Asokan


 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, 
 COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Created] (MAPREDUCE-4829) Unit Test: TestMiniMRMapRedDebugScript fails when ran with ant-1.8.4 and not 1.7.x

Amir Sanjar created MAPREDUCE-4829:
--

 Summary: Unit Test: TestMiniMRMapRedDebugScript fails when ran 
with ant-1.8.4 and not 1.7.x 
 Key: MAPREDUCE-4829
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4829
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.0
 Environment: Fedora 17 and RHEL 6.3, x86_64, IBM JAVA 7
Reporter: Amir Sanjar
Priority: Critical
 Fix For: 1.1.0


The problem is caused by JUnit3-based test cases being run in a JUnit4 environment 
configured by ant 1.8.4.
In this case the @Ignore tag is not honored, so the test is not skipped. 
This test case has been removed from trunk.


[jira] [Updated] (MAPREDUCE-2058) FairScheduler:NullPointerException in web interface when JobTracker not initialized

2012-11-28 Thread Gera Shegalov (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-2058:
-

Affects Version/s: 1.0.4

 FairScheduler:NullPointerException in web interface when JobTracker not 
 initialized
 ---

 Key: MAPREDUCE-2058
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2058
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 0.22.0, 1.0.4
Reporter: Dan Adkins
 Attachments: MAPREDUCE-2058.patch


 When I contact the jobtracker web interface prior to the job tracker being 
 fully initialized (say, if hdfs is still in safe mode), I get the following 
 error:
 10/09/09 18:06:02 ERROR mortbay.log: /jobtracker.jsp
 java.lang.NullPointerException
     at org.apache.hadoop.mapred.FairScheduler.getJobs(FairScheduler.java:909)
     at org.apache.hadoop.mapred.JobTracker.getJobsFromQueue(JobTracker.java:4357)
     at org.apache.hadoop.mapred.JobTracker.getQueueInfoArray(JobTracker.java:4334)
     at org.apache.hadoop.mapred.JobTracker.getRootQueues(JobTracker.java:4295)
     at org.apache.hadoop.mapred.jobtracker_jsp.generateSummaryTable(jobtracker_jsp.java:44)
     at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:176)
     at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:857)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
     at org.mortbay.jetty.Server.handle(Server.java:324)
     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)


[jira] [Updated] (MAPREDUCE-4828) Unit Test: TestTaskTrackerLocalization fails when ran with ant-1.8.4 and not 1.7.x


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4828:
---

Attachment: MAPREDUCE-4828-release-1.1.0.patch

 Unit Test: TestTaskTrackerLocalization fails when ran with ant-1.8.4 and not 
 1.7.x
 --

 Key: MAPREDUCE-4828
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4828
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.0
 Environment: Fedora 17 and RHEL 6.3, x86_64, IBM JAVA 7
Reporter: Amir Sanjar
Priority: Critical
 Fix For: 1.1.0

 Attachments: MAPREDUCE-4828-release-1.1.0.patch


 The problem is caused by JUnit3-based test cases being run in a JUnit4 environment 
 configured by ant 1.8.4.
 In this case the @Ignore tag is not honored, so the test is not skipped. 
 This test case has been removed from trunk.


[jira] [Updated] (MAPREDUCE-4828) Unit Test: TestTaskTrackerLocalization fails when ran with ant-1.8.4 and not 1.7.x


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4828:
---

Attachment: MAPREDUCE-4828-branch1.patch

 Unit Test: TestTaskTrackerLocalization fails when ran with ant-1.8.4 and not 
 1.7.x
 --

 Key: MAPREDUCE-4828
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4828
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.0
 Environment: Fedora 17 and RHEL 6.3, x86_64, IBM JAVA 7
Reporter: Amir Sanjar
Priority: Critical
 Fix For: 1.1.0

 Attachments: MAPREDUCE-4828-branch1.patch, 
 MAPREDUCE-4828-release-1.1.0.patch


 The problem is caused by JUnit3-based test cases being run in a JUnit4 environment 
 configured by ant 1.8.4.
 In this case the @Ignore tag is not honored, so the test is not skipped. 
 This test case has been removed from trunk.


[jira] [Commented] (MAPREDUCE-4828) Unit Test: TestTaskTrackerLocalization fails when ran with ant-1.8.4 and not 1.7.x


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506012#comment-13506012
 ] 

Amir Sanjar commented on MAPREDUCE-4828:


This failure has been seen in multiple Fedora 17 and RHEL 6.3 Hadoop development 
environments.  

 Unit Test: TestTaskTrackerLocalization fails when ran with ant-1.8.4 and not 
 1.7.x
 --

 Key: MAPREDUCE-4828
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4828
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.0
 Environment: Fedora 17 and RHEL 6.3, x86_64, IBM JAVA 7
Reporter: Amir Sanjar
Priority: Critical
 Fix For: 1.1.0

 Attachments: MAPREDUCE-4828-branch1.patch, 
 MAPREDUCE-4828-release-1.1.0.patch


 The problem is caused by JUnit3-based test cases being run in a JUnit4 environment 
 configured by ant 1.8.4.
 In this case the @Ignore tag is not honored, so the test is not skipped. 
 This test case has been removed from trunk.


[jira] [Updated] (MAPREDUCE-2058) FairScheduler:NullPointerException in web interface when JobTracker not initialized

2012-11-28 Thread Gera Shegalov (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-2058:
-

Attachment: MAPREDUCE-2058-branch-1.patch

Web threads have to be synchronized with the initialization; otherwise there is 
no proper happens-before relationship.
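
As an illustration of the point about happens-before, and not a copy of the attached 
patch, the sketch below guards both the initialization and the read with the same 
lock. SchedulerSketch and its members are hypothetical names; the real FairScheduler 
is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; SchedulerSketch is not the real FairScheduler.
final class SchedulerSketch {
    private List<String> jobs;      // stays null until start() has run
    private boolean initialized;

    // Called once by the initialization thread.
    synchronized void start() {
        jobs = new ArrayList<>();
        jobs.add("job_1");
        initialized = true;
    }

    // Called by web threads. Taking the same lock as start() orders this read
    // after the write, so a reader sees either "not initialized" or a fully
    // built list, never a half-initialized state.
    synchronized List<String> getJobs() {
        if (!initialized) {
            return Collections.emptyList();
        }
        return new ArrayList<>(jobs);
    }

    public static void main(String[] args) {
        SchedulerSketch scheduler = new SchedulerSketch();
        System.out.println("before start: " + scheduler.getJobs());
        scheduler.start();
        System.out.println("after start:  " + scheduler.getJobs());
    }
}
```

A request that arrives before start() gets an empty list instead of dereferencing a 
null field.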

 FairScheduler:NullPointerException in web interface when JobTracker not 
 initialized
 ---

 Key: MAPREDUCE-2058
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2058
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 0.22.0, 1.0.4
Reporter: Dan Adkins
 Attachments: MAPREDUCE-2058-branch-1.patch, MAPREDUCE-2058.patch




[jira] [Commented] (MAPREDUCE-2374) Text File Busy errors launching MR tasks

2012-11-28 Thread Matt Foley (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506022#comment-13506022
]

Matt Foley commented on MAPREDUCE-2374:
---

Marc, being in branch-1, it will be in 1.2.0 when we make that release in
December.
Andy, please go ahead and commit it to branch-1.1 also, so it will be in 1.1.2
when that patch release is made.
Marc, you can request it be committed to branch-1.0 also, but at this time
there are no plans to produce a 1.0.5 release. Are you able to move to 1.1.1
instead? 1.1.1 passed vote yesterday, and I will have it published and
announced in the next day or two.

Text File Busy errors launching MR tasks
--

Key: MAPREDUCE-2374
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Andy Isaacson
Fix For: 0.23.3, 2.0.2-alpha

Attachments: failed_taskjvmsh.strace, mapreduce-2374-2.txt,
mapreduce-2374-branch-1.patch, mapreduce-2374-on-20sec.txt,
mapreduce-2374.txt, mapreduce-2374.txt, mapreduce-2374.txt,
successfull_taskjvmsh.strace

Some very small percentage of tasks fail with a Text file busy error.
The following was the original diagnosis:
{quote}
Our use of PrintWriter in TaskController.writeCommand is unsafe, since that
class swallows all IO exceptions. We're not currently checking for errors,
which I'm seeing result in occasional task failures with the message Text
file busy - assumedly because the close() call is failing silently for some
reason.
{quote}
.. but turned out to be another issue as well (see below)
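
The original diagnosis quoted above (which, as noted, turned out not to be the whole 
story) rests on a documented property of java.io.PrintWriter: its print and close 
methods do not throw IOException, and failures are only visible through checkError(). 
A minimal sketch, with FailingWriter and PrintWriterErrorDemo as hypothetical names 
and no relation to the actual TaskController code:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical Writer that fails on every operation, to simulate an I/O problem.
final class FailingWriter extends Writer {
    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        throw new IOException("simulated write failure");
    }

    @Override
    public void flush() throws IOException {
        throw new IOException("simulated flush failure");
    }

    @Override
    public void close() throws IOException {
        throw new IOException("simulated close failure");
    }
}

final class PrintWriterErrorDemo {

    // Write a command line and report whether it was written without error.
    static boolean writeCommand(Writer target, String command) {
        PrintWriter out = new PrintWriter(target);
        out.println(command); // an IOException here is swallowed by PrintWriter
        out.close();          // and so is one raised while closing
        return !out.checkError(); // the only way to learn that something failed
    }

    public static void main(String[] args) {
        System.out.println("healthy writer ok: " + writeCommand(new StringWriter(), "exec java"));
        System.out.println("failing writer ok: " + writeCommand(new FailingWriter(), "exec java"));
    }
}
```

Without the checkError() call, both invocations would look identical to the caller.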

[jira] [Commented] (MAPREDUCE-2374) Text File Busy errors launching MR tasks

[
https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506026#comment-13506026
]

Marc Reichman commented on MAPREDUCE-2374:
--

Matt,

Thank you for your response. I will be able to move to 1.1.x. I was hoping to
not have to move to 2.x soon. Does 1.1 move to stable when 1.2 gets released
(beta?) in December?

I apologize for the improper forum for these questions.

Thanks,
Marc

Text File Busy errors launching MR tasks
--

[jira] [Commented] (MAPREDUCE-2058) FairScheduler:NullPointerException in web interface when JobTracker not initialized


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506028#comment-13506028
 ] 

Hadoop QA commented on MAPREDUCE-2058:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12555264/MAPREDUCE-2058-branch-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3076//console

This message is automatically generated.

 FairScheduler:NullPointerException in web interface when JobTracker not 
 initialized
 ---

 Key: MAPREDUCE-2058
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2058
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 0.22.0, 1.0.4
Reporter: Dan Adkins
 Attachments: MAPREDUCE-2058-branch-1.patch, MAPREDUCE-2058.patch




[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-11-28 Thread Doug Cutting (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506032#comment-13506032
 ] 

Doug Cutting commented on MAPREDUCE-4827:
-

Integer#hashCode() is documented to be the integer value.

http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#hashCode()

Similarly, the hashCode() implementations for String, Double, Float, Long, 
etc. are specified and do not change from one JVM to another.

Also, I didn't veto this change.  I just observed that it was not 
back-compatible.  That should be taken into account if/when it is committed.
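
The specified hash codes mentioned above can be checked directly against the formulas 
in the JDK documentation. SpecifiedHashDemo is a hypothetical name used only for this 
sketch.

```java
// Hypothetical sketch comparing JDK hashCode() results with their documented formulas.
final class SpecifiedHashDemo {

    // String.hashCode() is documented as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    static int stringHashBySpec(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Long.hashCode() is documented as (int)(value ^ (value >>> 32)).
    static int longHashBySpec(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        // Each line prints true.
        System.out.println(Integer.valueOf(42).hashCode() == 42);
        System.out.println("hadoop".hashCode() == stringHashBySpec("hadoop"));
        System.out.println(Long.valueOf(1L << 40).hashCode() == longHashBySpec(1L << 40));
    }
}
```

Since these values are fixed by specification, a partitioner built on them produces 
the same assignment on every JVM, which is why changing the hash function affects 
compatibility for such keys.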

 Increase hash quality of HashPartitioner
 

 Key: MAPREDUCE-4827
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Radim Kolar
 Attachments: betterhash1.txt


 The hash partitioner uses object.hashCode() to split keys into 
 partitions. This results in bad distributions because hashCode() quality is 
 poor. 
 These hashCode() functions are sometimes written by hand (very poor quality) 
 and sometimes generated by commons lang code (poor quality). Applying 
 some transformation on top of hashCode() provides a better distribution.


[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506050#comment-13506050
 ] 

Hadoop QA commented on MAPREDUCE-4807:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12555252/COMBO-mapreduce-4809-4807.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3075//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3075//console

This message is automatically generated.

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, 
 COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Attachment: (was: mapreduce-4807.patch)

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, 
 COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Attachment: mapreduce-4807.patch

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, 
 COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506052#comment-13506052
 ] 

Hadoop QA commented on MAPREDUCE-4807:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555268/mapreduce-4807.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3077//console

This message is automatically generated.

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, 
 COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506054#comment-13506054
 ] 

Radim Kolar commented on MAPREDUCE-4827:


This one (Object#hashCode(), linked below) is platform dependent and more or less 
random. Most Writables do not implement hashCode().

http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode%28%29
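
To make the distinction concrete, here is a small sketch with two hypothetical key 
classes (neither is a Hadoop Writable). The first inherits Object.hashCode(), whose 
value is not specified and can differ between runs and JVMs; the second derives its 
hash from its content.

```java
// Hypothetical key that inherits Object.hashCode(); its value is unspecified.
final class KeyWithoutHash {
    final String id;

    KeyWithoutHash(String id) {
        this.id = id;
    }
}

// Hypothetical key whose hash depends only on its content.
final class KeyWithHash {
    final String id;

    KeyWithHash(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof KeyWithHash && ((KeyWithHash) other).id.equals(id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

final class KeyHashDemo {
    public static void main(String[] args) {
        // Identity-based hashes: values vary, so no particular output is promised.
        System.out.println(new KeyWithoutHash("k").hashCode());
        System.out.println(new KeyWithoutHash("k").hashCode());
        // Content-based hashes: equal keys always agree.
        System.out.println(new KeyWithHash("k").hashCode() == new KeyWithHash("k").hashCode());
    }
}
```

Only keys of the second kind give a partitioner something stable to work with.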

 Increase hash quality of HashPartitioner
 

 Key: MAPREDUCE-4827
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Radim Kolar
 Attachments: betterhash1.txt


 The hash partitioner uses object.hashCode() to split keys into 
 partitions. This results in bad distributions because hashCode() quality is 
 poor. 
 These hashCode() functions are sometimes written by hand (very poor quality) 
 and sometimes generated by commons lang code (poor quality). Applying 
 some transformation on top of hashCode() provides a better distribution.


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506059#comment-13506059
 ] 

Arun C Murthy commented on MAPREDUCE-4049:
--

Alejandro - there seems to be some lingering history between the protagonists 
here and in MAPREDUCE-2454.

There is no point trying to force each upon the other.

Since different people are working on them (who don't share the same horizon), let's 
de-link them and take off ferrets, ok?

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support a generic shuffle service as a set of two plugins: ShuffleProvider and 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on a 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges, thereby getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)
 # I am providing a link for downloading UDA - Mellanox's open source plugin 
 that implements generic shuffle service using RDMA and levitated merge.  
 Note: At this phase, the code is in C++ through JNI and you should consider 
 it as beta only.  Still, it can serve anyone that wants to implement or 
 contribute to levitated merge. (Please be advised that levitated merge is 
 mostly suited to very fast networks) - 
 [http://www.mellanox.com/content/pages.php?pg=products_dynproduct_family=144menu_section=69]


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506061#comment-13506061
]

Arun C Murthy commented on MAPREDUCE-4049:
--

Avner, I can't seem to make you a 'contributor' and assign this jira to you.
Some weird issue with JIRA, fyi.

plugin for generic shuffle service
--

Key: MAPREDUCE-4049
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Labels: merge, plugin, rdma, shuffle
Fix For: trunk

Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf,
mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch,
mapreduce-4049.patch

[jira] [Updated] (MAPREDUCE-4049) plugin for generic shuffle service


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-4049:
-

Issue Type: Improvement  (was: Sub-task)
Parent: (was: MAPREDUCE-2454)

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch



[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-11-28 Thread Doug Cutting (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506064#comment-13506064
 ] 

Doug Cutting commented on MAPREDUCE-4827:
-

 Most writables do not implement hashCode()

All WritableComparable (i.e., key) implementations included with Hadoop 
implement hashCode().  Moreover, a WritableComparable would be a poor key 
implementation if it did not implement hashCode() and was used with 
HashPartitioner, since it wouldn't send equivalent values to the same reducer.  
The WritableComparable documentation specifically advises implementing 
hashCode().

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/WritableComparable.html
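
To make the point about hashCode() concrete, here is a minimal sketch in plain 
Java (no Hadoop classes on the classpath). It assumes HashPartitioner picks a 
partition as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and the 
GoodKey and BadKey classes are hypothetical stand-ins for WritableComparable 
key types, not anything shipped with Hadoop.

```java
import java.util.Objects;

// Plain-Java sketch; GoodKey and BadKey are hypothetical stand-ins for
// WritableComparable key types.
class HashPartitionSketch {

    // Assumed HashPartitioner arithmetic: clear the sign bit, then take the
    // remainder modulo the number of reducers.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Key that implements equals() and hashCode() consistently.
    static final class GoodKey {
        final String id;
        GoodKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    // Key that inherits Object.hashCode(), i.e. an identity-based hash.
    static final class BadKey {
        final String id;
        BadKey(String id) { this.id = id; }
    }

    public static void main(String[] args) {
        int reducers = 16;
        // Equal GoodKey instances always land in the same partition.
        System.out.println(partition(new GoodKey("user-42"), reducers)
                == partition(new GoodKey("user-42"), reducers));
        // Two BadKey instances with the same id may land in different ones.
        System.out.println(partition(new BadKey("user-42"), reducers)
                + " vs " + partition(new BadKey("user-42"), reducers));
    }
}
```

With BadKey the two partitions printed can differ from run to run, which is 
the failure mode described above: logically equal keys would not reliably 
reach the same reducer.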

 Increase hash quality of HashPartitioner
 

 Key: MAPREDUCE-4827
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Radim Kolar
 Attachments: betterhash1.txt


 HashPartitioner uses object.hashCode() to split keys into partitions. This 
 results in bad distributions because hashCode() quality is poor. 
 These hashCode() functions are sometimes written by hand (very poor quality) 
 and sometimes generated by commons lang code (poor quality). Applying 
 some transformation on top of hashCode() provides better distribution.
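
A rough sketch of what "some transformation on top of hashCode()" could look 
like, in plain Java. This is not the code from betterhash1.txt (the attachment 
is not reproduced here); the mixing step is the 32-bit finalizer from 
MurmurHash3, picked only as an example, and the class and method names are 
made up for illustration. The partition arithmetic is assumed to be 
(hash & Integer.MAX_VALUE) % numReduceTasks.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch; mix() is the MurmurHash3 32-bit finalizer, used here
// only as an example transformation. All names are hypothetical.
class MixedHashSketch {

    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Partition from the raw hashCode().
    static int plainPartition(Object key, int n) {
        return (key.hashCode() & Integer.MAX_VALUE) % n;
    }

    // Partition from the mixed hashCode().
    static int mixedPartition(Object key, int n) {
        return (mix(key.hashCode()) & Integer.MAX_VALUE) % n;
    }

    // Count how many of the n partitions receive at least one of the
    // Integer keys 0, stride, 2*stride, ...
    static int partitionsUsed(boolean mixed, int keyCount, int stride, int n) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < keyCount; i++) {
            Integer key = i * stride;
            used.add(mixed ? mixedPartition(key, n) : plainPartition(key, n));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Keys whose hash codes are all multiples of the reducer count.
        System.out.println("plain: " + partitionsUsed(false, 1024, 16, 16));
        System.out.println("mixed: " + partitionsUsed(true, 1024, 16, 16));
    }
}
```

The example uses Integer keys that are multiples of 16 with 16 reducers, so 
the plain formula sends every key to partition 0. Whether such patterns occur 
in real key sets is exactly what is being debated in this issue.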


[jira] [Assigned] (MAPREDUCE-4049) plugin for generic shuffle service


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur reassigned MAPREDUCE-4049:
-

Assignee: Avner BenHanoch

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Assignee: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506069#comment-13506069
]

Alejandro Abdelnur commented on MAPREDUCE-4049:
---

Arun, as I said before, the work is related, thus it should be done together.
If there was some lingering history, this seems to be in the past because now
there seems to be a full synergy between the work done in the different JIRAs.
We are a community; we have disagreements and we address them. This is how we
are supposed to work.

Avner, just sorted out the JIRA glitch, and assigned the JIRA to you.

plugin for generic shuffle service
--

Key: MAPREDUCE-4049
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Assignee: Avner BenHanoch
Labels: merge, plugin, rdma, shuffle
Fix For: trunk

Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf,
mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch,
mapreduce-4049.patch

[jira] [Commented] (MAPREDUCE-3772) MultipleOutputs output lost if baseOutputPath starts with ../

2012-11-28 Thread Priyo Mustafi (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506107#comment-13506107
 ] 

Priyo Mustafi commented on MAPREDUCE-3772:
--

MultipleOutputs exposes two methods:
  1) public <K,V> void write(String namedOutput, K key, V value)
  2) public <K,V> void write(String namedOutput, K key, V value, String 
baseOutputPath)
where
  namedOutput - the named output name
  baseOutputPath - base-output path to write the record to. Note: Framework 
will generate unique filename for the baseOutputPath 
  
We use the second one which allows you to provide a baseOutputPath where the 
data needs to be written.  I don't see anywhere in the javadoc which mentions 
that baseOutputPath shouldn't be a fully qualified path.  So the Jira is 
definitely valid.  Either the Javadoc needs to be fixed or the code needs to be 
fixed and I would prefer the latter as we have developed extensive 
data-pipelines based on this.  If it is not fixed, we have to change the 
absolute paths to sub-directory paths and then once the job is done, move all 
those directories out to the expected locations.

Aside from that, if we provide baseOutputPath as abc/def/xyz then it puts the 
directory under the main output directory, i.e. you get files like this:  
main-output-dir/abc/def/xyz-r-0.   If instead you use baseOutputPath as 
/abc/def/xyz, where the path isn't a subdirectory of the main output 
directory, then the problem is seen.  
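
For reference, a small plain-Java sketch of where each kind of baseOutputPath 
would nominally point when resolved against the job output directory. It uses 
java.net.URI only to show generic path resolution; it is not what 
MultipleOutputs does internally, and the class and method names are made up 
for illustration.

```java
import java.net.URI;

// Plain-Java sketch of generic path resolution; this is NOT Hadoop's
// MultipleOutputs logic, and the names here are hypothetical.
class BaseOutputPathSketch {

    // Resolve baseOutputPath against the output directory using URI rules.
    static String resolve(String outputDir, String baseOutputPath) {
        String dir = outputDir.endsWith("/") ? outputDir : outputDir + "/";
        return URI.create(dir).resolve(baseOutputPath).getPath();
    }

    public static void main(String[] args) {
        String out = "/tmp/multi1/out";
        // Relative path: stays under the output directory.
        System.out.println(resolve(out, "abc/def/xyz"));
        // Absolute path: ignores the output directory entirely.
        System.out.println(resolve(out, "/abc/def/xyz"));
        // Parent-relative path: escapes the output directory.
        System.out.println(resolve(out, "../extra/part"));
    }
}
```

Only the first case stays inside the main output directory, which matches the 
observation above that the problem shows up once the path is no longer a 
subdirectory of it.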




 MultipleOutputs output lost if baseOutputPath starts with ../
 -

 Key: MAPREDUCE-3772
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3772
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.203.0, 0.22.0
 Environment: FreeBSD
Reporter: Radim Kolar

 Let's say you have output directory set:
 FileOutputFormat.setOutputPath(job, "/tmp/multi1/out");
 and want to place output from MultipleOutputs into /tmp/multi1/extra
 I expect the following code to work:
 mos = new MultipleOutputs<Text, IntWritable>(context);
 mos.write(new Text("zrr"), value, "../extra/");
 but no Exception is thrown and the expected output directory /tmp/multi1/extra 
 does not even exist. All data written to this output vanishes without trace.
 To make it work, the full path must be used:
 mos.write(new Text("zrr"), value, "/tmp/multi1/extra/");
 Output is listed in statistics from MultipleOutputs correctly:
 org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
 ../gaja1/=1 (* everything is lost *)
 /tmp/multi1/out/../ksd34/=1 (* this using full path works *)
 list1=6667


[jira] [Commented] (MAPREDUCE-4809) Make classes required for MAPREDUCE-2454 to be java public (with LimitedPrivate)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506115#comment-13506115
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4809:
---

BTW, I've had to revert and recommit the patch as it was incorrect. I had to do 
this twice as the first time I had some stuff uncommitted. Now it should be OK.

 Make classes required for MAPREDUCE-2454 to be java public (with 
 LimitedPrivate)
 

 Key: MAPREDUCE-4809
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4809
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: MR-2454

 Attachments: MAPREDUCE-4809-1.patch, mapreduce-4809.patch, 
 mapreduce-4809.patch, mapreduce-4809.patch


 Make classes required for MAPREDUCE-2454 to be java public (with 
 LimitedPrivate)


[jira] [Comment Edited] (MAPREDUCE-4049) plugin for generic shuffle service

[
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506069#comment-13506069
]

Alejandro Abdelnur edited comment on MAPREDUCE-4049 at 11/29/12 1:17 AM:
-

Arun, as I said before, the work is related, thus it should be done together.
If there was some lingering history, this seems to be in the past because now
there is full synergy between the work done in the different JIRAs. We are a
community; we have disagreements and we address them. This is how we are
supposed to work.

Avner, just sorted out the JIRA glitch, and assigned the JIRA to you.

was (Author: tucu00):
Arun, as I said before, the works is related thus it should be done
together. If there was some lingering history this seems to be in past
because now there seems to be a full synergy between the work done in the
different JIRAs. We are community, we have disagreements and we address them,
this is how we suppose to work.

Avner, just sorted out the JIRA glitch, and assigned the JIRA to you.

plugin for generic shuffle service
--

Key: MAPREDUCE-4049
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Assignee: Avner BenHanoch
Labels: merge, plugin, rdma, shuffle
Fix For: trunk

Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf,
mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch,
mapreduce-4049.patch

[jira] [Commented] (MAPREDUCE-3772) MultipleOutputs output lost if baseOutputPath starts with ../


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506121#comment-13506121
 ] 

Radim Kolar commented on MAPREDUCE-3772:


If you have budget, i can fix it for you.


[jira] [Commented] (MAPREDUCE-3772) MultipleOutputs output lost if baseOutputPath starts with ../


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506123#comment-13506123
 ] 

Alejandro Abdelnur commented on MAPREDUCE-3772:
---

I think that, at least, javadocs should be updated to reflect that if you want 
to use speculative execution the baseOutputPath must not be a path but a name. 
I would prefer to enforce it, as IMO it is a bug that it is not enforced.
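
As a rough idea of what enforcing "a name, not a path" could look like, here 
is a plain-Java sketch. The class, the method name requireName and the exact 
rules (reject empty strings, anything containing '/', and the names "." and 
"..") are assumptions for illustration only; nothing like this exists in 
MultipleOutputs today as far as this thread shows.

```java
// Plain-Java sketch of a hypothetical check; not part of Hadoop.
class BaseOutputNameCheck {

    // Return the value unchanged if it is a simple name, otherwise throw.
    static String requireName(String baseOutputPath) {
        if (baseOutputPath == null || baseOutputPath.isEmpty()) {
            throw new IllegalArgumentException("baseOutputPath must not be empty");
        }
        if (baseOutputPath.indexOf('/') >= 0
                || baseOutputPath.equals(".")
                || baseOutputPath.equals("..")) {
            throw new IllegalArgumentException(
                    "baseOutputPath must be a name, not a path: " + baseOutputPath);
        }
        return baseOutputPath;
    }

    public static void main(String[] args) {
        System.out.println(requireName("extra"));
        try {
            requireName("../extra/");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check this strict would also reject sub-directory style values such as 
abc/def/xyz, so existing jobs that rely on them would need to change.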




[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Status: Open  (was: Patch Available)

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, 
 COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Attachment: (was: COMBO-mapreduce-4809-4807.patch)


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Status: Patch Available  (was: Open)


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Attachment: (was: mapreduce-4807.patch)


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Attachment: mapreduce-4807.patch
COMBO-mapreduce-4809-4807.patch

Sorry about the confusion.  QA picked up the incremental patch as well.  
Resubmitting patch files together.

-- Asokan


[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506164#comment-13506164
 ] 

Hadoop QA commented on MAPREDUCE-4807:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555297/mapreduce-4807.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3078//console

This message is automatically generated.


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Attachment: COMBO-mapreduce-4809-4807.patch


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Attachment: (was: COMBO-mapreduce-4809-4807.patch)


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4807:


Status: Open  (was: Patch Available)


[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable