[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues

2016-09-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507409#comment-15507409
 ] 

Varun Saxena commented on MAPREDUCE-5507:
-

I was initially thinking of having a configuration to ramp up reducers if maps 
are hanging for a while, but as per the discussion on MAPREDUCE-6689, this may 
lead to suboptimal job performance, as it will be very hard to decide on a 
right configuration value for this.

We haven't encountered any job-hang issues in our deployments since 
MAPREDUCE-6513 and MAPREDUCE-6514 went into our branch.
So I am fine with closing it. Maybe we can check with the defect reporter too. 
cc [~ojoshi].



> MapReduce reducer ramp down is suboptimal with potential job-hanging issues
> ---
>
> Key: MAPREDUCE-5507
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5507
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>Priority: Critical
> Attachments: MAPREDUCE-5507.20130922.1.patch
>
>
> Today if we are setting "yarn.app.mapreduce.am.job.reduce.rampup.limit" and 
> "mapreduce.job.reduce.slowstart.completedmaps" then reducers are launched 
> more aggressively. However, the calculation to either ramp up or ramp down 
> reducers is not done in the most optimal way.
> * If the MR AM at any point sees a situation like 
> ** scheduledMaps : 30
> ** scheduledReducers : 10
> ** assignedMaps : 0
> ** assignedReducers : 11
> ** finishedMaps : 120
> ** headroom : 756 MB (when each map/reduce task needs only 512 MB)
> * then today it simply hangs, because it thinks that there is sufficient room 
> to launch one more mapper and therefore there is no need to ramp down. If 
> this continues forever, the job never makes progress.
> * Ideally, when the MR AM sees that assignedMaps has dropped to 0 and there 
> are running reducers around, it should wait for a certain time 
> (upper-bounded by the average map task completion time, as a heuristic). If 
> after that it still doesn't get a new container for a map task, it should 
> preempt the reducers one by one at some interval, and later ramp up slowly.
> ** Preemption of reducers can be done in a slightly smarter way:
> *** preempt a reducer on a node manager for which there is a pending map 
> request;
> *** otherwise, preempt any other reducer. The MR AM will contribute to 
> getting a new mapper by releasing such a reducer/container, because that 
> reduces its cluster consumption and thereby makes it a candidate for an 
> allocation.
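The flawed check described in the report can be sketched as follows. This is a minimal, hypothetical model (the method and parameter names are illustrative, not the actual RMContainerAllocator fields) of why a reported headroom of 756 MB against a 512 MB map request suppresses preemption even though no map container ever arrives:

```java
// Hypothetical sketch of the flawed heuristic described in this issue.
// All names are illustrative; this is not the actual RMContainerAllocator code.
class RampDownSketch {

    // The AM only considers preempting a reducer when the reported headroom
    // cannot fit even one map task. A stale or optimistic headroom therefore
    // suppresses preemption indefinitely.
    static boolean shouldPreemptReducer(int headroomMb, int mapResourceMb,
                                        int scheduledMaps, int assignedReducers) {
        if (scheduledMaps == 0 || assignedReducers == 0) {
            return false; // nothing to trade: no pending maps, or no reducers to release
        }
        return headroomMb < mapResourceMb; // flawed: trusts headroom to become a container
    }

    public static void main(String[] args) {
        // Scenario from the report: headroom 756 MB, map needs 512 MB,
        // 30 scheduled maps, 11 assigned reducers -> no preemption, job hangs.
        System.out.println(shouldPreemptReducer(756, 512, 30, 11));
    }
}
```

Under this model, preemption never triggers while the headroom stays at 756 MB, which matches the hang described above.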



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues

2016-08-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431740#comment-15431740
 ] 

Karthik Kambatla commented on MAPREDUCE-5507:
-

[~varun_saxena] - this appears to be very similar to issues fixed in 
MAPREDUCE-6513 and MAPREDUCE-6514. Can this be closed as a duplicate? 




[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues

2015-10-16 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960489#comment-14960489
 ] 

Rohith Sharma K S commented on MAPREDUCE-5507:
--

We are hitting this issue frequently, causing jobs to hang forever. Any update 
on this issue?



[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues

2013-09-22 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774235#comment-13774235
 ] 

Omkar Vinit Joshi commented on MAPREDUCE-5507:
--

Attaching a very basic patch; tested it locally on my machine.
* When the cluster gets saturated, the AM will start preempting reducers after 
waiting for a map task container.
* Right now I am using a fixed interval of 2 minutes, but this will be updated 
to the minimum of a multiple of the AM-RM heartbeat interval and the average 
map task finish time.

Please let me know if the approach taken is correct.
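The interval computation described above could look something like the sketch below. The method name, the multiplier parameter, and the 2-minute fallback are assumptions for illustration, not the actual patch contents:

```java
// Hypothetical sketch of the preemption-wait interval discussed above:
// currently a fixed 2 minutes, intended to become
// min(multiple * AM-RM heartbeat interval, average map task finish time).
class PreemptionWaitSketch {

    static final long FIXED_WAIT_MS = 2 * 60 * 1000; // current fixed 2-minute interval

    // Proposed replacement for the fixed interval.
    static long preemptionWaitMs(long heartbeatIntervalMs, int heartbeatMultiple,
                                 long avgMapFinishMs) {
        if (avgMapFinishMs <= 0) {
            return FIXED_WAIT_MS; // no completed maps yet: fall back to the fixed wait
        }
        return Math.min(heartbeatIntervalMs * heartbeatMultiple, avgMapFinishMs);
    }

    public static void main(String[] args) {
        // 1 s heartbeat, multiplier 30, maps averaging 45 s -> wait 30 s.
        System.out.println(preemptionWaitMs(1000, 30, 45000));
    }
}
```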

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues

2013-09-19 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772479#comment-13772479
 ] 

Omkar Vinit Joshi commented on MAPREDUCE-5507:
--

There also looks to be a problem with the code below: we can either preempt 
reducers or schedule new ones, but not both at the same time... any thoughts? 
Planning to fix this as part of this issue.

{code}
if (recalculateReduceSchedule) {
  preemptReducesIfNeeded();
  scheduleReduces(
  getJob().getTotalMaps(), completedMaps,
  scheduledRequests.maps.size(), scheduledRequests.reduces.size(), 
  assignedRequests.maps.size(), assignedRequests.reduces.size(),
  mapResourceReqt, reduceResourceReqt,
  pendingReduces.size(), 
  maxReduceRampupLimit, reduceSlowStart);
  recalculateReduceSchedule = false;
}
{code}
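One possible shape for that fix is to make preemption and scheduling mutually exclusive within a recalculation round. The toy model below is purely a sketch under stated assumptions (the method and parameter names are invented; the real snippet above does not return which action was taken):

```java
// Toy model of making "preempt" and "schedule" mutually exclusive per round,
// as suggested above. Names are illustrative, not the real AM code.
class ExclusiveRoundSketch {

    // Returns what the round did: "preempt" releases capacity and defers
    // scheduling to the next heartbeat; "schedule" ramps reducers up only
    // when maps are not starved.
    static String recalculateRound(boolean mapsStarved, int assignedReducers,
                                   int pendingReduces) {
        if (mapsStarved && assignedReducers > 0) {
            return "preempt";   // ramp down only; do not also request reducers
        }
        if (pendingReduces > 0) {
            return "schedule";  // safe to ramp up
        }
        return "noop";
    }

    public static void main(String[] args) {
        System.out.println(recalculateRound(true, 11, 10));  // preempt
        System.out.println(recalculateRound(false, 11, 10)); // schedule
    }
}
```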



[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues

2013-09-19 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772473#comment-13772473
 ] 

Omkar Vinit Joshi commented on MAPREDUCE-5507:
--

The potential problem I see here is that the reducer preemption logic is mainly 
dependent on the headroom (available resources) returned by the RM. After 
discussing with [~vinodkv] and [~sseth] offline, there are certain important 
points we need to take care of:
* If we ever hit the situation where 
assignedMaps=0, assignedReducers>0, scheduledMaps>0, scheduledReducers>=0, then:
** We should wait for some time.
*** We are proposing the wait to be min[ (some percentage of the average map 
task completion time), (some configurable number * AM-RM heartbeat interval) ].
** If we don't get any new container for a map task during the above interval, 
then we will:
*** first remove all the scheduled reducer requests, as done today in 
RMContainerAllocator#preemptReducesIfNeeded();
*** then remove as many running reducers as required to allocate a single map 
task.
** We should keep repeating the above steps at that interval as long as we 
don't get a new map container. We should also avoid ramping up later, and cap 
the reducer count at the currently running reducers, since there is no point 
in requesting reducer containers only to cancel the requests or kill running 
reducers later (we are already using up the running user's capacity).
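The "remove as many reducers as required to allocate a single map task" step could be sketched as follows. This is a hypothetical helper (the name and flat-megabyte accounting are simplifying assumptions, not the actual allocator arithmetic):

```java
// Hypothetical helper for the ramp-down step above: how many running reducers
// to kill so that one map task fits. Assumes flat MB accounting for simplicity.
class ReducerPreemptionSketch {

    static int reducersToFree(int mapNeedMb, int headroomMb, int reducerMb) {
        int deficitMb = mapNeedMb - headroomMb;
        if (deficitMb <= 0) {
            // Headroom claims a map already fits, yet no container arrived
            // (stale headroom): free one reducer anyway to force progress.
            return 1;
        }
        return (deficitMb + reducerMb - 1) / reducerMb; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(reducersToFree(512, 756, 512)); // stale-headroom case
        System.out.println(reducersToFree(1024, 0, 512));  // need two 512 MB reducers freed
    }
}
```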


