[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-09-02 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727761#comment-14727761
 ] 

Srikanth Kandula commented on YARN-4088:


True. a) Not sure if this (out-of-band heartbeat upon container completion) 
happens today. b) Processing one NM at a time is unlikely to cope well with the 
storms of heartbeats.

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that several underlying data structures are touched during 
> heartbeat processing, so parallelizing the NM heartbeat is non-trivial. Yet, 
> it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to 
> a) run larger clusters, or 
> b) support faster heartbeats or dynamic scaling of heartbeats, or 
> c) take more asks from each application, or 
> d) use cleverer/more expensive algorithms such as node labels or better 
> packing or ...
> Indeed, the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts, which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See the Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why?
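
The 1ms figure in the description follows from simple arithmetic, sketched below. The function name and the `workers` parameter are illustrative assumptions, not anything in YARN, and the multi-worker case assumes perfect parallelism with no lock contention, which a real RM would not achieve.

```python
def per_heartbeat_budget_ms(num_nodes: int, heartbeat_interval_s: float,
                            workers: int = 1) -> float:
    """Average time available to process one NM heartbeat.

    With `workers` == 1 this models strictly sequential processing.
    Larger values assume ideal, contention-free parallelism (an
    optimistic upper bound, not a prediction).
    """
    if num_nodes <= 0 or heartbeat_interval_s <= 0 or workers <= 0:
        raise ValueError("all arguments must be positive")
    heartbeats_per_second = num_nodes / heartbeat_interval_s
    return 1000.0 * workers / heartbeats_per_second


# The example from the issue description: 3000 servers, one heartbeat every 3s.
sequential_budget = per_heartbeat_budget_ms(3000, 3.0)
# Hypothetical: the same cluster with 4 ideal parallel workers.
parallel_budget = per_heartbeat_budget_ms(3000, 3.0, workers=4)
print(sequential_budget, parallel_budget)
```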



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-09-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728158#comment-14728158
 ] 

Jason Lowe commented on YARN-4088:
--

bq. Not sure if this (out-of-band heartbeat upon container completion) happens 
today
The OOB heartbeat is occurring to a degree if the cluster is running a lot of 
MapReduce.  Currently the MapReduce AM will proactively kill any task that 
reports a terminal status over the umbilical protocol.  There's then a race 
between the task container completing on its own and the AM killing the task 
via the NM.  If the latter wins the race then we get an OOB heartbeat since 
today a stop container request generates it.  I see the kill winning the race 
fairly often on our clusters, so we are getting a lot of OOB heartbeats in 
practice.

General OOB heartbeats on any type of container completion do not occur today 
but are proposed by YARN-2046.

bq.  Processing one NM at a time is unlikely to cope well with the storms of 
heartbeats.
This was a big problem in the past and the scheduler could fall far behind, but 
this was mitigated to a large degree with batching of heartbeats (e.g.: 
YARN-365).
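
One way batching can keep a backlog bounded is to coalesce heartbeats per node, so a slow scheduler sees at most one pending event per NM. The sketch below illustrates that general idea only; it is not the YARN-365 code, it is single-threaded, and the class and method names are hypothetical.

```python
from collections import deque


class CoalescingHeartbeatQueue:
    """Toy queue that keeps at most one scheduler event per node."""

    def __init__(self):
        self._pending = {}      # node_id -> container updates not yet consumed
        self._events = deque()  # node_ids that have an outstanding event

    def on_heartbeat(self, node_id, container_updates):
        if node_id in self._pending:
            # An event for this node is already queued: merge, do not re-queue.
            self._pending[node_id].extend(container_updates)
        else:
            self._pending[node_id] = list(container_updates)
            self._events.append(node_id)

    def drain_one(self):
        """Hand the scheduler all updates accumulated for the oldest node."""
        node_id = self._events.popleft()
        return node_id, self._pending.pop(node_id)

    def backlog(self):
        return len(self._events)


queue = CoalescingHeartbeatQueue()
queue.on_heartbeat("nm1", ["container_1 completed"])
queue.on_heartbeat("nm1", ["container_2 completed"])  # merged into nm1's event
queue.on_heartbeat("nm2", [])
print(queue.backlog())
```

With this shape the backlog is bounded by the number of nodes rather than by the number of heartbeats received while the scheduler was busy.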

Agree in general though that allowing the scheduler to be more concurrent would 
be nice.  



[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718597#comment-14718597
 ] 

Jason Lowe commented on YARN-4088:
--

bq. See the problem with slower heartbeats is that if the tasks are 
short-running, there will be a cluster-wide throughput drop due to the feedback 
delay.
The nodemanager will do an out-of-band heartbeat if a container is killed, and 
IMHO should do the same when a container completes (not sure what's so special 
about killed vs. exiting wrt. scheduling).  Of course you can still get storms 
of heartbeats even though you explicitly tuned down the heartbeat interval if 
the cluster is churning containers at a very fast rate.




[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717746#comment-14717746
 ] 

Bikas Saha commented on YARN-4088:
--

Is the suggestion to process them concurrently? Not quite sure what async 
means here. Is it async wrt the RPC thread?
Another alternative would be to dynamically adjust the NM heartbeat interval. 
IIRC, the NM's next heartbeat interval is sent by the RM in the response to the 
heartbeat. If not, then this could be added. The RM could potentially increase 
this interval until it reaches a steady/stable state of heartbeat processing. 
This would help in self-adjusting to cluster size: a short interval for small 
clusters and a long one for large clusters. The interval could also grow under 
high load and shrink again once the load diminishes.
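
A minimal sketch of such a self-adjusting interval follows. Everything here is an assumption made for illustration: the class name, the bounds, the 0.8 utilization target and the 1.5x/0.9x step sizes are not taken from YARN. The RM would return the computed value to each NM in its heartbeat response.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatIntervalTuner:
    """Hypothetical controller for the interval the RM hands back to NMs."""
    min_interval_ms: int = 1000
    max_interval_ms: int = 10000
    target_utilization: float = 0.8
    interval_ms: int = 1000

    def next_interval(self, num_nodes: int, avg_processing_ms: float) -> int:
        # Fraction of wall-clock time a sequential RM spends on heartbeats
        # at the current interval; above 1.0 means it is falling behind.
        utilization = num_nodes * avg_processing_ms / self.interval_ms
        if utilization > self.target_utilization:
            proposed = self.interval_ms * 1.5   # back off quickly under load
        else:
            proposed = self.interval_ms * 0.9   # creep back down when idle
        clamped = min(self.max_interval_ms, max(self.min_interval_ms, proposed))
        self.interval_ms = int(clamped)
        return self.interval_ms


large = HeartbeatIntervalTuner()
first_large = large.next_interval(num_nodes=3000, avg_processing_ms=1.0)
for _ in range(50):
    large.next_interval(num_nodes=3000, avg_processing_ms=1.0)

small = HeartbeatIntervalTuner()
first_small = small.next_interval(num_nodes=10, avg_processing_ms=1.0)
print(first_large, large.interval_ms, first_small)
```

A simple multiplicative rule like this oscillates around the target rather than settling exactly on it; a production controller would likely need damping.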



[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717887#comment-14717887
 ] 

Srikanth Kandula commented on YARN-4088:


See, the problem with slower heartbeats is that if the tasks are short-running, 
there will be a cluster-wide throughput drop due to the feedback delay. This is 
one of the points that Sparrow (Spark) and Mercury hammer YARN on... Of course, 
reusing containers *can* help, but other ducks have to align well.  In general, 
slowing the heartbeat is not a good thing.
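
The feedback-delay effect can be estimated with a deliberately simple model. The assumptions are: a container finishes at a uniformly random point within the heartbeat interval, there is no out-of-band heartbeat and no container reuse, and the slot is refilled as soon as the RM hears about the completion. The function name is illustrative.

```python
def expected_utilization(task_s: float, heartbeat_interval_s: float) -> float:
    """Fraction of time a slot does useful work under the toy model.

    After each task the slot sits idle until the next regular heartbeat,
    which is half an interval away on average.
    """
    if task_s <= 0 or heartbeat_interval_s <= 0:
        raise ValueError("durations must be positive")
    expected_idle_s = heartbeat_interval_s / 2.0
    return task_s / (task_s + expected_idle_s)


# One-second tasks versus one-minute tasks at several heartbeat intervals.
for interval_s in (1.0, 3.0, 10.0):
    short = expected_utilization(1.0, interval_s)
    long_running = expected_utilization(60.0, interval_s)
    print(f"interval={interval_s}s short={short:.3f} long={long_running:.3f}")
```

Under these assumptions long tasks barely notice a slower heartbeat, while short tasks lose most of the slot's time to waiting.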



[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717907#comment-14717907
 ] 

Bikas Saha commented on YARN-4088:
--

Right. So the combined objective is to continue to have small heartbeat 
intervals with larger clusters while still using the central scheduler for all 
allocations. Clearly, in theory, that is a bottleneck by design and our attempt 
is to engineer our way out of it for medium size clusters. Right? :)



[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717831#comment-14717831
 ] 

Bikas Saha commented on YARN-4088:
--

Why not on a 3K cluster? We could slow down heartbeats to (say) 10s on a 3K-node 
cluster. That should work, though I agree that NM info would be stale for 
longer, if that's your point.



[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717798#comment-14717798
 ] 

Srikanth Kandula commented on YARN-4088:


Yes, concurrently.   Your suggestion is a good one, in that it does give the 
RM more time to be clever on small clusters. But no such luck on, say, a 3K 
server cluster. Avoiding serialization may be the answer to most other problems.
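
To make "concurrently" concrete, here is a toy sketch of one possible locking structure: per-node state is guarded by a per-node lock, and only the short update to shared cluster state takes a global lock. The class and field names are hypothetical and bear no relation to the actual RM or scheduler classes. Python threads also do not run this CPU-bound work in parallel, so the sketch shows the locking shape rather than a speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyResourceManager:
    """Illustrative only: tracks free memory per node and cluster-wide."""

    def __init__(self, node_ids):
        self._node_locks = {n: threading.Lock() for n in node_ids}
        self._node_free_mb = {n: 0 for n in node_ids}
        self._cluster_lock = threading.Lock()
        self.cluster_free_mb = 0

    def process_heartbeat(self, node_id, free_mb):
        # Heartbeats from different nodes only contend on the brief
        # cluster-wide update, not on each other's per-node state.
        with self._node_locks[node_id]:
            delta = free_mb - self._node_free_mb[node_id]
            self._node_free_mb[node_id] = free_mb
        with self._cluster_lock:
            self.cluster_free_mb += delta


node_ids = list(range(3000))
rm = ToyResourceManager(node_ids)
with ThreadPoolExecutor(max_workers=8) as pool:
    # First round: every node reports 4096 MB free.
    list(pool.map(lambda n: rm.process_heartbeat(n, 4096), node_ids))
total_after_first_round = rm.cluster_free_mb
with ThreadPoolExecutor(max_workers=8) as pool:
    # Second round: every node reports 2048 MB free.
    list(pool.map(lambda n: rm.process_heartbeat(n, 2048), node_ids))
print(total_after_first_round, rm.cluster_free_mb)
```

The hard part the issue description alludes to is that a real scheduler touches far more shared state (queues, application asks, reservations) than a single counter, so the global critical section would be much harder to keep short.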
