[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2016-03-19 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198204#comment-15198204
 ] 

Srikanth Kandula commented on YARN-2965:


Go for it :-)  We have some dummy code that was good enough to run experiments 
and get numbers, but we are not actively working on pushing it in. Inigo, I 
will share that code with you offline so you can pick out any useful pieces.

> Enhance Node Managers to monitor and report the resource usage on machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregate these reports, and expose them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4088) RM should be able to process heartbeats from NM concurrently

2015-09-09 Thread Srikanth Kandula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Kandula updated YARN-4088:
---
Summary: RM should be able to process heartbeats from NM concurrently  
(was: RM should be able to process heartbeats from NM asynchronously)

> RM should be able to process heartbeats from NM concurrently
> 
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying data structures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats or 
> c) take more asks from each application or 
> d) use cleverer/more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.





[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-09-02 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727761#comment-14727761
 ] 

Srikanth Kandula commented on YARN-4088:


True. a) Not sure if this (out-of-band heartbeat upon container completion) 
happens today. b) Processing one NM at a time is unlikely to cope well with the 
storms of heartbeats.

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>


[jira] [Commented] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717267#comment-14717267
 ] 

Srikanth Kandula commented on YARN-4056:


I looked. Sort of similar but not really. The similarity is that both allow 
multiple containers to be allocated within fewer calls. 

The difference is in the policies and the complexity. Bundling allows an 
arbitrary subset of 'legit' tasks to be assigned, whereas assignMultiple simply 
assigns the first few. For example, bundling can decide that the 2nd, 3rd and 
10th tasks are a good choice, in contrast to assigning just the 1st task (the 
others may not fit). assignMultiple does not allow for this.

Bundling is slightly more complex because the actual assignment is deferred 
until the loop finishes, whereas assignMultiple assigns each task in place and 
keeps going.
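
The contrast above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the scheduler's code: tasks have a single resource dimension, `assign_in_order` is a hypothetical stand-in for assignMultiple-style behaviour, and the smallest-first policy in `bundle_smallest_first` is just one made-up example of a bundling policy.

```python
from typing import List, Sequence


def assign_in_order(demands: Sequence[int], capacity: int, max_assign: int) -> List[int]:
    """Hypothetical assignMultiple-style loop: walk tasks in order and
    assign each one that fits, in place, as soon as it is seen."""
    chosen: List[int] = []
    free = capacity
    for idx, demand in enumerate(demands):
        if len(chosen) == max_assign:
            break
        if demand <= free:
            free -= demand  # assignment happens immediately, inside the loop
            chosen.append(idx)
    return chosen


def bundle_smallest_first(demands: Sequence[int], capacity: int) -> List[int]:
    """Hypothetical bundler: look at every task first, then pick a subset.
    Nothing is assigned until the scan is over (deferred assignment)."""
    # Scan: record all tasks that could fit on the node at all.
    candidates = [(demand, idx) for idx, demand in enumerate(demands) if demand <= capacity]
    # Policy (an assumption for this sketch): pack as many tasks as possible
    # by preferring small demands.
    chosen: List[int] = []
    free = capacity
    for demand, idx in sorted(candidates):
        if demand <= free:
            chosen.append(idx)
            free -= demand
    # Only now would the chosen bundle be committed.
    return sorted(chosen)


# Ten tasks (0-based indices); the node has 9 units free.
DEMANDS = [8, 3, 3, 5, 5, 5, 5, 5, 5, 3]
print(assign_in_order(DEMANDS, capacity=9, max_assign=3))
print(bundle_smallest_first(DEMANDS, capacity=9))
```

With these made-up numbers the in-order loop takes only the 1st task, after which nothing else fits, while the bundler picks the 2nd, 3rd and 10th tasks.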

Patch is with [~chris.douglas] for an internal review.

We are pushing out a bundler that mimics the current scheduler. All the tests 
pass and, as expected, there is no performance change. Note, however, that the 
allocations are still deferred.

Better bundlers are in the works.

 Bundling: Searching for multiple containers in a single pass over {queues, 
 applications, priorities}
 

 Key: YARN-4056
 URL: https://issues.apache.org/jira/browse/YARN-4056
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Srikanth Kandula
Assignee: Robert Grandl
 Attachments: bundling.docx


 More than one container is allocated on many NM heartbeats. Yet, the current 
 scheduler allocates exactly one container per iteration over {{queues, 
 applications, priorities}}. When there are many queues, applications, or 
 priorities, allocating only one container per iteration can needlessly 
 increase the duration of the NM heartbeat.
  
 In this JIRA, we propose bundling. That is, allow arbitrarily many containers 
 to be allocated in a single iteration over {{queues, applications and 
 priorities}}.





[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717294#comment-14717294
 ] 

Srikanth Kandula commented on YARN-4081:


Ease of expression is a great thing to have. So is extending to multiple 
resources. That is all cool.

I am mostly worried about the performance impact of replacing a small 
data structure that has native types with a much larger data structure that has 
user-defined types.  Could you run a profile?  How much more space would a 
resource object take up now? How much more time would it take to initialize and 
garbage collect 10K resource objects?

 Add support for multiple resource types in the Resource class
 -

 Key: YARN-4081
 URL: https://issues.apache.org/jira/browse/YARN-4081
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-4081-YARN-3926.001.patch


 For adding support for multiple resource types, we need to add support for 
 this in the Resource class.





[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717317#comment-14717317
 ] 

Srikanth Kandula commented on YARN-1012:


Ack. Will do.

 Report NM aggregated container resource utilization in heartbeat
 

 Key: YARN-1012
 URL: https://issues.apache.org/jira/browse/YARN-1012
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Arun C Murthy
Assignee: Inigo Goiri
 Fix For: 2.8.0

 Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
 YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
 YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
 YARN-1012-9.patch








[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717887#comment-14717887
 ] 

Srikanth Kandula commented on YARN-4088:


See, the problem with slower heartbeats is that if the tasks are short-running, 
there will be a cluster-wide throughput drop due to the feedback delay. This is 
one of the points that Sparrow (Spark) and Mercury hammer YARN on... Of course, 
reusing containers *can* help, but other ducks have to align well.  In general, 
slowing the heartbeat is not a good thing.
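
A back-of-the-envelope way to see the feedback-delay effect is the toy model below. It is a sketch built on assumptions that are not in this thread: a finished task is only noticed at the node's next heartbeat, a replacement starts right then, and completion times fall uniformly within the heartbeat interval (so a slot idles for half an interval on average). The function name is hypothetical.

```python
def slot_utilization(task_duration_s: float, heartbeat_interval_s: float) -> float:
    """Fraction of time a slot does useful work in the toy model:
    each task is followed by an average idle gap of half a heartbeat."""
    expected_idle_s = heartbeat_interval_s / 2.0
    return task_duration_s / (task_duration_s + expected_idle_s)


# Short tasks suffer much more from a slow heartbeat than long ones.
for task_s, heartbeat_s in [(1.0, 1.0), (1.0, 3.0), (60.0, 3.0)]:
    print(task_s, heartbeat_s, round(slot_utilization(task_s, heartbeat_s), 3))
```

Under these assumptions, 1s tasks with a 3s heartbeat keep a slot busy only about 40% of the time, whereas 60s tasks are barely affected.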

 RM should be able to process heartbeats from NM asynchronously
 --

 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula



[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717798#comment-14717798
 ] 

Srikanth Kandula commented on YARN-4088:


Yes, concurrently.   Your suggestion is a good one in that it does give the 
RM more time to be clever on small clusters. But there is no such luck on, say, 
a 3K-server cluster. Avoiding serialization may be the answer to most other 
problems.

 RM should be able to process heartbeats from NM asynchronously
 --

 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula



[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716019#comment-14716019
 ] 

Srikanth Kandula commented on YARN-2745:


Just a brief update on this JIRA... 

1) [~chris.douglas] pushed through collection of network and disk usages to 
Hadoop Common. See HADOOP-12210. 

2) [~elgoiri] and [~kasha], in YARN-3534 and YARN-3980, collect CPU and memory 
info of containers, push that information from the NM to the RM, and make it 
available to the scheduler.

3) Packing requires the scheduler to look past the first schedulable task 
discovered by the capacity scheduler loop. Based on the feedback above, we have 
decoupled the architectural change needed from the actual packing policy. See 
YARN-4056, called bundling. Many different packing policies are allowed in the 
bundle.

4) These changes are complementary and orthogonal to YARN-1011. That JIRA 
recommends, rightly, adapting RM allocation based on dynamic resource usage of 
the allocated containers. This JIRA is more about packing containers. It 
currently does so based on expected resource usages as indicated in the ask. 
Indeed, packing based on dynamic usage information would be strictly better and 
is left for future work.

 Extend YARN to support multi-resource packing of tasks
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, scheduler
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
 tetris_paper.pdf


 In this umbrella JIRA we propose an extension to existing scheduling 
 techniques that accounts for all resources used by a task (CPU, memory, 
 disk, network) and is able to achieve three competing objectives: 
 fairness, improved cluster utilization, and reduced average job completion time.





[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716023#comment-14716023
 ] 

Srikanth Kandula commented on YARN-2745:


[~aw] Done by [~chris.douglas]!

 Extend YARN to support multi-resource packing of tasks
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, scheduler
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
 tetris_paper.pdf




[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716021#comment-14716021
 ] 

Srikanth Kandula commented on YARN-2745:


[~vinodkv] Thanks for the related pointer. The efforts are complementary. 
Indeed, adapting assignment based on the dynamic usage would be a good thing to 
have. This JIRA is more about packing based on anticipated usages as indicated 
by the ask. Dynamic packing would be even better.


 Extend YARN to support multi-resource packing of tasks
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, scheduler
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
 tetris_paper.pdf




[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716029#comment-14716029
 ] 

Srikanth Kandula commented on YARN-1012:


[~elgoiri], [~kasha] Could you comment on whether this should go into Hadoop 
Common? Also, it may be worthwhile to extend this to also account for network 
and disk usages of the containers... See HADOOP-12210.

 Report NM aggregated container resource utilization in heartbeat
 

 Key: YARN-1012
 URL: https://issues.apache.org/jira/browse/YARN-1012
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Arun C Murthy
Assignee: Inigo Goiri
 Fix For: 2.8.0

 Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
 YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
 YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
 YARN-1012-9.patch








[jira] [Commented] (YARN-1011) [Umbrella] RM should dynamically schedule containers based on utilization of currently allocated containers

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716034#comment-14716034
 ] 

Srikanth Kandula commented on YARN-1011:


This is a great idea. Is there an ETA for this? Could you comment on whether it 
is being deprioritized for some reason?

 [Umbrella] RM should dynamically schedule containers based on utilization of 
 currently allocated containers
 ---

 Key: YARN-1011
 URL: https://issues.apache.org/jira/browse/YARN-1011
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy

 Currently RM allocates containers and assumes resources allocated are 
 utilized.
 RM can, and should, get to a point where it measures utilization of allocated 
 containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Created] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-26 Thread Srikanth Kandula (JIRA)
Srikanth Kandula created YARN-4088:
--

 Summary: RM should be able to process heartbeats from NM 
asynchronously
 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula


Today, the RM sequentially processes one heartbeat after another. 

Imagine a 3000 server cluster with each server heart-beating every 3s. This 
gives the RM 1ms on average to process each NM heartbeat. That is tough.
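
The 1ms figure follows directly from the numbers above, as the small sketch below shows. The function name and the `num_workers` parameter are hypothetical; the latter assumes perfectly parallel heartbeat processing, which is an idealization and not a claim about the RM.

```python
def per_heartbeat_budget_ms(num_nodes: int,
                            heartbeat_interval_s: float,
                            num_workers: int = 1) -> float:
    """Average time available to process one NM heartbeat, in milliseconds.

    num_workers models an idealized RM that handles that many heartbeats
    at once with no contention (an assumption made for this sketch).
    """
    heartbeats_per_second = num_nodes / heartbeat_interval_s
    return 1000.0 * num_workers / heartbeats_per_second


# 3000 servers, each heart-beating every 3s, processed one at a time.
print(per_heartbeat_budget_ms(3000, 3.0))
# The same cluster with 8 idealized concurrent workers.
print(per_heartbeat_budget_ms(3000, 3.0, num_workers=8))
```

Sequential processing leaves 1.0 ms per heartbeat; the idealized 8-way case leaves 8.0 ms, which is an upper bound since real workers would contend on shared data structures.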

It is true that there are several underlying data structures that will be 
touched during heartbeat processing. So, it is non-trivial to parallelize the 
NM heartbeat. Yet, it is quite doable...

Parallelizing the NM heartbeat would substantially improve the scalability of 
the RM, allowing it to 
a) run larger clusters or 
b) support faster heartbeats or dynamic scaling of heartbeats or 
c) take more asks from each application or 
d) use cleverer/more expensive algorithms such as node labels or better 
packing or ...

Indeed the RM's scalability limit has been cited as the motivating reason for a 
variety of efforts which will become less needed if this can be solved. Ditto 
for slow heartbeats.  See Sparrow and Mercury papers for example.

Can we take a shot at this?
If not, could we discuss why.






[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715991#comment-14715991
 ] 

Srikanth Kandula commented on YARN-4081:


Extending to multiple resources is great, but why use a Map? Is there a rough 
idea how many different resources one may want to encode? It seems like overkill 
to incur so much additional overhead if, say, all that is needed is a handful 
more resources. Ditto for encapsulating strings in URIs and the 
ResourceInformation wrapper over doubles. It would perhaps have been okay if 
this data structure was less often used, but if I understand correctly, a 
Resource is created/destroyed at least once per ask/assignment and often many 
more times...

 Add support for multiple resource types in the Resource class
 -

 Key: YARN-4081
 URL: https://issues.apache.org/jira/browse/YARN-4081
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-4081-YARN-3926.001.patch




[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716031#comment-14716031
 ] 

Srikanth Kandula commented on YARN-3534:


[~elgoiri], [~kasha], could you comment on extending this to also take in 
network and disk usage information?

 Collect memory/cpu usage on the node
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Fix For: 2.8.0

 Attachments: YARN-3534-1.patch, YARN-3534-10.patch, 
 YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, 
 YARN-3534-15.patch, YARN-3534-16.patch, YARN-3534-16.patch, 
 YARN-3534-17.patch, YARN-3534-17.patch, YARN-3534-18.patch, 
 YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, 
 YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, 
 YARN-3534-9.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the collection of memory/cpu 
 usage on the node.





[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716032#comment-14716032
 ] 

Srikanth Kandula commented on YARN-3980:


+1 this would be very useful to have... Will enable even better packing.

 Plumb resource-utilization info in node heartbeat through to the scheduler
 --

 Key: YARN-3980
 URL: https://issues.apache.org/jira/browse/YARN-3980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.7.1
Reporter: Karthik Kambatla
Assignee: Inigo Goiri
 Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
 YARN-3980-v2.patch


 YARN-1012 and YARN-3534 collect resource utilization information for all 
 containers and the node respectively and send it to the RM on node heartbeat. 
 We should plumb it through to the scheduler so the scheduler can make use of 
 it. 





[jira] [Commented] (YARN-1011) [Umbrella] RM should dynamically schedule containers based on utilization of currently allocated containers

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716062#comment-14716062
 ] 

Srikanth Kandula commented on YARN-1011:


+1


 [Umbrella] RM should dynamically schedule containers based on utilization of 
 currently allocated containers
 ---

 Key: YARN-1011
 URL: https://issues.apache.org/jira/browse/YARN-1011
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy



[jira] [Created] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}

2015-08-16 Thread Srikanth Kandula (JIRA)
Srikanth Kandula created YARN-4056:
--

 Summary: Bundling: Searching for multiple containers in a single 
pass over {queues, applications, priorities}
 Key: YARN-4056
 URL: https://issues.apache.org/jira/browse/YARN-4056
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Srikanth Kandula


More than one container is allocated on many NM heartbeats. Yet, the current 
scheduler allocates exactly one container per iteration over {queues, 
applications, priorities}. When there are many queues, applications, or 
priorities, allocating only one container per iteration can needlessly increase 
the duration of the NM heartbeat.
 
In this JIRA, we propose bundling. That is, allow arbitrarily many containers 
to be allocated in a single iteration over {queues, applications and 
priorities}.
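
To make the cost difference concrete, here is a rough sketch that counts how many {queue, application, priority} entries are visited under each approach. It is a simplified model, not the capacity scheduler: the flat `pending` list (one pending-container count per entry), the slot-based node capacity, and both function names are assumptions made for illustration.

```python
from typing import List, Tuple


def allocate_one_per_pass(pending: List[int], free_slots: int) -> Tuple[int, int]:
    """Restart the scan from the top after every single allocation.
    Returns (containers allocated, entries visited)."""
    pending = list(pending)  # work on a copy
    allocated = visited = 0
    while allocated < free_slots:
        for i, count in enumerate(pending):
            visited += 1
            if count > 0:
                pending[i] -= 1
                allocated += 1
                break  # one container per iteration, then start over
        else:
            break  # a full pass found nothing left to allocate
    return allocated, visited


def allocate_bundled(pending: List[int], free_slots: int) -> Tuple[int, int]:
    """Allocate as many containers as fit during a single pass.
    Returns (containers allocated, entries visited)."""
    pending = list(pending)  # work on a copy
    allocated = visited = 0
    for i, count in enumerate(pending):
        visited += 1
        take = min(count, free_slots - allocated)
        pending[i] -= take
        allocated += take
        if allocated == free_slots:
            break
    return allocated, visited


# 100 entries, only the last one has pending asks; the node has 4 free slots.
PENDING = [0] * 99 + [4]
print(allocate_one_per_pass(PENDING, free_slots=4))
print(allocate_bundled(PENDING, free_slots=4))
```

In this made-up case both approaches allocate 4 containers, but the one-per-pass loop visits 400 entries while the single bundled pass visits 100.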





[jira] [Commented] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}

2015-08-16 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698925#comment-14698925
 ] 

Srikanth Kandula commented on YARN-4056:


Will look. Possibly. However, this arch allows any bundling policy. We will 
push through a couple of different bundling policies. I suspect the 
packer+dependencies+bounded-unfairness bundler will be novel.

 Bundling: Searching for multiple containers in a single pass over {queues, 
 applications, priorities}
 

 Key: YARN-4056
 URL: https://issues.apache.org/jira/browse/YARN-4056
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Srikanth Kandula
Assignee: Robert Grandl
 Attachments: bundling.docx




[jira] [Updated] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}

2015-08-16 Thread Srikanth Kandula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Kandula updated YARN-4056:
---
Description: 
More than one container is allocated on many NM heartbeats. Yet, the current 
scheduler allocates exactly one container per iteration over {{queues, 
applications, priorities}}. When there are many queues, applications, or 
priorities allocating only one container per iteration can  needlessly increase 
the duration of the NM heartbeat.
 
In this JIRA, we propose bundling. That is, allow arbitrarily many containers 
to be allocated in a single iteration over {queues, applications and 
priorities}.

  was:
More than one container is allocated on many NM heartbeats. Yet, the current 
scheduler allocates exactly one container per iteration over {queues, 
applications, priorities}. When there are many queues, applications, or 
priorities allocating only one container per iteration can  needlessly increase 
the duration of the NM heartbeat.
 
In this JIRA, we propose bundling. That is, allow arbitrarily many containers 
to be allocated in a single iteration over {queues, applications and 
priorities}.


 Bundling: Searching for multiple containers in a single pass over {queues, 
 applications, priorities}
 

 Key: YARN-4056
 URL: https://issues.apache.org/jira/browse/YARN-4056
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Srikanth Kandula
 Attachments: bundling.docx




[jira] [Updated] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}

2015-08-16 Thread Srikanth Kandula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Kandula updated YARN-4056:
---
Attachment: bundling.docx

> Bundling: Searching for multiple containers in a single pass over {queues,
> applications, priorities}
>
>
> Key: YARN-4056
> URL: https://issues.apache.org/jira/browse/YARN-4056
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: capacityscheduler, resourcemanager, scheduler
> Reporter: Srikanth Kandula
> Attachments: bundling.docx
>
>
> More than one container is allocated on many NM heartbeats. Yet, the current
> scheduler allocates exactly one container per iteration over {queues,
> applications, priorities}. When there are many queues, applications, or
> priorities, allocating only one container per iteration can needlessly
> increase the duration of the NM heartbeat.
>
> In this JIRA, we propose bundling. That is, allow arbitrarily many containers
> to be allocated in a single iteration over {queues, applications and
> priorities}.





[jira] [Updated] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}

2015-08-16 Thread Srikanth Kandula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Kandula updated YARN-4056:
---
Description: 
More than one container is allocated on many NM heartbeats. Yet, the current 
scheduler allocates exactly one container per iteration over {{queues, 
applications, priorities}}. When there are many queues, applications, or 
priorities, allocating only one container per iteration can needlessly increase
the duration of the NM heartbeat.
 
In this JIRA, we propose bundling. That is, allow arbitrarily many containers 
to be allocated in a single iteration over {{queues, applications and 
priorities}}.

  was:
More than one container is allocated on many NM heartbeats. Yet, the current 
scheduler allocates exactly one container per iteration over {{queues, 
applications, priorities}}. When there are many queues, applications, or 
priorities, allocating only one container per iteration can needlessly increase
the duration of the NM heartbeat.
 
In this JIRA, we propose bundling. That is, allow arbitrarily many containers 
to be allocated in a single iteration over {queues, applications and 
priorities}.


> Bundling: Searching for multiple containers in a single pass over {queues,
> applications, priorities}
>
>
> Key: YARN-4056
> URL: https://issues.apache.org/jira/browse/YARN-4056
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: capacityscheduler, resourcemanager, scheduler
> Reporter: Srikanth Kandula
> Attachments: bundling.docx
>
>
> More than one container is allocated on many NM heartbeats. Yet, the current
> scheduler allocates exactly one container per iteration over {{queues,
> applications, priorities}}. When there are many queues, applications, or
> priorities, allocating only one container per iteration can needlessly
> increase the duration of the NM heartbeat.
>
> In this JIRA, we propose bundling. That is, allow arbitrarily many containers
> to be allocated in a single iteration over {{queues, applications and
> priorities}}.
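
A minimal sketch of the difference, in Python rather than scheduler code. The
flat request list and the names allocate_one_per_pass and allocate_bundled are
hypothetical; the real scheduler walks a queue hierarchy and applies capacity
and locality checks that are left out here. Both functions end up with the
same allocations; the one-container-per-iteration version just revisits more
entries to get there.

```python
from typing import List, Tuple

# Each pending request is (queue, application, priority, container_size).
# This flat layout is an assumption made for the example.
Request = Tuple[str, str, int, int]


def allocate_one_per_pass(requests: List[Request], free: int) -> Tuple[List[Request], int]:
    """Restart the scan after every allocation (one container per iteration)."""
    pending = list(requests)
    allocated: List[Request] = []
    visited = 0  # how many entries were examined in total
    while True:
        for i, req in enumerate(pending):
            visited += 1
            if req[3] <= free:
                free -= req[3]
                allocated.append(pending.pop(i))
                break  # stop this pass after the first request that fits
        else:
            # A full pass found nothing that fits, so we are done.
            return allocated, visited


def allocate_bundled(requests: List[Request], free: int) -> Tuple[List[Request], int]:
    """Allocate every request that fits during a single pass (bundling)."""
    allocated: List[Request] = []
    visited = 0
    for req in requests:
        visited += 1
        if req[3] <= free:
            free -= req[3]
            allocated.append(req)
    return allocated, visited
```

With four pending requests of sizes 4, 2, 2 and 1 and 5 units free, both
versions allocate the first and the last request, but the one-per-pass version
examines 6 entries where the bundled version examines 4.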





[jira] [Commented] (YARN-3820) Collect disks usages on the node

2015-06-29 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605862#comment-14605862
 ] 

Srikanth Kandula commented on YARN-3820:


Similar discussion ongoing at YARN-3819.

> Collect disks usages on the node
>
>
> Key: YARN-3820
> URL: https://issues.apache.org/jira/browse/YARN-3820
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 3.0.0
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Labels: yarn-common, yarn-util
> Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch,
> YARN-3820-4.patch
>
>
> In this JIRA we propose to collect disk usage on a node. This JIRA is part
> of a larger effort to monitor resource usage on the nodes.





[jira] [Commented] (YARN-3819) Collect network usage on the node

2015-06-29 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605861#comment-14605861
 ] 

Srikanth Kandula commented on YARN-3819:


[~aw] Allen, could you expand a bit? Specifically, where in common would you
like us to incorporate this? This is a small and fairly straightforward
change. We are plugging into the resource monitoring harness that already
exists in the NM and would like to hear what the equivalent place in common
would be.


> Collect network usage on the node
> -
>
> Key: YARN-3819
> URL: https://issues.apache.org/jira/browse/YARN-3819
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 3.0.0
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Labels: yarn-common, yarn-util
> Attachments: YARN-3819-1.patch, YARN-3819-2.patch, YARN-3819-3.patch,
> YARN-3819-4.patch, YARN-3819-5.patch
>
>
> In this JIRA we propose to collect the network usage on a node. This JIRA is
> part of a larger effort to monitor resource usage on the nodes.





[jira] [Commented] (YARN-3819) Collect network usage on the node

2015-06-29 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605917#comment-14605917
 ] 

Srikanth Kandula commented on YARN-3819:


From [~chris.douglas]
@Allen Wittenauer Is there a corresponding part of the datanode already 
monitoring these resources? I looked, but found only the metrics. This JIRA and 
YARN-3819 only extend the monitoring. As Karthik pointed out in YARN-2745, 
refactoring for more unified resource monitoring is in YARN-3332.

> Collect network usage on the node
> -
>
> Key: YARN-3819
> URL: https://issues.apache.org/jira/browse/YARN-3819
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 3.0.0
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Labels: yarn-common, yarn-util
> Attachments: YARN-3819-1.patch, YARN-3819-2.patch, YARN-3819-3.patch,
> YARN-3819-4.patch, YARN-3819-5.patch
>
>
> In this JIRA we propose to collect the network usage on a node. This JIRA is
> part of a larger effort to monitor resource usage on the nodes.





[jira] [Commented] (YARN-3820) Collect disks usages on the node

2015-06-29 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605915#comment-14605915
 ] 

Srikanth Kandula commented on YARN-3820:


Copied [~chris.douglas] comment to 3820 as well.

[~chris.douglas] The intention with forcedRead is to let the caller decide
whether a fresh read of /proc is needed. If set to false, the code returns
the previously read values. This is just to amortize the cost of
polling...
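
The forcedRead behaviour can be sketched as below, assuming a reader that
caches its last result. The class name CachedReader and the read_fn callback
are hypothetical and not taken from the patch. The sketch also assumes that
the very first call performs a read regardless of the flag, since nothing has
been cached yet.

```python
from typing import Callable, Optional


class CachedReader:
    """Wraps an expensive read and lets the caller choose fresh vs. cached."""

    def __init__(self, read_fn: Callable[[], str]) -> None:
        self._read_fn = read_fn
        self._last: Optional[str] = None
        self.reads = 0  # number of real reads performed so far

    def get(self, forced_read: bool) -> str:
        # Do a real read when asked to, or when nothing has been read yet
        # (the second condition is an assumption of this sketch).
        if forced_read or self._last is None:
            self._last = self._read_fn()
            self.reads += 1
        return self._last
```

A caller that looks up several values within one heartbeat could pass
forced_read=True once and False for the remaining lookups, so that only one
real read is paid for.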

> Collect disks usages on the node
>
>
> Key: YARN-3820
> URL: https://issues.apache.org/jira/browse/YARN-3820
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 3.0.0
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Labels: yarn-common, yarn-util
> Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch,
> YARN-3820-4.patch
>
>
> In this JIRA we propose to collect disk usage on a node. This JIRA is part
> of a larger effort to monitor resource usage on the nodes.





[jira] [Commented] (YARN-3820) Collect disks usages on the node

2015-06-29 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606275#comment-14606275
 ] 

Srikanth Kandula commented on YARN-3820:


Quick profile: reading /proc takes about 1 ms per call, so perhaps not a big
deal. We are just trying to facilitate a better usage mode for this API
method...
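
For anyone who wants to repeat this kind of measurement, a rough timing helper
might look like the following. The function name and the use of wall-clock
timing are assumptions; the number it reports depends on the machine and on
which file is read, so it is not meant to reproduce the figure above.

```python
import time


def mean_read_seconds(path: str, repeats: int = 100) -> float:
    """Return the mean wall-clock time of opening and fully reading `path`."""
    start = time.perf_counter()
    for _ in range(repeats):
        with open(path, "rb") as f:
            f.read()
    return (time.perf_counter() - start) / repeats
```

Multiplying the result by 1000 gives milliseconds per read.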

> Collect disks usages on the node
>
>
> Key: YARN-3820
> URL: https://issues.apache.org/jira/browse/YARN-3820
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 3.0.0
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Labels: yarn-common, yarn-util
> Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch,
> YARN-3820-4.patch
>
>
> In this JIRA we propose to collect disk usage on a node. This JIRA is part
> of a larger effort to monitor resource usage on the nodes.





[jira] [Commented] (YARN-3819) Collect network usage on the node

2015-06-17 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590587#comment-14590587
 ] 

Srikanth Kandula commented on YARN-3819:


[~grey] The patch does have the generic component, in that it needs
/proc/net... It would be possible to expose whatever additional fields end up
being needed by schedulers or monitors. For now we only expose a first cut of
them (total read/written).
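
As an illustration of "total read/written" counters, the sketch below sums
received and transmitted bytes across interfaces. It assumes the Linux
/proc/net/dev layout (two header lines, then one line per interface with the
receive byte count first and the transmit byte count ninth). Which file under
/proc/net the patch actually reads is not stated here, and the sample text and
the function name are made up for the example.

```python
from typing import Tuple

# Made-up sample in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  500000     400    0    0    0     0          0         0   250000     300    0    0    0     0       0          0
"""


def total_bytes(text: str, skip_loopback: bool = True) -> Tuple[int, int]:
    """Sum received and transmitted bytes over all listed interfaces."""
    rx_total = 0
    tx_total = 0
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        if skip_loopback and name.strip() == "lo":
            continue
        fields = counters.split()
        rx_total += int(fields[0])  # receive bytes
        tx_total += int(fields[8])  # transmit bytes
    return rx_total, tx_total
```

Exposing further fields (packets, errors, drops) would mean reading other
columns of the same line.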

> Collect network usage on the node
> -
>
> Key: YARN-3819
> URL: https://issues.apache.org/jira/browse/YARN-3819
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 3.0.0
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Labels: yarn-common, yarn-util
> Attachments: YARN-3819-1.patch, YARN-3819-2.patch, YARN-3819-3.patch
>
>
> In this JIRA we propose to collect the network usage on a node. This JIRA is
> part of a larger effort to monitor resource usage on the nodes.





[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-06-03 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571821#comment-14571821
 ] 

Srikanth Kandula commented on YARN-3366:


1) Does this also capture network usage from non-container traffic, e.g.,
evacuation, replication, or data downloads?

2) What about receive bandwidth?

3) Perhaps I missed this above, but what are the overhead microbenchmark
numbers: the added latency for normal packets and the extra CPU usage overall,
both from sending packets through tc and from polling the tc counters
periodically?

> Outbound network bandwidth : classify/shape traffic originating from YARN
> containers
>
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Sidharta Seethana
> Assignee: Sidharta Seethana
> Fix For: 2.8.0
>
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch,
> YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch,
> YARN-3366.006.patch, YARN-3366.007.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth
> limits, we need a mechanism to classify/shape network traffic in the
> nodemanager. For more information on the design, please see the attached
> design document in the parent JIRA.





[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2014-12-16 Thread Srikanth Kandula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Kandula updated YARN-2965:
---
Summary: Enhance Node Managers to monitor and report the resource usage on 
machines  (was: Enhance Node Managers to monitor and report the resource usage 
on the machines)

> Enhance Node Managers to monitor and report the resource usage on machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on
> the machine, aggregate these reports, and expose them to the RM.





[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2014-12-16 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248513#comment-14248513
 ] 

Srikanth Kandula commented on YARN-2965:


[~kasha] Thanks. Yes re: config. Agree re: tunneling through the NM heartbeat;
we could offset the overhead at the RM if need be... Re: per-container usage,
certainly, that would be a great extension. Our prototype did try to capture
the usage of background activity. Just a heads-up, though, that tracking
network and disk use per container requires some extra cleverness, since
that info is not readily available. Not sure we will get to that on the first
pass.

> Enhance Node Managers to monitor and report the resource usage on machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on
> the machine, aggregate these reports, and expose them to the RM.





[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2014-12-16 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248517#comment-14248517
 ] 

Srikanth Kandula commented on YARN-2965:


[~peng.zhang] Thanks. Yes, that would be quite useful. We will build this so
that those extensions are possible.

> Enhance Node Managers to monitor and report the resource usage on machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on
> the machine, aggregate these reports, and expose them to the RM.





[jira] [Commented] (YARN-2966) Extend ask request to include additional fields

2014-12-16 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248524#comment-14248524
 ] 

Srikanth Kandula commented on YARN-2966:


Thanks [~kasha] [~varun_saxena], we do have an implementation. Will push that 
patch in this week.

> Extend ask request to include additional fields
> ---
>
> Key: YARN-2966
> URL: https://issues.apache.org/jira/browse/YARN-2966
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager, scheduler
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Attachments: ddoc_expanded_ask.docx
>
>
> This JIRA is about extending the ask request from AM to RM to include
> additional information that describes tasks' resource requirements other than
> cpu and memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2014-12-16 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248528#comment-14248528
 ] 

Srikanth Kandula commented on YARN-2745:


Thanks [~jira.shegalov]. Do the proposed mods capture those use cases? Do add
more detail if we should flesh this out in another way. We want to make some
quick progress on this.

> Extend YARN to support multi-resource packing of tasks
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, resourcemanager, scheduler
> Reporter: Robert Grandl
> Assignee: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx,
> tetris_paper.pdf
>
>
> In this umbrella JIRA we propose an extension to existing scheduling
> techniques, which accounts for all resources used by a task (CPU, memory,
> disk, network) and is able to achieve three competing objectives:
> fairness, improved cluster utilization, and reduced average job completion
> time.
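
One way to account for several resources at once is to score each candidate
task by how well its demand vector lines up with the machine's free resources,
for example with a dot product, and to place the fitting task with the highest
score. The sketch below is a toy illustration of that idea only. The function
names, the normalized resource vectors, and the tie-breaking are assumptions,
and it leaves out the fairness and job completion time objectives mentioned in
the description.

```python
from typing import Dict, List, Optional

RESOURCES = ("cpu", "memory", "disk", "network")
Vector = Dict[str, float]  # resource name -> fraction of machine capacity


def fits(demand: Vector, free: Vector) -> bool:
    """True if the task's demand fits within the free amount of every resource."""
    return all(demand[r] <= free[r] for r in RESOURCES)


def alignment(demand: Vector, free: Vector) -> float:
    """Dot product of the task's demand and the machine's free resources."""
    return sum(demand[r] * free[r] for r in RESOURCES)


def pick_task(tasks: List[Vector], free: Vector) -> Optional[int]:
    """Index of the fitting task with the highest alignment score, or None."""
    best: Optional[int] = None
    best_score = -1.0
    for i, demand in enumerate(tasks):
        if fits(demand, free):
            score = alignment(demand, free)
            if score > best_score:  # earlier tasks win ties
                best, best_score = i, score
    return best
```

On a machine with plenty of free disk and network but little free memory, a
disk- and network-heavy task scores higher than a CPU-heavy one, which is the
sense in which the choice looks at all four resources rather than only CPU
and memory.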



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) YARN new pluggable scheduler which does multi-resource packing

2014-11-17 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215720#comment-14215720
 ] 

Srikanth Kandula commented on YARN-2745:


Thanks Wangda.

Re: YARN-314, that would be a good add! Our prototype implementation did not
use this functionality (same priority + locality but different resource size)
because we assumed that all of the tasks in a stage (e.g., map or reduce) have
the same resource demand. While this is a simplifying assumption and makes
things easier to estimate, it is not always correct, especially when there is
skew. So the functionality in YARN-314 would be a good thing to have and use
here too.

> YARN new pluggable scheduler which does multi-resource packing
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager, scheduler
> Reporter: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_paper.pdf
>
>
> In this umbrella JIRA we propose a new pluggable scheduler, which accounts
> for all resources used by a task (CPU, memory, disk, network) and is able
> to achieve three competing objectives: fairness, improved cluster utilization,
> and reduced average job completion time.





[jira] [Commented] (YARN-2745) YARN new pluggable scheduler which does multi-resource packing

2014-11-14 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212586#comment-14212586
 ] 

Srikanth Kandula commented on YARN-2745:


Thanks Karthik, that is an interesting thought. It seems that several of the
proposed work items (resource estimation, expanded asks, modifications to task
matching on NM heartbeat) have to happen regardless of whether this is a new
scheduler or a flag atop existing ones like FairScheduler. Do you foresee any
additional complications in building this as a flag as opposed to stand-alone?
Will take this offline.

> YARN new pluggable scheduler which does multi-resource packing
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager, scheduler
> Reporter: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_paper.pdf
>
>
> In this umbrella JIRA we propose a new pluggable scheduler, which accounts
> for all resources used by a task (CPU, memory, disk, network) and is able
> to achieve three competing objectives: fairness, improved cluster utilization,
> and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1434) Single Job can affect fairshare of others

2013-11-22 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830129#comment-13830129
 ] 

Srikanth Kandula commented on YARN-1434:


Sandy Ryza,

I follow it up to the point where the job receives the next container that the
RM allocates. But why would this starve other AMs? Shouldn't the RM offer some
other containers to these other jobs if the cluster is idle?

I can see how some containers may just be tossed back and forth between the RM
and the picky job. But I do not see why other jobs receive a smaller share
than they otherwise would because of the picky job.

> Single Job can affect fairshare of others
> -
>
> Key: YARN-1434
> URL: https://issues.apache.org/jira/browse/YARN-1434
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Carlo Curino
> Priority: Minor
>
> A job that receives containers, decides not to use them, and yields them
> back in the next heartbeat could significantly affect the amount of resources
> given to other jobs.
> This is because, by yielding containers back, the job always appears to be
> under capacity (more so than others), so it is picked to be the next to
> receive containers.
> Observed by Robert Grandl, to be independently confirmed.
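
The mechanism in the description can be reproduced in a toy model, sketched
below. It is not the FairScheduler: it assumes equal shares, exactly one
container offered per round, and that the offer always goes to the job
currently holding the fewest containers. Under those assumptions the picky job
keeps winning the offer and handing it back, so the other jobs stop growing.
Whether the real RM behaves this way when more containers are free is exactly
the open question, and the function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional


def simulate(jobs: List[str], picky: Optional[str], rounds: int) -> Dict[str, int]:
    """Offer one container per round to the job holding the fewest containers."""
    held = {job: 0 for job in jobs}
    for _ in range(rounds):
        # The job furthest below its (equal) share; ties go to list order.
        target = min(jobs, key=lambda job: held[job])
        if target == picky:
            # The picky job yields the container back in the next heartbeat,
            # so its holdings never increase and the offer is wasted.
            continue
        held[target] += 1
    return held
```

With jobs a, b and a picky job, ten rounds leave a and b with one container
each and the picky job with none; without the picky job, a and b get five
each.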



--
This message was sent by Atlassian JIRA
(v6.1#6144)