[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384176#comment-14384176
 ] 

Wangda Tan commented on YARN-3214:
--

As for your concerns about regression:

YARN-2694 temporarily restricts each node and each resource request to a
single node label, because allowing more than one raises a lot of problems. We
need to address these problems before letting people use it. 2.6 doesn't have
multiple node label support, so the proposal attached to this JIRA isn't a
regression.

In general, the node label feature is still in alpha stage; once the following
tasks are completed, it will be much better:
- YARN-2495, distributed configuration for node labels
- YARN-3214,
- YARN-3409, constraints support
- YARN-3362, UI support

Hope this makes sense to you; looking forward to your thoughts/suggestions.

Thanks,

 Add non-exclusive node labels 
 --

 Key: YARN-3214
 URL: https://issues.apache.org/jira/browse/YARN-3214
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Non-exclusive-Node-Partition-Design.pdf


 Currently node labels partition the cluster into sub-clusters, so resources
 cannot be shared between partitions.
 With the current implementation of node labels we cannot use the cluster
 optimally, and the throughput of the cluster will suffer.
 We are proposing adding non-exclusive node labels:
 1. Labeled apps get preference on labeled nodes.
 2. If there is no ask for labeled resources, we can assign those nodes to
 non-labeled apps.
 3. If there is any future ask for those resources, we will preempt the
 non-labeled apps and give the resources back to labeled apps.
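
 The three rules above could be sketched roughly as follows. This is only an
 illustration, not the CapacityScheduler code: the Node class, the allocate and
 preempt_for helpers, the NO_LABEL marker and the "one app per node"
 simplification are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

NO_LABEL = ""  # hypothetical marker meaning "no label"


@dataclass
class Node:
    name: str
    label: str = NO_LABEL           # label of the node itself
    running: Optional[str] = None   # label of the app using the node, None if idle


def allocate(nodes: List[Node], app_label: str) -> Optional[Node]:
    """Give an idle node to an app; app_label == NO_LABEL is a non-labeled app."""
    idle = [n for n in nodes if n.running is None]
    # Rule 1: an app is first offered idle nodes whose label matches its own.
    for n in idle:
        if n.label == app_label:
            n.running = app_label
            return n
    # Rule 2: a non-labeled app may borrow an idle labeled node.
    # (Simplification: pending labeled asks are not checked here.)
    if app_label == NO_LABEL and idle:
        idle[0].running = app_label
        return idle[0]
    return None


def preempt_for(nodes: List[Node], app_label: str) -> Optional[Node]:
    """Rule 3: take a labeled node back from a non-labeled app."""
    if app_label == NO_LABEL:
        return None
    for n in nodes:
        if n.label == app_label and n.running == NO_LABEL:
            n.running = app_label   # the non-labeled app is preempted
            return n
    return None
```

 With one node labeled X and one unlabeled node, two non-labeled apps end up
 using both nodes; a later app asking for X finds nothing idle and gets the
 X node back only through preempt_for.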



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384136#comment-14384136
 ] 

Wangda Tan commented on YARN-3214:
--

[~lohit],
I created YARN-3409 and added a rough description of constraints there. I'm
working on a design doc for constraints, which should be ready for review soon.
We can continue discussing it on YARN-3409.

Thanks,



[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379092#comment-14379092
 ] 

Wangda Tan commented on YARN-3214:
--

Hi [~lohit],
Thanks for reviewing the doc. Regarding your comments:
bq. If so, it would become too restrictive. Labels on nodes can be seen in
multiple dimensions (from app's resource, machine resource and also usecase
resource, e.g. backfill jobs are placed on specific set of nodes). In those
cases we should have ability to have multiple labels on node
Yes, for now we support only one label (partition) per node. The reason for
this temporary restriction is that with multiple labels on each node, resource
planning becomes hard (today we can say, for example, that queue-A can use 40%
of label-X and queue-B can use 60% of label-X). Consider a node with label-X
and label-Y whose resource is 10G: it is hard to say whether the node has 10G
of resource for (X+Y) combined, or 10G of X and 10G of Y. This also makes
preemption hard to do. The tradeoff is that if we don't plan resource shares
(or capacities) on node labels, some resources could be wasted and queues could
be starved while they are still under their configured capacity.
Multiple labels per node (we call these constraints) are in the design stage.
We have some thoughts about it and will push it to the community once it is in
better shape -- it should not take too long.
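
To make the planning argument concrete, here is a small worked sketch. The
40%/60% split and the 10G node come from the comment above; the second node,
the function name queue_capacity and the data layout are assumptions made only
for illustration.

```python
def queue_capacity(nodes, shares_pct, label):
    """Guaranteed capacity (in GB) of each queue on one partition.

    nodes:      {node_name: (label, memory_gb)}, exactly one label per node
    shares_pct: {queue_name: percentage of the partition}, summing to 100
    """
    # With one label per node, the size of a partition is unambiguous.
    total = sum(mem for lbl, mem in nodes.values() if lbl == label)
    return {queue: total * pct / 100 for queue, pct in shares_pct.items()}


nodes = {"node1": ("X", 10), "node2": ("X", 10)}
shares_pct = {"queue-A": 40, "queue-B": 60}
print(queue_capacity(nodes, shares_pct, "X"))
```

Here partition X has 20G in total, so queue-A is guaranteed 8G and queue-B 12G.
If node1 also carried label-Y, the "total" for X would no longer be well
defined, which is the problem described above.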

bq. Also, in the document there is mention of apps without any labels being
scheduled on labeled nodes if resources are idle. Does that also cover apps
which have a label other than A/B: could they still be placed on these nodes
when there are free resources available?
No, it will only try to allocate non-labeled requests to labeled nodes. If a
resource request explicitly asks for a node label, we will only allocate
resources with the corresponding label to it.
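
The placement rule in this answer can be written as a small predicate. This is
a sketch under assumptions: the names eligible_nodes and NO_LABEL are made up
for the example, and whether a labeled node is currently idle is not modelled.

```python
NO_LABEL = ""  # hypothetical marker for "no label requested / no label on node"


def eligible_nodes(request_label, node_labels):
    """Nodes a resource request may be placed on.

    node_labels: {node_name: label}, with NO_LABEL for unlabeled nodes.
    """
    if request_label == NO_LABEL:
        # Non-labeled requests may also use labeled nodes (when they are idle).
        return sorted(node_labels)
    # A request that names a label only ever gets nodes with that same label.
    return sorted(n for n, lbl in node_labels.items() if lbl == request_label)
```

With nodes labeled A, B and one unlabeled node, a request for label C gets no
node at all, a request for A gets only the A node, and a non-labeled request
may be placed on any of the three.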






[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379141#comment-14379141
 ] 

Vinod Kumar Vavilapalli commented on YARN-3214:
---

Maybe we should start explicitly calling these partitions and
attributes/constraints (when we have a JIRA) everywhere, for clarity.



[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379309#comment-14379309
 ] 

Wangda Tan commented on YARN-3214:
--

Hi [~lohit],
The problems with multiple labels on the same node and with at most one label
on each node are quite different:

At most one label per node divides a cluster into several disjoint
sub-clusters, and any scheduling algorithm (whether capacity/fair/fifo) can
simply run on each sub-cluster.

If you want to divide resources among queues by label (as in the example above,
queue-A can use 40% of label-X and queue-B can use 60% of label-X) while
supporting multiple labels (say X and Y) on the same node (say node1), the
sub-clusters become overlapping, which makes scheduling very hard:
When qA can access X and qB can access Y, how much of node1's resource do you
plan to allocate to qA/qB? A more complex example: node1 has X,Y; node2 has X
only; node3 has X,Z. This is a very tough problem and, as far as I know (please
let me know if I missed anything), no platform has solved it perfectly.
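
The overlap in the node1/node2/node3 example can be shown with a few lines of
arithmetic. The label sets are taken from the example above; the 10G per node
and the helper name label_totals are assumptions made for this sketch.

```python
from collections import defaultdict


def label_totals(nodes):
    """Sum memory per label, counting a node once for every label it carries."""
    totals = defaultdict(int)
    for labels, mem_gb in nodes.values():
        for label in labels:
            totals[label] += mem_gb
    return dict(totals)


nodes = {
    "node1": ({"X", "Y"}, 10),
    "node2": ({"X"}, 10),
    "node3": ({"X", "Z"}, 10),
}
totals = label_totals(nodes)
physical = sum(mem_gb for _, mem_gb in nodes.values())
print(sorted(totals.items()), "physical:", physical)
```

The per-label totals add up to more than the physical memory of the three
nodes, because node1 and node3 are counted twice. Any percentage-based split of
"label X" therefore promises memory that may already be promised under Y or Z.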

So this is why separating partitions from attributes/constraints becomes
important. A partition is a way to divide the cluster: each sub-cluster has
properties (like how it is shared among queues) similar to a general cluster
resource setting, which is useful when a set of nodes is contributed to and
shared by only a subset of the queues of the entire cluster. An attribute is
just a way to allocate containers; a simple way to implement
attributes/constraints is FCFS (first come, first served), with no quota
assigned to each attribute.
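
A first-come-first-served treatment of attributes, with no per-attribute quota,
might look like the sketch below. It is an assumption-laden illustration, not a
proposed design: the function name fcfs_allocate, the "one container per node"
simplification and the attribute names are all invented for the example.

```python
def fcfs_allocate(requests, nodes):
    """Serve requests strictly in arrival order.

    requests: list of (app, required_attributes) in arrival order
    nodes:    {node_name: set_of_attributes}; each node holds one container here
    Returns {app: node_name} for the requests that could be served.
    """
    free = dict(nodes)
    placed = {}
    for app, required in requests:
        for name in sorted(free):
            # A node qualifies if it has every attribute the request asks for.
            if required <= free[name]:
                placed[app] = name
                del free[name]
                break
    return placed
```

Whoever asks first gets the matching node; nothing here balances how many
"gpu" nodes a given queue or app may hold, which is the difference from a
partition with configured capacities.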

Mesos is different here: it doesn't do anything with node attributes on the
scheduling side. All node attributes are passed directly to the framework
side, and the framework decides whether to accept or reject an offer according
to its node attributes. Mesos does not take care of balancing framework shares
on each attribute.



[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Lohit Vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379263#comment-14379263
 ] 

Lohit Vijayarenu commented on YARN-3214:


Thanks [~wangda] for the reply. I feel that treating partitions and constraints
as two separate entities will cause more confusion. If allocation is the
challenge (as you described in the example for multiple labels), then it is
something which should be solved in the scheduler, no? This is the same problem
one would have even without labels. For a given node which advertises 10G of
memory, and an app/queue with X and Y, how would you divide the resource among
X and Y?
PS: The Mesos scheduler, for example, uses a term called constraints, which is
similar to labels. In that sense I agree with [~vinodkv] that we should
probably call this feature partitions or something related?



[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-24 Thread Lohit Vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378541#comment-14378541
 ] 

Lohit Vijayarenu commented on YARN-3214:


bq. (P0) A node can belong to at most one partition. All nodes belong to a
DEFAULT partition unless overridden.

Does this mean that on a node we can have only one label? If so, it would
become too restrictive. Labels on nodes can be seen in multiple dimensions
(from app's resource, machine resource and also usecase resource, e.g. backfill
jobs are placed on specific set of nodes). In those cases we should have
ability to have multiple labels on node.

Also, in the document there is mention of apps without any labels being
scheduled on labeled nodes if resources are idle. Does that also cover apps
which have a label other than A/B: could they still be placed on these nodes
when there are free resources available?



[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349173#comment-14349173
 ] 

Wangda Tan commented on YARN-3214:
--

Hi Naga,
Thanks for the review.
First I need to say that what we plan to do for this JIRA, and what the already
committed patches of YARN-2492 do, is not only partitioning; it is partitioning
and tagging nodes (but each node can have at most one partition). This keeps
the scheduler part consistent, and each queue can have a dedicated portion of
capacity on different partitions.

Regarding {{And any given node can have at most one label of the first kind
(one on which capacity can be specified) and multiple tag kind of labels. App
can specify label expression on tag kind of labels.}}.
We're working on a design for this -- supporting multiple labels on each node.
We have had some discussions internally; there are some workable ways to do it,
but they're not perfect (there are some limitations with these approaches, like
you said). We will post a design doc once the proposal is polished.

Wangda



[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349194#comment-14349194
 ] 

Naganarasimha G R commented on YARN-3214:
-

Thanks for the clarification [~wangda],



[jira] [Commented] (YARN-3214) Add non-exclusive node labels

2015-03-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348839#comment-14348839
 ] 

Naganarasimha G R commented on YARN-3214:
-

Hi [~wangda],
I have a query (not sure whether this is the right JIRA to discuss it, though).
IIUC, when the labels requirement started we were trying to cater to 2 kinds of
requirements:
# Similar to the current JIRA: label the nodes to partition the cluster and
ensure a few queues/users get a particular partition of nodes with high
priority (multi-tenant scenario).
# Tagging nodes with particular labels (like high-memory nodes, more CPU cores,
more or a particular kind of GPU, a particular library version, Java version,
etc.) and launching apps based on these tags.

Currently it seems we are focusing only on the first kind and hardly supporting
the second at all (we do not even accept more than one label per node). So I
was thinking we could support 2 kinds of labels: a first kind for which we can
support capacity, and a second kind for tagging. And any given node can have at
most one label of the first kind (one on which capacity can be specified) and
multiple tag kind of labels. App can specify label expression on tag kind of
labels.
Correct me if my understanding is wrong, or if there would still be limitations
with this approach too.
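
The two kinds of labels proposed here could be modelled roughly as below. This
is only a sketch of the idea: the NodeLabels class, the matches helper and the
"all required tags must be present" form of label expression are assumptions,
not anything defined in this JIRA.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class NodeLabels:
    # First kind: at most one label on which capacity can be specified.
    partition: Optional[str] = None
    # Second kind: any number of descriptive tag labels.
    tags: FrozenSet[str] = frozenset()


def matches(node: NodeLabels, required_tags: Iterable[str]) -> bool:
    """A deliberately simple label expression: every required tag is present."""
    return set(required_tags) <= node.tags


node = NodeLabels(partition="tenantA", tags=frozenset({"high-mem", "gpu", "java8"}))
print(matches(node, ["gpu", "java8"]), matches(node, ["gpu", "ssd"]))
```

Because partition is a single optional field, "at most one label of the first
kind" holds by construction, while tags can grow freely without affecting
capacity planning.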

