[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2017-12-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Description: 
Existing YARN scheduler is based on node heartbeat. This can lead to 
sub-optimal decisions because scheduler can only look at one node at the time 
when scheduling resources.

Pseudo code of existing scheduling logic looks like:
{code}
for node in allNodes:
   Go to parentQueue
  Go to leafQueue
for application in leafQueue.applications:
   for resource-request in application.resource-requests
  try to schedule on node
{code}

Considering future complex resource placement requirements, such as node 
constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
regionsevers and Storm workers on the same host), we may need to consider 
moving YARN scheduler towards global scheduling.

  was:
Existing YARN scheduler is based on node heartbeat. This can lead to 
sub-optimal decisions because scheduler can only look at one node at the time 
when scheduling resources.

Pseudo code of existing scheduling logic looks like:
{code}
for node in allNodes:
   Go to parentQueue
  Go to leafQueue
for application in leafQueue.applications:
   for resource-request in application.resource-requests
  try to schedule on node
{code}

Considering future complex resource placement requirements, such as node 
constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
regionsevers and Storm workers on the same host), we may need to consider 
moving YARN scheduler towards global scheduling.

{color:red}Running design doc: 
https://docs.google.com/document/d/1rY-CJPLbGk3Xj_8sxre61y2YkHJFK8oqKOshro1ZY3A/edit#heading=h.xnzvh9nn283a{color}


> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Explanantions of Global Scheduling (YARN-5139) 
> Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> YARN-5139.000.patch, wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, 
> wip-3.YARN-5139.patch, wip-4.YARN-5139.patch, wip-5.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2017-12-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Description: 
Existing YARN scheduler is based on node heartbeat. This can lead to 
sub-optimal decisions because scheduler can only look at one node at the time 
when scheduling resources.

Pseudo code of existing scheduling logic looks like:
{code}
for node in allNodes:
   Go to parentQueue
  Go to leafQueue
for application in leafQueue.applications:
   for resource-request in application.resource-requests
  try to schedule on node
{code}

Considering future complex resource placement requirements, such as node 
constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
regionsevers and Storm workers on the same host), we may need to consider 
moving YARN scheduler towards global scheduling.

{color:red}Running design doc: 
https://docs.google.com/document/d/1rY-CJPLbGk3Xj_8sxre61y2YkHJFK8oqKOshro1ZY3A/edit#heading=h.xnzvh9nn283a{color}

  was:
Existing YARN scheduler is based on node heartbeat. This can lead to 
sub-optimal decisions because scheduler can only look at one node at the time 
when scheduling resources.

Pseudo code of existing scheduling logic looks like:
{code}
for node in allNodes:
   Go to parentQueue
  Go to leafQueue
for application in leafQueue.applications:
   for resource-request in application.resource-requests
  try to schedule on node
{code}

Considering future complex resource placement requirements, such as node 
constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
regionsevers and Storm workers on the same host), we may need to consider 
moving YARN scheduler towards global scheduling.


> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Explanantions of Global Scheduling (YARN-5139) 
> Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> YARN-5139.000.patch, wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, 
> wip-3.YARN-5139.patch, wip-4.YARN-5139.patch, wip-5.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.
> {color:red}Running design doc: 
> https://docs.google.com/document/d/1rY-CJPLbGk3Xj_8sxre61y2YkHJFK8oqKOshro1ZY3A/edit#heading=h.xnzvh9nn283a{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-10-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: YARN-5139.000.patch

Attached ver.000 patch and kick Jenkins

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Explanantions of Global Scheduling (YARN-5139) 
> Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> YARN-5139.000.patch, wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, 
> wip-3.YARN-5139.patch, wip-4.YARN-5139.patch, wip-5.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-10-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: wip-5.YARN-5139.patch

Attached ver.5 wip patch which I used to run the test, it has few TODO items, I 
will update the patch in the next few days to make it to ready-to-review.

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Explanantions of Global Scheduling (YARN-5139) 
> Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, wip-3.YARN-5139.patch, 
> wip-4.YARN-5139.patch, wip-5.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-10-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: YARN-5139-Concurrent-scheduling-performance-report.pdf

We're glad to share that our recently performance tests about concurrent 
scheduling based on the new global scheduling framework.

We use SLS (scheduler load simulator) simulated 2.4PB memory cluster, which has 
20K nodes, and 4k-12k applications running in parallel.

The new concurrent scheduling can get up to *6.25X* throughput (avg #containers 
allocated per second) comparing to original async scheduling of Capacity 
Scheduler.

More details please refer to attached 
{{YARN-5139-Concurrent-scheduling-performance-report.pdf}} and code is 
{{wip-5.YARN-5139.patch}}.

Thanks [~vinodkv]/[~gtCarrera9] for lots of valuable offline suggestions.

+ People who might be interesting for this: 
[~jlowe]/[~curino]/[~kasha]/[~asuresh]/[~kkaranasos]/[~subru].

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Explanantions of Global Scheduling (YARN-5139) 
> Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, wip-3.YARN-5139.patch, 
> wip-4.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-09-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: Explanantions of Global Scheduling (YARN-5139) 
Implementation.pdf

I've uploaded a document to explain global scheduling implementation in the 
attached patch.

Hope it can make you easier understand the overall logic / changes without 
reviewing the half-MB patch.

This is the same doc on github: 
https://github.com/leftnoteasy/hadoop/blob/global-scheduling-3/global-scheduling-explaination.md.
 Hopefully it has better syntax highlight support than PDF.

Thanks [~vinodkv] for offline suggestions.

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: Explanantions of Global Scheduling (YARN-5139) 
> Implementation.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, wip-3.YARN-5139.patch, 
> wip-4.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-08-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: wip-4.YARN-5139.patch

Also sharing updated prototype patch (wip-4).

It has changes: 
- Most logics described in the doc (v2), such as resource allocator / 
committer, placement set, scorer, etc.
- Read/write lock optimizations for some of the scheduler classes (Major reason 
of the scary patch size).

It should work properly for CapacityScheduler, I have updated the patch to make 
it work for most integration unit tests, such as tests in TestCapacityScheduler 
/ TestContainerAllocation / TestNodeLabelContainerAllocation, etc.

And please note that this patch doesn't apply to latest trunk, it is on top of 
commit: 869393643de23dcb010cc33091c8eb398de0fd6c (around Aug 17).

Please feel free to let me know your questions / comments. If everyone agrees 
with the general approach, I will go ahead to create a new branch and split the 
patch.

+ [~jianhe], [~kkaranasos], [~kasha], [~subru], [~curino].

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, wip-3.YARN-5139.patch, 
> wip-4.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-08-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: 
YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf

Have offline discussion with [~vinodkv], [~kasha] and [~templedf]. All gave 
very valuable suggestions to the overall design.

I incorperated these suggestions to the design doc:

*Biggest update of the new design:*
It added the concept of concurrent resource allocator and resource committer:
- Resource allocator scan the read-only scheduler state, traverse the queue 
structure, and make resource allocation proposal (Like allocate <3G, 1 vcore> 
on host1 for app1 from queue=A). Since it won't change any scheduler state, we 
can have multiple resource allocator run concurrently in different threads.
- Resource allocation proposal will be send to resource commtter which can 
decide if the resource allocation proposal could be accepted or rejected, and 
it will update scheduler state if any resource allocation proposal is accepted. 
In the future we could implement optimistic locking logic to better merge 
conflicts.

More details please refer to attached ver.2 doc.

Any comments are welcome! 

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, wip-3.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-08-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf

Attached design and implementation notes. Thanks [~vinodkv], [~hitesh] and 
[~bikassaha] for great offline suggestions.

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, wip-3.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-08-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: wip-3.YARN-5139.patch

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, 
> wip-3.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-06-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: wip-2.YARN-5139.patch

An update of WIP patch (wip-2):

- Added implementation of scorer/scorer-factory supports caching (See package: 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.scorer)
- Added an example scorer (LocalityNodesScorer) to schedule containers to node 
with best locality
- Added a knob (yarn.scheduler.capacity.global-scheduling-enabled) to 
enable/disable global scheduling in CapacityScheduler
- Refactored implementation of RegularContainerAllocator to avoid duplicated 
check when doing global-scheduling. 
- Fixed compilation issues, now the wip-2 patch can apply and compiled on top 
of latest trunk.

For next patch, I will focus
- Code cleanups, remove hacks, do necessary refactoring to keep clean code 
base, etc.
- Basic performance test to make sure no significant regression to performance.
- Tests to demostrate global scheduling can make better scheduling decisions. 

Please share your thoughts about the patch when you get chance. [~kasha], 
[~asuresh], [~rohithsharma].

Thanks.

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: wip-1.YARN-5139.patch, wip-2.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2016-05-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5139:
-
Attachment: wip-1.YARN-5139.patch

Uploaded a preliminary and WIP POC patch for Capacity Scheduler changes to help 
you better understanding my proposal. The patch includes a {{NodeCandidates}} 
class, which will be constructed by scheduler before invokin 
rootQueue.assignContainers(...).

Then ParentQueue/LeafQueue pass the {{NodeCandidates}} down to application. And 
application can sort nodes (using implementation of {{SchedulerNodesScorer}}) 
and allocate resources.

This idea is very close to approach mentioned in Google Borg paper. Except 
Google Borg select requests based on quota/priority and we select requests 
based on hierarchy of queues/applications.

This patch cannot be compiled, I will try to finish the POC patch with few more 
examples like global scheduling filter/scorer for locality, anti-affinity, etc. 
while waiting for your feedbacks.

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: wip-1.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>Go to parentQueue
>   Go to leafQueue
> for application in leafQueue.applications:
>for resource-request in application.resource-requests
>   try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionsevers and Storm workers on the same host), we may need to consider 
> moving YARN scheduler towards global scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org