[jira] [Updated] (YARN-10015) Correct the sample command in SLS README file

2020-01-28 Thread Yufei Gu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-10015:

Component/s: yarn

> Correct the sample command in SLS README file
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Fix For: 3.3.0
>
> Attachments: YARN-10015.patch
>
>
> The sample command in the SLS README, {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}}, contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json
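For reference, the command works once the leading dashes are plain ASCII double 
hyphens; the corrected form should read:
{code}
bin/slsrun.sh --input-rumen=sample-data/2jobs2min-rumen-jh.json --output-dir=sample-output
{code}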






[jira] [Commented] (YARN-10015) Correct the sample command in SLS README file

2020-01-28 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025527#comment-17025527
 ] 

Yufei Gu commented on YARN-10015:
-

Committed to trunk. Thanks for the patch, [~aihuaxu]. Thanks for the review, 
[~adam.antal].

> Correct the sample command in SLS README file
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: YARN-10015.patch
>
>
> The sample command in the SLS README, {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}}, contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Updated] (YARN-10015) Correct the sample command in SLS README file

2020-01-28 Thread Yufei Gu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-10015:

Fix Version/s: 3.3.0

> Correct the sample command in SLS README file
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Fix For: 3.3.0
>
> Attachments: YARN-10015.patch
>
>
> The sample command in the SLS README, {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}}, contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Commented] (YARN-10015) Correct the sample command in SLS README file

2020-01-27 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024762#comment-17024762
 ] 

Yufei Gu commented on YARN-10015:
-

+1, will commit later.

> Correct the sample command in SLS README file
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: YARN-10015.patch
>
>
> The sample command in the SLS README, {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}}, contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption

2019-11-12 Thread Yufei Gu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9537:
---
Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch, 
> YARN-9537.003.patch, YARN-9537.004.patch, YARN-9537.005.patch, 
> YARN-9537.006.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-11-12 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972654#comment-16972654
 ] 

Yufei Gu commented on YARN-9537:


Committed to trunk. Thanks for the contribution, [~cane]. Thanks for the 
review, [~adam.antal] and [~snemeth]. 

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch, 
> YARN-9537.003.patch, YARN-9537.004.patch, YARN-9537.005.patch, 
> YARN-9537.006.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-11-11 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972075#comment-16972075
 ] 

Yufei Gu commented on YARN-9537:


[~cane], thanks for the patch. +1 for patch 006. Will commit later.

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch, 
> YARN-9537.003.patch, YARN-9537.004.patch, YARN-9537.005.patch, 
> YARN-9537.006.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-11-11 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971376#comment-16971376
 ] 

Yufei Gu commented on YARN-9537:


Agreed with [~snemeth]. The production code shouldn't do the null checking. 
Class FairScheduler should make sure that {{getConf}} won't be null before 
creating any {{FSAppAttempt}} object. Hi [~cane], can you refactor the test 
code, since Hadoop QA reports a failing test case?
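
A minimal, self-contained sketch of that idea (the types below are illustrative 
stand-ins, not Hadoop's actual classes): validate the configuration once where 
the scheduler creates the attempt, so the attempt itself never needs a null 
check.
{code:java}
import java.util.Objects;

// Illustrative stand-ins only; not the real FairScheduler/FSAppAttempt classes.
class SchedulerConf {
  boolean isAMPreemptionEnabled() { return true; }
}

class AppAttempt {
  private final SchedulerConf conf;

  AppAttempt(SchedulerConf conf) {
    // Fail fast at creation time instead of null-checking at every use site.
    this.conf = Objects.requireNonNull(conf, "scheduler conf must not be null");
  }

  boolean amPreemptable() {
    return conf.isAMPreemptionEnabled(); // conf is guaranteed non-null here
  }
}
{code}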


> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch, 
> YARN-9537.003.patch, YARN-9537.004.patch, YARN-9537.005.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Comment Edited] (YARN-9537) Add configuration to disable AM preemption

2019-11-07 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969450#comment-16969450
 ] 

Yufei Gu edited comment on YARN-9537 at 11/7/19 5:41 PM:
-

Hi [~cane], sorry to come late. Patch 003 looks good to me overall. Just 
thinking aloud: why is this property cluster level instead of queue level?
There are minor issues. 
# {{protected static final String  AM_PREEMPTION = CONF_PREFIX + 
"am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION"
#  Do we need this comment? Probably not.
{code:java}
// For test
  this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled();
{code}
# {{  public void testDisableAMPreemption() throws Exception }} No need to 
throw.


was (Author: yufeigu):
Hi [~cane], sorry to come late. Patch 003 looks good to me overall. Just 
thinking aloud: why is this property cluster level instead of queue level?
There are style issues. 
# {{protected static final String  AM_PREEMPTION = CONF_PREFIX + 
"am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION"
#  Do we need this comment? Probably not.
{code:java}
// For test
  this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled();
{code}


> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch, 
> YARN-9537.003.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-11-07 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969450#comment-16969450
 ] 

Yufei Gu commented on YARN-9537:


Hi [~cane], sorry to come late. Patch 003 looks good to me overall. Just 
thinking aloud: why is this property cluster level instead of queue level?
There are style issues. 
# {{protected static final String  AM_PREEMPTION = CONF_PREFIX + 
"am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION"
#  Do we need this comment? Probably not.
{code:java}
// For test
  this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled();
{code}
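
For readers following the review: a rough sketch of how such a switch could 
look. The key name is taken from the snippet above; the wrapper class, the 
prefix value, and the default are assumptions, not the actual patch.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only; in the real patch this logic lives in the fair scheduler classes.
public class AmPreemptionFlag {
  // Prefix value is an assumption based on the fair scheduler's config keys.
  static final String CONF_PREFIX = "yarn.scheduler.fair.";
  static final String AM_PREEMPTION = CONF_PREFIX + "am.preemption";

  private final boolean amPreemptionEnabled;

  public AmPreemptionFlag(Configuration conf) {
    // Assumed default of true keeps AM containers preemptable (the old behavior).
    this.amPreemptionEnabled = conf.getBoolean(AM_PREEMPTION, true);
  }

  /** An AM container may be preempted only when the flag is on. */
  public boolean canPreempt(boolean isAMContainer) {
    return !isAMContainer || amPreemptionEnabled;
  }
}
{code}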


> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch, 
> YARN-9537.003.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-05 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967729#comment-16967729
 ] 

Yufei Gu commented on YARN-9940:


Hi [~kailiu_dev], added you to the contributor role and assigned this to you. I 
will try to review this later.

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Assignee: kailiu_dev
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: YARN-9940-branch-2.7.2.001.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
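
For context, this TimSort error usually means the comparator observed values 
changing mid-sort (here, node resources being updated while the node list is 
being sorted). The usual remedy, illustrated below, is to sort over an 
immutable snapshot of the sort key; this is only an illustration of the failure 
class, not the attached patch.
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SnapshotSortExample {
  // Keys are captured once, so the comparator stays consistent during the sort.
  static final class NodeSnapshot {
    final String nodeId;
    final long availableMemoryMB;

    NodeSnapshot(String nodeId, long availableMemoryMB) {
      this.nodeId = nodeId;
      this.availableMemoryMB = availableMemoryMB;
    }
  }

  public static void main(String[] args) {
    List<NodeSnapshot> nodes = new ArrayList<>();
    nodes.add(new NodeSnapshot("node-1", 4096));
    nodes.add(new NodeSnapshot("node-2", 8192));
    // Sorting immutable snapshots cannot trip TimSort's consistency check.
    nodes.sort(Comparator.comparingLong(n -> n.availableMemoryMB));
    nodes.forEach(n -> System.out.println(n.nodeId));
  }
}
{code}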






[jira] [Assigned] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'

2019-11-05 Thread Yufei Gu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-9940:
--

Assignee: kailiu_dev

> avoid continuous scheduling thread crashes while sorting nodes get 
> 'Comparison method violates its general contract'
> 
>
> Key: YARN-9940
> URL: https://issues.apache.org/jira/browse/YARN-9940
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: kailiu_dev
>Assignee: kailiu_dev
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: YARN-9940-branch-2.7.2.001.patch
>
>
> 2019-10-16 09:14:51,215 ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>     at java.util.TimSort.sort(TimSort.java:223)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)






[jira] [Commented] (YARN-5913) Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext

2019-08-27 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917399#comment-16917399
 ] 

Yufei Gu commented on YARN-5913:


[~ykabusalah] feel free to take any Jira without an assignee.

> Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext
> --
>
> Key: YARN-5913
> URL: https://issues.apache.org/jira/browse/YARN-5913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> Usage of these two variables overlaps and causes confusion. 






[jira] [Commented] (YARN-6425) Move out FS state dump code out of method update()

2019-08-27 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917398#comment-16917398
 ] 

Yufei Gu commented on YARN-6425:


[~ykabusalah] feel free to do that.

> Move out FS state dump code out of method update()
> --
>
> Key: YARN-6425
> URL: https://issues.apache.org/jira/browse/YARN-6425
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Priority: Major
>  Labels: newbie++
>
> Better to move the FS state dump code out of update()
> {code}
> if (LOG.isDebugEnabled()) {
>   if (--updatesToSkipForDebug < 0) {
> updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
> dumpSchedulerState();
>   }
> }
> {code}
> And, after that we should distinct between update call and update thread 
> duration like before YARN-6112. 
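
A small sketch of what that extraction could look like (the method name is made 
up here; the fields and constants are the ones from the snippet above):
{code:java}
// Hypothetical extraction of the dump logic out of update().
private void dumpSchedulerStateIfNeeded() {
  if (LOG.isDebugEnabled() && --updatesToSkipForDebug < 0) {
    updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
    dumpSchedulerState();
  }
}
{code}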






[jira] [Commented] (YARN-2497) Fair scheduler should support strict node labels

2019-07-22 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890573#comment-16890573
 ] 

Yufei Gu commented on YARN-2497:


Hi [~chenzhaohang], AFAIK, FS doesn't support node labels in any version. 

> Fair scheduler should support strict node labels
> 
>
> Key: YARN-2497
> URL: https://issues.apache.org/jira/browse/YARN-2497
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Wangda Tan
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: YARN-2497.001.patch, YARN-2497.002.patch, 
> YARN-2497.003.patch, YARN-2497.004.patch, YARN-2497.005.patch, 
> YARN-2497.006.patch, YARN-2497.007.patch, YARN-2497.008.patch, 
> YARN-2497.009.patch, YARN-2497.010.patch, YARN-2497.011.patch, 
> YARN-2497.branch-3.0.001.patch, YARN-2499.WIP01.patch
>
>







[jira] [Assigned] (YARN-9537) Add configuration to disable AM preemption

2019-07-22 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-9537:
--

Assignee: zhoukang

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-07-22 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890572#comment-16890572
 ] 

Yufei Gu commented on YARN-9537:


Hi [~cane], added you to the contributor role and assigned this to you. Will 
you still work on this?

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption

2019-07-18 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9537:
---
Fix Version/s: (was: 3.1.2)
   (was: 3.2.0)

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Priority: Major
> Attachments: YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-06-17 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865860#comment-16865860
 ] 

Yufei Gu commented on YARN-9537:


Hi [~cane], thanks for the patch. Could you elaborate on your use case?

> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-9537.001.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Commented] (YARN-9537) Add configuration to support AM preemption

2019-05-09 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836625#comment-16836625
 ] 

Yufei Gu commented on YARN-9537:


FairScheduler doesn't prevent you from preempting the AM container. It just 
tries to preempt as few AM containers as possible. 

> Add configuration to support AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: zhoukang
>Priority: Major
>
> In our production cluster, we can tolerate AM preemption. So we can add a 
> configuration to support AM preemption.






[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options

2019-04-30 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830769#comment-16830769
 ] 

Yufei Gu commented on YARN-9520:


* "Inter-queue preemption will not happen among the applications inside the 
queue."
Yes.
* "With the FIFO ordering policy, the newer applications will be preempted 
first if the priority is the same or not set. In other words, the older 
applications will be considered for preemption only after the newer 
applications are preempted."
No. Only the oldest one has less chance of being preempted; all others have the 
same chance.
* "Multiple applications in a queue will run if resources are available. Let's 
say there are resources for 200 containers and 2 applications of 100 containers 
each are running. After 50 containers of each finish, will a 3rd application's 
containers get allocated, or will it wait for the first 2 applications to 
finish?"
Yes. The 3rd one can run. 

> fair scheduler: inter-queue-preemption.enabled, 
> intra-queue-preemption.enabled options
> --
>
> Key: YARN-9520
> URL: https://issues.apache.org/jira/browse/YARN-9520
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sudhir Babu Pothineni
>Priority: Major
>
> It's good to have inter-queue-preemption-enabled and 
> intra-queue-preemption-enabled options for the fair scheduler. I have a use 
> case where we need inter-queue-preemption-enabled=false.






[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options

2019-04-30 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829993#comment-16829993
 ] 

Yufei Gu commented on YARN-9520:


Seems like you don't need queue A to use the fair policy. Why not set it to 
FIFO instead?

> fair scheduler: inter-queue-preemption.enabled, 
> intra-queue-preemption.enabled options
> --
>
> Key: YARN-9520
> URL: https://issues.apache.org/jira/browse/YARN-9520
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sudhir Babu Pothineni
>Priority: Major
>
> It's good to have inter-queue-preemption-enabled and 
> intra-queue-preemption-enabled options for the fair scheduler. I have a use 
> case where we need inter-queue-preemption-enabled=false.






[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options

2019-04-29 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829685#comment-16829685
 ] 

Yufei Gu commented on YARN-9520:


Could you elaborate on the use case?

> fair scheduler: inter-queue-preemption.enabled, 
> intra-queue-preemption.enabled options
> --
>
> Key: YARN-9520
> URL: https://issues.apache.org/jira/browse/YARN-9520
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sudhir Babu Pothineni
>Priority: Major
>
> It's good to have inter-queue-preemption-enabled and 
> intra-queue-preemption-enabled options for the fair scheduler. I have a use 
> case where we need inter-queue-preemption-enabled=false.






[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to remove duplication

2019-04-01 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807378#comment-16807378
 ] 

Yufei Gu commented on YARN-9214:


Committed to trunk. Thanks [~jiwq] for the contribution. Thanks [~snemeth] for 
the review.

> Add AbstractYarnScheduler#getValidQueues method to remove duplication
> -
>
> Key: YARN-9214
> URL: https://issues.apache.org/jira/browse/YARN-9214
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9214.001.patch, YARN-9214.002.patch, 
> YARN-9214.003.patch, YARN-9214.004.patch, YARN-9214.005.patch
>
>
> *AbstractYarnScheduler#moveAllApps* and 
> *AbstractYarnScheduler#killAllAppsInQueue* share the same code segment, so I 
> think we need a method named *AbstractYarnScheduler#getValidQueues* to handle 
> it. Apart from this, we need to add a doc comment to explain why it exists.






[jira] [Updated] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to remove duplication

2019-04-01 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9214:
---
Summary: Add AbstractYarnScheduler#getValidQueues method to remove 
duplication  (was: Add AbstractYarnScheduler#getValidQueues method to resolve 
duplicate code )

> Add AbstractYarnScheduler#getValidQueues method to remove duplication
> -
>
> Key: YARN-9214
> URL: https://issues.apache.org/jira/browse/YARN-9214
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9214.001.patch, YARN-9214.002.patch, 
> YARN-9214.003.patch, YARN-9214.004.patch, YARN-9214.005.patch
>
>
> *AbstractYarnScheduler#moveAllApps* and 
> *AbstractYarnScheduler#killAllAppsInQueue* share the same code segment, so I 
> think we need a method named *AbstractYarnScheduler#getValidQueues* to handle 
> it. Apart from this, we need to add a doc comment to explain why it exists.






[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code

2019-04-01 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807374#comment-16807374
 ] 

Yufei Gu commented on YARN-9214:


+1

> Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code 
> --
>
> Key: YARN-9214
> URL: https://issues.apache.org/jira/browse/YARN-9214
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9214.001.patch, YARN-9214.002.patch, 
> YARN-9214.003.patch, YARN-9214.004.patch, YARN-9214.005.patch
>
>
> *AbstractYarnScheduler#moveAllApps* and 
> *AbstractYarnScheduler#killAllAppsInQueue* share the same code segment, so I 
> think we need a method named *AbstractYarnScheduler#getValidQueues* to handle 
> it. Apart from this, we need to add a doc comment to explain why it exists.






[jira] [Comment Edited] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`

2019-04-01 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807053#comment-16807053
 ] 

Yufei Gu edited comment on YARN-9401 at 4/1/19 6:20 PM:


Do we plan to release YARN separately? Probably never. With that, I suggest 
exploring the idea of removing class YarnVersionInfo rather than making this 
change. It looks OK to remove it judging by its references in the web-app; 
besides, the class is "Private and Unstable". Need more thoughts from people, 
cc [~vinodkv].


was (Author: yufeigu):
Do we plan to release YARN separately? Probably never. With that, I suggest 
exploring the idea of removing class YarnVersionInfo rather than making this 
change. It looks OK to remove it judging by its references in the web-app; 
besides, the class is "Private and Unstable". Need more thoughts from people, 
cc [~vikumar].

> Fix `yarn version` print the version info is the same as `hadoop version`
> -
>
> Key: YARN-9401
> URL: https://issues.apache.org/jira/browse/YARN-9401
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Minor
> Attachments: YARN-9401.001.patch, YARN-9401.002.patch
>
>
> It's caused by the `yarn` shell mistakenly using 
> `org.apache.hadoop.util.VersionInfo` instead of 
> `org.apache.hadoop.yarn.util.YarnVersionInfo` as the `HADOOP_CLASSNAME`.
> {panel:title=Before}
> Hadoop 3.3.0-SNAPSHOT
> Source code repository [https://github.com/apache/hadoop.git] -r 
> 53a86e2b8ecb83b666d4ed223fc270e1a46642c1
> Compiled by jiwq on 2019-04-01T04:55Z
> Compiled with protoc 2.5.0
> From source with checksum 829bd6e22c17c6da74f5c1a61647922
> {panel}
> {panel:title=After}
> YARN 3.3.0-SNAPSHOT
> Subversion [https://github.com/apache/hadoop.git] -r 
> 53a86e2b8ecb83b666d4ed223fc270e1a46642c1
> Compiled by jiwq on 2019-04-01T05:06Z
> From source with checksum e10a192bd933ffdafe435d7fe99d24d
> {panel}
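
For reference, the fix the description points at would be a one-line change in 
the `yarn` shell script; a sketch (the exact surrounding case statement in the 
script may differ):
{code}
version)
  HADOOP_CLASSNAME=org.apache.hadoop.yarn.util.YarnVersionInfo
;;
{code}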






[jira] [Commented] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`

2019-04-01 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807053#comment-16807053
 ] 

Yufei Gu commented on YARN-9401:


Do we plan to release YARN separately? Probably never. With that, I suggest 
exploring the idea of removing class YarnVersionInfo rather than making this 
change. It looks OK to remove it judging by its references in the web-app; 
besides, the class is "Private and Unstable". Need more thoughts from people, 
cc [~vikumar].

> Fix `yarn version` print the version info is the same as `hadoop version`
> -
>
> Key: YARN-9401
> URL: https://issues.apache.org/jira/browse/YARN-9401
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Minor
> Attachments: YARN-9401.001.patch, YARN-9401.002.patch
>
>
> It's caused by the `yarn` shell mistakenly using 
> `org.apache.hadoop.util.VersionInfo` instead of 
> `org.apache.hadoop.yarn.util.YarnVersionInfo` as the `HADOOP_CLASSNAME`.
> {panel:title=Before}
> Hadoop 3.3.0-SNAPSHOT
> Source code repository [https://github.com/apache/hadoop.git] -r 
> 53a86e2b8ecb83b666d4ed223fc270e1a46642c1
> Compiled by jiwq on 2019-04-01T04:55Z
> Compiled with protoc 2.5.0
> From source with checksum 829bd6e22c17c6da74f5c1a61647922
> {panel}
> {panel:title=After}
> YARN 3.3.0-SNAPSHOT
> Subversion [https://github.com/apache/hadoop.git] -r 
> 53a86e2b8ecb83b666d4ed223fc270e1a46642c1
> Compiled by jiwq on 2019-04-01T05:06Z
> From source with checksum e10a192bd933ffdafe435d7fe99d24d
> {panel}






[jira] [Commented] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`

2019-03-31 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806357#comment-16806357
 ] 

Yufei Gu commented on YARN-9401:


Thanks [~jiwq] for working on this. 
Looking more deeply, the hdfs command just uses VersionInfo. This has never 
been a big issue, likely because YARN and HDFS are never released separately. 
Besides, I don't see why we need the class YarnVersionInfo. Hi [~wangda], do 
you happen to know why we need class YarnVersionInfo?

> Fix `yarn version` print the version info is the same as `hadoop version`
> -
>
> Key: YARN-9401
> URL: https://issues.apache.org/jira/browse/YARN-9401
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Minor
> Attachments: YARN-9401.001.patch, YARN-9401.002.patch
>
>
> It's caused by the `yarn` shell mistakenly using 
> `org.apache.hadoop.util.VersionInfo` instead of 
> `org.apache.hadoop.yarn.util.YarnVersionInfo` as the `HADOOP_CLASSNAME`.






[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code

2019-03-31 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806325#comment-16806325
 ] 

Yufei Gu commented on YARN-9214:


Thanks [~jiwq] for working on this. 
{code}
  LOG.warn(errMsg);
  throw new YarnException(errMsg);
{code}
It doesn't make sense to LOG.warn since we've thrown an exception here. I 
suggest removing it even though it isn't introduced by your patch.
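
The suggested cleanup would then simply be:
{code:java}
// The exception already carries the message; no separate warn is needed.
throw new YarnException(errMsg);
{code}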

> Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code 
> --
>
> Key: YARN-9214
> URL: https://issues.apache.org/jira/browse/YARN-9214
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9214.001.patch, YARN-9214.002.patch, 
> YARN-9214.003.patch
>
>
> *AbstractYarnScheduler#moveAllApps* and 
> *AbstractYarnScheduler#killAllAppsInQueue* share the same code segment, so I 
> think we need a method named *AbstractYarnScheduler#getValidQueues* to handle 
> it. Apart from this, we need to add a doc comment to explain why it exists.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-25 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801412#comment-16801412
 ] 

Yufei Gu commented on YARN-8967:


Committed to trunk. Thanks [~wilfreds] for the contribution.

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch, YARN-8967.011.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-23 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799736#comment-16799736
 ] 

Yufei Gu commented on YARN-8967:


+1. Will commit later.

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch, YARN-8967.011.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-22 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798814#comment-16798814
 ] 

Yufei Gu commented on YARN-8967:


Hi [~wilfreds], thanks for the patch.
3) Yeah, the XML DOM API looks a little bit silly. getChildNodes() should at 
least provide an option to return only elements rather than a mix of element 
and text nodes. I believe some newer libraries solve this issue. We could do 
something like this to hide the second loop in a method getParentNode() (a 
sketch of such a helper follows at the end of this list). 
{code}
Element parentNode = getParentNode(node.getChildNodes());
PlacementRule parentRule = getParentRule(parentNode, fs);
 {code}
4) That's nice.
5) I do think the current solution is better. Let's ignore this checkstyle 
warning. Just one concern: can we make both members of class RuleMap "final", 
so that no code can change their values except the constructor?
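
A possible shape for the getParentNode() helper mentioned in 3) above (a 
sketch; the name, signature, and null handling are assumptions):
{code:java}
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DomHelpers {
  // Returns the first element child, skipping text and comment nodes;
  // returns null when there is no nested element.
  static Element getParentNode(NodeList children) {
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child instanceof Element) {
        return (Element) child;
      }
    }
    return null;
  }
}
{code}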

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-20 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796846#comment-16796846
 ] 

Yufei Gu commented on YARN-8967:


Hi [~wilfreds], the patch v9 looks really good. 
{quote}
Based on all this I do think I need to file a follow up jira to fix the Hive 
SHIM that uses the policy at the moment and move that to the new code in a 
backward compatible way.
{quote}
I am with you.

Some nits:
1. Sorry to have missed this in the last review: there is no need to add a 
debug log since we throw an exception here.
{code}
LOG.debug("Initialising rule set failed", ioe);
throw new AllocationConfigurationException(
"Rule initialisation failed with exception", ioe);
{code}
3. Too many nested if/for statements in the method fromXml(). It would be nice 
to extract some of the logic in the loop into a separate method, or we could 
use {{if (!(node instanceof Element)) continue;}} to avoid one layer of nesting.
4. I made up a new test case where the "nestedUserQueue" rule has 2 parents and 
only the second one takes effect. I believe we should at least log a warning 
for the first parent "primaryGroup", and we don't need to create and initialize 
it since it will be overwritten by the second parent. 
{code}
StringBuffer sb = new StringBuffer();
sb.append("");
sb.append("  ");
sb.append("  ");
sb.append("   ");
sb.append("   ");
sb.append("  ");
sb.append("");

createPolicy(sb.toString());
{code}
5. Not a fan of the getters in the nested class RuleMap. It could be kept as 
simple as possible as a wrapper class for multiple values, just like a case 
class in Scala or a data class in Kotlin. This is just my preference; I'm OK 
with the current implementation though.



> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-13 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791411#comment-16791411
 ] 

Yufei Gu commented on YARN-8967:


Hi [~wilfreds], thanks for the patch. Some comments:
1. Nice cleanup in class QueueManager.
2. Should we deprecate two constructors of class AllocationFileLoaderService 
rather than remove them since it is a public class? 
3. "{{public List getRules()}}" here, Modifier “public” could be 
“no modifier” since all test cases invoking method getRules() are in the same 
package.
4. I would suggest to put the exception into the LOG.warn and remove LOG.debug 
in method "placeApplication()".
5. The method "addApplication()" is a little bit messy due to it holds both 
logic for adding a new application and recovered application. I feel like it 
would be cleaner if we separate the method addApplication to two methods, one 
for add new application anther for recover applications. Just some thoughts. 
What do you think?
6. Since we’ve got this, we doesn’t need to check whether queueName is null in 
method addApplication(). 
{code:java}
if (queueName != null) {
  addApplication(appAddedEvent.getApplicationId(),
  queueName, appAddedEvent.getUser(),
  appAddedEvent.getIsAppRecovering(),
  appAddedEvent.getPlacementContext());
}
{code}
7. Do we still need this check "if (queueName.startsWith(".") || 
queueName.endsWith("."))"? We've normalized queue names in the placement rules 
for a new application, and the queue name should be valid for a recovered app.

Class {{QueuePlacementPolicy}} related comments:
1. The QueuePlacementPolicy objects in class AllocationConfiguration are never 
used by production code if we call {{updateRules()}} in the constructor. I 
would suggest either moving {{updateRules()}} out of the QueuePlacementPolicy 
constructor or removing all QueuePlacementPolicy objects and making 
QueuePlacementPolicy a utility class. I prefer the first one since it reduces 
coupling. In that case, the AllocationConfiguration object still keeps all 
configuration items including placement rules, which is consistent behavior.
2. You probably need a different comment style to make this link work: {{{@link 
#getTerminal}}}
3. Incomplete comment {{// The list must be}} in class QueuePlacementPolicy.
4. Typo in the comment "Builds an QueuePlacementPolicy from an xml element.": 
"an" -> "a".
5. "testNoCreate()" contains some duplicated test cases. I'm OK whether you 
delete them or not, since they weren't introduced by your patch.
6. I would suggest refactoring the method "fromXml()" a little bit by 
introducing a new method like "getParentRule()".
7. We could create a nested class like the following in class 
QueuePlacementPolicy to avoid multiple “get(0)” and “get(1)” in the code. 
{code:java}
public static class Policy {
  public Object clazz;
  public String terminal;
}
{code}
8. I found the following code in class SpecifiedPlacementRule; there is no need 
to both log an error and throw an exception. Would you mind fixing it in this 
patch even though it wasn't introduced by it?
{code:java}
LOG.error("Specified queue name not valid: '{}'", queueName);
throw new YarnException("Application submitted by user " + user +
"with illegal queue name '" + queueName + "'.");
{code}

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-04 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9298:
---
Fix Version/s: 3.3.0

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, 
> YARN-9298.006.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967






[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-04 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784193#comment-16784193
 ] 

Yufei Gu commented on YARN-9298:


Committed to trunk. Thanks [~wilfreds] for the contribution.

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, 
> YARN-9298.006.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967






[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-04 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783618#comment-16783618
 ] 

Yufei Gu commented on YARN-9298:


+1 for the patch v6. Will commit later.

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, 
> YARN-9298.006.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967






[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-03 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782911#comment-16782911
 ] 

Yufei Gu commented on YARN-9298:


[~wilfreds], thanks for the patch. Looks really good! Just some nits:
1. There are unused imports in class FairQueuePlacementUtils and class 
PlacementRule
2. {{private PlacementRule parentRule = null;}}: no need to set it to null as a 
class member since the default is null.
3. {{protected boolean createQueue = false;}}: I suggest removing the 
initialization, or setting it to true, since it will be set to true by default 
anyway.

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967






[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-28 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780302#comment-16780302
 ] 

Yufei Gu commented on YARN-9298:


Hi [~wilfreds], thanks for the patch. I am glad we are moving forward. 
1. The current implementations of the FS rules are still kind of verbose. 
I think a new abstract class like “{{public abstract class 
FairPlacementRule extends PlacementRule}}” can remove the duplication (see the 
sketch after this list). It can contain not only the three {{setConfig()}} 
methods but also all of the fair-scheduler-rule-related methods and variables. 
The method {{initialize}} can have a default implementation as well, so we can 
leave class PlacementRule as it is. In method getPlacementRule, some minor 
changes would be needed.
2. There is an unused import in class TestFairQueuePlacementUtils.
3. If a switch statement doesn't suit here, I personally prefer {{if {} else if 
{} else {}}} rather than {{if {} else { if {} else {} } }} in this case, which 
looks cleaner by reducing the nesting level. However, I won't insist on this.
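A rough shape for the abstract class suggested in point 1 (a sketch only; the 
signatures are my assumptions layered on the patch, not code from it):
{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementRule;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.w3c.dom.Element;

// Shared base for the FS rules: the setConfig() overloads and common
// state live here once, so each concrete rule only implements its own
// queue lookup.
public abstract class FairPlacementRule extends PlacementRule {
  protected PlacementRule parentRule;
  protected boolean createQueue = true;

  public void setConfig(Element ruleElement) {
    // parse the XML attributes shared by all FS rules, e.g. "create"
  }

  public void setConfig(Boolean create) {
    this.createQueue = create;
  }

  public void setParentRule(PlacementRule parent) {
    this.parentRule = parent;
  }

  // Default implementation shared by all FS rules; a concrete rule
  // overrides this only when it needs scheduler-specific checks.
  public boolean initialize(ResourceScheduler scheduler) {
    return true;
  }
}
{code}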



> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-24 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776175#comment-16776175
 ] 

Yufei Gu edited comment on YARN-9298 at 2/24/19 9:31 AM:
-

Hi [~wilfreds], thanks for the patch. It is really nice to add these unit 
tests. Some comments:
1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, 
Configuration conf)}}, but it is only used by CS. You may need to update the 
comments.
2. I would suggest that the unit test messages clarify which expectation wasn't 
met or which action failed, like “Rule object shouldn't be null” or "Failed to 
instantiate the rule object.". “Cleaned name was changed for clean input" could 
be something like “Unexpected cleaned name.” or “Failed to clean name.” 
(See the sketch after this list.)
3. Can you add a case “root” in method {{testAssureRoot()}}?
4. I feel like class {{TestPlacementRuleFS}} isn't necessary. Why not just test 
against DefaultPlacementRule and all the other real rules? Besides, unit tests 
are needed for all the FS placement rule classes. I'm OK if you want to move 
some code from YARN-8967 and reuse existing tests, like the one in class 
TestQueuePlacementPolicy.
5. {{if {} else if {} else {}}} or a switch statement could be cleaner than 
{{if {} else { if {} else {} } }} in method {{setConfig}}.
6. There is some common code in the {{*Rule::initialize()}} and 
{{*Rule::setConfig()}} methods; we could probably move it into either class 
{{PlacementRule}} or class {{FairQueuePlacementUtils}}.
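To illustrate point 2, the messages could be wired into the asserts like this (a 
minimal sketch; the rule factory call is a placeholder, not the patch's API):
{code:java}
import static org.junit.Assert.assertNotNull;

import org.junit.Test;

public class TestPlacementRuleMessages {
  @Test
  public void testRuleCreation() {
    // The message names the unmet expectation, so a failure reads well
    // in the test report.
    Object rule = createRuleUnderTest();
    assertNotNull("Rule object shouldn't be null", rule);
  }

  // Stand-in for however the patch instantiates a placement rule.
  private Object createRuleUnderTest() {
    return new Object();
  }
}
{code}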



was (Author: yufeigu):
Hi [~wilfreds], thanks for the patch. It is really nice to add these unit 
tests. Some comments:
1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, 
Configuration conf)}}, but it is only used by CS. You may need to update the 
comments.
2. I would suggest that the unit test messages clarify the expectation or which 
action failed, like “Rule object shouldn't be null” or "Failed to 
instantiate the rule object.". “Cleaned name was changed for clean input" could 
be something like “Unexpected cleaned name.” or “Failed to clean name.” 
3. Can you add a case “root” in method {{testAssureRoot()}}?
4. I feel like class {{TestPlacementRuleFS}} isn't necessary. Why not just test 
against DefaultPlacementRule and all the other real rules? Besides, unit tests 
are needed for all the FS placement rule classes. I'm OK if you want to move 
some code from YARN-8967 and reuse existing tests, like the one in class 
TestQueuePlacementPolicy.
5. {{if {} else if {} else {}}} or a switch statement could be cleaner than 
{{if {} else { if {} else {} } }} in method {{setConfig}}.
6. There is some common code in the {{*Rule::initialize()}} and 
{{*Rule::setConfig()}} methods; we could probably move it into either class 
{{PlacementRule}} or class {{FairQueuePlacementUtils}}.


> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776184#comment-16776184
 ] 

Yufei Gu commented on YARN-9278:


bq. If our cluster has a lot of long-running jobs, the above method is not 
helpful. 
That's unfortunate.

Setting a maximum number of nodes to iterate over seems like a quick-and-dirty 
way to address the latency in big clusters. Let's brainstorm solutions (one 
direction is sketched below). [~Steven Rand] and [~wilfreds], what do you think?
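One direction for the brainstorm: instead of shuffling the whole list and taking 
the first k entries (as in the snippet in the description), a partial 
Fisher-Yates pass picks k random nodes in O(k). This is only a sketch of the 
idea, not code from any patch:
{code:java}
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

final class NodeSampling {
  // Partial Fisher-Yates: only the first k slots are touched, so picking
  // k random nodes costs O(k) instead of an O(n) full shuffle, while
  // still spreading preemption across the cluster.
  static <N> List<N> sample(List<N> potentialNodes, int maxTryNodeNum) {
    int k = Math.min(maxTryNodeNum, potentialNodes.size());
    for (int i = 0; i < k; i++) {
      int j = i + ThreadLocalRandom.current().nextInt(potentialNodes.size() - i);
      Collections.swap(potentialNodes, i, j);
    }
    return potentialNodes.subList(0, k);
  }
}
{code}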

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-24 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776175#comment-16776175
 ] 

Yufei Gu edited comment on YARN-9298 at 2/24/19 9:19 AM:
-

Hi [~wilfreds], thanks for the patch. It is really nice to add these unit 
tests. Some comments:
1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, 
Configuration conf)}}, but it is only used by CS. You may need to update the 
comments.
2. I would suggest that the unit test messages clarify the expectation or which 
action failed, like “Rule object shouldn't be null” or "Failed to 
instantiate the rule object.". “Cleaned name was changed for clean input" could 
be something like “Unexpected cleaned name.” or “Failed to clean name.” 
3. Can you add a case “root” in method {{testAssureRoot()}}?
4. I feel like class {{TestPlacementRuleFS}} isn't necessary. Why not just test 
against DefaultPlacementRule and all the other real rules? Besides, unit tests 
are needed for all the FS placement rule classes. I'm OK if you want to move 
some code from YARN-8967 and reuse existing tests, like the one in class 
TestQueuePlacementPolicy.
5. {{if {} else if {} else {}}} or a switch statement could be cleaner than 
{{if {} else { if {} else {} } }} in method {{setConfig}}.
6. There is some common code in the {{*Rule::initialize()}} and 
{{*Rule::setConfig()}} methods; we could probably move it into either class 
{{PlacementRule}} or class {{FairQueuePlacementUtils}}.



was (Author: yufeigu):
Hi [~wilfreds], thanks for the patch. It is really nice to add these unit 
tests. Some comments:
1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, 
Configuration conf)}}, but it is only used by CS. You may need to update the 
comments.
2. I would suggest that the unit test messages clarify the expectation or which 
action failed, like “Rule object shouldn't be null”. “Cleaned name was 
changed for clean input" could be something like “Unexpected cleaned name.” or 
“Failed to clean name.” 
3. Can you add a case “root” in method {{testAssureRoot()}}?
4. I feel like class {{TestPlacementRuleFS}} isn't necessary. Why not just test 
against DefaultPlacementRule and all the other real rules? Besides, unit tests 
are needed for all the FS placement rule classes. I'm OK if you want to move 
some code from YARN-8967 and reuse existing tests, like the one in class 
TestQueuePlacementPolicy.
5. {{if {} else if {} else {}}} or a switch statement could be cleaner than 
{{if {} else { if {} else {} } }} in method {{setConfig}}.
6. There is some common code in the {{*Rule::initialize()}} and 
{{*Rule::setConfig()}} methods; we could probably move it into either class 
{{PlacementRule}} or class {{FairQueuePlacementUtils}}.


> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-24 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776175#comment-16776175
 ] 

Yufei Gu commented on YARN-9298:


Hi [~wilfreds], thanks for the patch. It is really nice to add these unit 
tests. Some comments:
1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, 
Configuration conf)}}, but it is only used by CS. You may need to update the 
comments.
2. I would suggest that the unit test messages clarify the expectation or which 
action failed, like “Rule object shouldn't be null”. “Cleaned name was 
changed for clean input" could be something like “Unexpected cleaned name.” or 
“Failed to clean name.” 
3. Can you add a case “root” in method {{testAssureRoot()}}?
4. I feel like class {{TestPlacementRuleFS}} isn't necessary. Why not just test 
against DefaultPlacementRule and all the other real rules? Besides, unit tests 
are needed for all the FS placement rule classes. I'm OK if you want to move 
some code from YARN-8967 and reuse existing tests, like the one in class 
TestQueuePlacementPolicy.
5. {{if {} else if {} else {}}} or a switch statement could be cleaner than 
{{if {} else { if {} else {} } }} in method {{setConfig}}.
6. There is some common code in the {{*Rule::initialize()}} and 
{{*Rule::setConfig()}} methods; we could probably move it into either class 
{{PlacementRule}} or class {{FairQueuePlacementUtils}}.


> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-20 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773299#comment-16773299
 ] 

Yufei Gu edited comment on YARN-9278 at 2/20/19 7:47 PM:
-

Hi [~uranus], this seems like a perf issue on a busy large cluster due to the 
preemption implementation, which is iterate-and-check. 
The idea of setting a node-count threshold doesn't look elegant, but it is 
reasonable if we can't change the iterate-and-check way of identifying 
preemptable containers. It may not be the only idea though.

Without introducing more complexity into FS preemption (it is already very 
complicated), there are some workarounds you can try: increase the fair share 
preemption timeout and the fair share preemption threshold to reduce the chance 
of preemption. This is especially useful on a large cluster, since there is a 
better chance of getting resources just by waiting. 



was (Author: yufeigu):
Hi [~uranus], this seems like a perf issue on a busy large cluster due to the 
preemption implementation, which is iterate-and-check. 

I would suggest lowering 
{{yarn.scheduler.fair.preemption.cluster-utilization-threshold}} to let 
preemption kick in earlier on a large cluster. The default value is 80%, which 
means preemption won't kick in until 80% of the whole cluster's resources have 
been used. Please be aware that a low utilization threshold may cause 
unnecessary container churn, so you don't want it to be too low. 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-20 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773299#comment-16773299
 ] 

Yufei Gu commented on YARN-9278:


Hi [~uranus], this seems like a perf issue on a busy large cluster due to the 
preemption implementation, which is iterate-and-check. 

I would suggest lowering 
{{yarn.scheduler.fair.preemption.cluster-utilization-threshold}} to let 
preemption kick in earlier on a large cluster. The default value is 80%, which 
means preemption won't kick in until 80% of the whole cluster's resources have 
been used. Please be aware that a low utilization threshold may cause 
unnecessary container churn, so you don't want it to be too low. 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-16 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770247#comment-16770247
 ] 

Yufei Gu commented on YARN-9298:


Hi [~wilfreds], thanks for splitting this out and providing the patch. Some comments:
1. Can you add “@Private” and “@Unstable” annotations to all new classes? Can 
you do the same for class PlacementFactory and class PlacementRule since you are 
changing them?
2. I guess you didn't bring in unit tests due to the splitting. I just feel 
uncomfortable pushing so many changes without adding any unit test. Can you add 
unit tests in this jira? It is quite practical to add unit tests for the methods 
in class {{FairQueuePlacementUtils}}; it may be a little trickier for the other 
classes.
3. There is one extra empty line at the end of class “PlacementFactory”.
4. Can you use org.apache.hadoop.util.ReflectionUtils to get a new instance 
rather than the hand-rolled code in getPlacementRule()? (See the sketch after 
this list.)
5. {{public static <T> T getPlacementRule(Class<T> theClass, …)}} could be 
{{public static PlacementRule getPlacementRule(Class<? extends PlacementRule> 
theClass, …)}} to enforce the type.
6. It is obvious to developers that getting a placement means “getting a 
queue”, but it still looks confusing to a code reader. Can we clarify that 
here: {{* Get the queue for a given application.}}
7. The LOG name is wrong in class {{FairQueuePlacementUtils}}.
8. In the {{initialize()}} methods, there is no need to log an error since 
you've already raised an exception.
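For point 4, the factory method could presumably collapse to something like this 
(a sketch assuming {{ReflectionUtils.newInstance}}'s Configuration injection 
fits here; it also folds in the bounded class parameter from point 5):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// In PlacementFactory: ReflectionUtils handles instantiation and
// Configuration injection, and the bounded Class parameter enforces the
// PlacementRule type at compile time.
public static PlacementRule getPlacementRule(
    Class<? extends PlacementRule> theClass, Configuration conf) {
  return ReflectionUtils.newInstance(theClass, conf);
}
{code}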

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766287#comment-16766287
 ] 

Yufei Gu commented on YARN-9277:


Hi [~uranus], some general comments; I haven't looked at the code yet.
bq. We should not preempt self
+1
bq. We should not preempt high priority job. 
Correct me if I am wrong, but there are no priorities between YARN jobs. 
Priority applies to tasks inside one job, which predates the FS preemption 
overhaul. We only need priorities between mappers and reducers, or other 
customized priorities, since AM containers are always the first priority and 
have already been taken care of.
bq. We should not preempt container which has been running for a long time.
Makes sense if all other conditions are exactly the same.
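To make the first point concrete, the guard could look roughly like this inside 
the container scan (names are illustrative, not from the posted patches):
{code:java}
// Inside the per-node loop of identifyContainersToPreempt: skip
// containers that belong to the starved application itself; we should
// never take resources from the app we are trying to help.
for (RMContainer container : node.getCopiedListOfRunningContainers()) {
  if (container.getApplicationAttemptId()
      .equals(starvedApp.getApplicationAttemptId())) {
    continue;
  }
  // ... existing preemptability checks ...
}
{code}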

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should not preempt high priority job
>  * We should not preempt container which has been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-02-03 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759562#comment-16759562
 ] 

Yufei Gu commented on YARN-8967:


Hi [~wilfreds], the patch v4 doesn't apply to the trunk. Can you rebase it?

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-02-03 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-8967:
---
Fix Version/s: (was: 3.3)

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-02-03 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-8967:
---
Fix Version/s: 3.3

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3
>
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9041) Performance Optimization of method FSPreemptionThread#identifyContainersToPreempt

2018-12-04 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709067#comment-16709067
 ] 

Yufei Gu commented on YARN-9041:


Committed to trunk. Thanks [~jiwq] for working on this. Thanks [~Steven Rand] 
for the review.

> Performance Optimization of method 
> FSPreemptionThread#identifyContainersToPreempt
> -
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler preemption
>Affects Versions: 3.1.1
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, 
> YARN-9041.006.patch, YARN-9041.007.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9041) Performance Optimization of method FSPreemptionThread#identifyContainersToPreempt

2018-12-04 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9041:
---
Fix Version/s: 3.2.1

> Performance Optimization of method 
> FSPreemptionThread#identifyContainersToPreempt
> -
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler preemption
>Affects Versions: 3.1.1
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, 
> YARN-9041.006.patch, YARN-9041.007.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9041) Performance Optimization of method FSPreemptionThread#identifyContainersToPreempt

2018-12-04 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9041:
---
Summary: Performance Optimization of method 
FSPreemptionThread#identifyContainersToPreempt  (was: Performance Optimization 
of FSPreemptionThread#identifyContainersToPreempt method)

> Performance Optimization of method 
> FSPreemptionThread#identifyContainersToPreempt
> -
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler preemption
>Affects Versions: 3.1.1
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, 
> YARN-9041.006.patch, YARN-9041.007.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9041) Performance Optimization of FSPreemptionThread#identifyContainersToPreempt method

2018-12-04 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9041:
---
Summary: Performance Optimization of 
FSPreemptionThread#identifyContainersToPreempt method  (was: Optimize 
FSPreemptionThread#identifyContainersToPreempt method)

> Performance Optimization of FSPreemptionThread#identifyContainersToPreempt 
> method
> -
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler preemption
>Affects Versions: 3.1.1
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, 
> YARN-9041.006.patch, YARN-9041.007.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-12-03 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9041:
---
Affects Version/s: 3.1.1

> Optimize FSPreemptionThread#identifyContainersToPreempt method
> --
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler preemption
>Affects Versions: 3.1.1
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, 
> YARN-9041.006.patch, YARN-9041.007.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-12-03 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708242#comment-16708242
 ] 

Yufei Gu commented on YARN-9041:


The last patch looks good. +1 for the patch v7. Will commit this soon.

> Optimize FSPreemptionThread#identifyContainersToPreempt method
> --
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, 
> YARN-9041.006.patch, YARN-9041.007.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-12-03 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9041:
---
Component/s: fairscheduler

> Optimize FSPreemptionThread#identifyContainersToPreempt method
> --
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, 
> YARN-9041.006.patch, YARN-9041.007.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-12-03 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707857#comment-16707857
 ] 

Yufei Gu commented on YARN-9041:


Thanks for the patch. Some nits:
# {{ * @return list preemptable containers}} should be something like {{the 
list of best preemptable containers for the resource request}}
# We still need comments in both tests to clarify which logic path each test 
covers. For example, we can add a comment in 
{{testRelaxLocalityToPreemptLessAM}} saying that it tests the case where there 
is no less-AM-container solution among the remaining nodes.

> Optimize FSPreemptionThread#identifyContainersToPreempt method
> --
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, 
> YARN-9041.006.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-11-29 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704014#comment-16704014
 ] 

Yufei Gu commented on YARN-9041:


Hi [~jiwq], the patch v5 looks good in terms of logic. Some nits:
# Can you rename the two tests or write comments to clarify their intentions? I 
suppose the goals of the two methods are: one can find a solution with fewer AM 
containers among the relaxed-locality nodes, and the other can't.
# It is good practice to put callee methods below their caller methods.
# Can you refactor to create a new method like this? Please remember to 
reorganize the method Javadoc; we probably don't need the comment "// Don't 
preempt AM containers just to satisfy local requests if relax locality is 
enabled." in that case.
{code}
  /**
   * Iterate through matching nodes and identify containers to preempt all
   * on one node, also optimizing for least number of AM container
   * preemptions. Only nodes that match the locality level specified in
   * the {@link ResourceRequest} are considered. However, if this would
   * lead to AM preemption, and locality relaxation is allowed, then the
   * search space is expanded to the remaining nodes.
   *
   * @param rr
   * @param potentialNodes
   * @return
   */
  private PreemptableContainers getBestPreemptableContainers(
      ResourceRequest rr, List<FSSchedulerNode> potentialNodes) {
    PreemptableContainers bestContainers =
        identifyContainersToPreemptForOneContainer(potentialNodes, rr);

    if (rr.getRelaxLocality()
        && !ResourceRequest.isAnyLocation(rr.getResourceName())
        && bestContainers != null
        && bestContainers.numAMContainers > 0) {
      List<FSSchedulerNode> remainingNodes =
          scheduler.getNodeTracker().getAllNodes();
      remainingNodes.removeAll(potentialNodes);
      PreemptableContainers spareContainers =
          identifyContainersToPreemptForOneContainer(remainingNodes, rr);
      if (spareContainers != null
          && spareContainers.numAMContainers < bestContainers.numAMContainers) {
        bestContainers = spareContainers;
      }
    }
    return bestContainers;
  }
{code}
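With that extracted, {{identifyContainersToPreempt}} would presumably shrink to 
one {{getBestPreemptableContainers(rr, potentialNodes)}} call per resource 
request, with the surrounding loop and AM bookkeeping unchanged (my assumption 
about the refactor, to be confirmed in the next patch).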

> Optimize FSPreemptionThread#identifyContainersToPreempt method
> --
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-11-29 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704014#comment-16704014
 ] 

Yufei Gu edited comment on YARN-9041 at 11/29/18 11:21 PM:
---

Hi [~jiwq], the patch v5 looks good in terms of logic. Some nits:
# Can you rename the two tests or write comments to clarify their intentions? I 
suppose the goals of the two methods are: one can find a solution with fewer AM 
containers among the relaxed-locality nodes, and the other can't.
# It is good practice to put callee methods below their caller methods.
# Can you refactor to create a new method like this? Please remember to 
reorganize the method Javadoc; we probably don't need the comment "// Don't 
preempt AM containers just to satisfy local requests if relax locality is 
enabled." in that case.
{code}
  /**
   * Iterate through matching nodes and identify containers to preempt all
   * on one node, also optimizing for least number of AM container
   * preemptions. Only nodes that match the locality level specified in
   * the {@link ResourceRequest} are considered. However, if this would
   * lead to AM preemption, and locality relaxation is allowed, then the
   * search space is expanded to the remaining nodes.
   *
   * @param rr
   * @param potentialNodes
   * @return
   */
  private PreemptableContainers getBestPreemptableContainers(
      ResourceRequest rr, List<FSSchedulerNode> potentialNodes) {
    PreemptableContainers bestContainers =
        identifyContainersToPreemptForOneContainer(potentialNodes, rr);

    if (rr.getRelaxLocality()
        && !ResourceRequest.isAnyLocation(rr.getResourceName())
        && bestContainers != null
        && bestContainers.numAMContainers > 0) {
      List<FSSchedulerNode> remainingNodes =
          scheduler.getNodeTracker().getAllNodes();
      remainingNodes.removeAll(potentialNodes);
      PreemptableContainers spareContainers =
          identifyContainersToPreemptForOneContainer(remainingNodes, rr);
      if (spareContainers != null
          && spareContainers.numAMContainers < bestContainers.numAMContainers) {
        bestContainers = spareContainers;
      }
    }
    return bestContainers;
  }
{code}


was (Author: yufeigu):
Hi [~jiwq], the patch v5 looks good in terms of logic. Some nits:
# Can you rename the two tests or write comments to clarify their intentions? I 
suppose the goals of the two methods are: one can find a solution with fewer AM 
containers among the relaxed-locality nodes, and the other can't.
# It is good practice to put callee methods below their caller methods.
# Can you refactor to create a new method like this? Please remember to 
reorganize the method Javadoc; we probably don't need the comment "// Don't 
preempt AM containers just to satisfy local requests if relax locality is 
enabled." in that case.
{code}
  /**
   * Iterate through matching nodes and identify containers to preempt all
   * on one node, also optimizing for least number of AM container
   * preemptions. Only nodes that match the locality level specified in
   * the {@link ResourceRequest} are considered. However, if this would
   * lead to AM preemption, and locality relaxation is allowed, then the
   * search space is expanded to the remaining nodes.
   *
   * @param rr
   * @param potentialNodes
   * @return
   */
  private PreemptableContainers getBestPreemptableContainers(
      ResourceRequest rr, List<FSSchedulerNode> potentialNodes) {
    PreemptableContainers bestContainers =
        identifyContainersToPreemptForOneContainer(potentialNodes, rr);

    if (rr.getRelaxLocality()
        && !ResourceRequest.isAnyLocation(rr.getResourceName())
        && bestContainers != null
        && bestContainers.numAMContainers > 0) {
      List<FSSchedulerNode> remainingNodes =
          scheduler.getNodeTracker().getAllNodes();
      remainingNodes.removeAll(potentialNodes);
      PreemptableContainers spareContainers =
          identifyContainersToPreemptForOneContainer(remainingNodes, rr);
      if (spareContainers != null
          && spareContainers.numAMContainers < bestContainers.numAMContainers) {
        bestContainers = spareContainers;
      }
    }
    return bestContainers;
  }
{code}

> Optimize FSPreemptionThread#identifyContainersToPreempt method
> --
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes.

[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-11-28 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702308#comment-16702308
 ] 

Yufei Gu commented on YARN-9041:


There is an error in the test build that isn't related to your patch. I'll 
review it later.
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-yarn-server-resourcemanager: There was a timeout or other error 
in the fork -> [Help 1]
{code}

> Optimize FSPreemptionThread#identifyContainersToPreempt method
> --
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch, 
> YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-11-26 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699511#comment-16699511
 ] 

Yufei Gu commented on YARN-9041:


Hi [~jiwq], thanks for the patch. I like the idea of shrinking the search space, 
and your patch v2 seems to address the concern raised by [~Steven Rand]. 
However, a unit test case for the change is necessary. 

> Optimize FSPreemptionThread#identifyContainersToPreempt method
> --
>
> Key: YARN-9041
> URL: https://issues.apache.org/jira/browse/YARN-9041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9041.001.patch, YARN-9041.002.patch
>
>
> In the FSPreemptionThread#identifyContainersToPreempt method, I suggest that 
> when AM preemption would result and locality relaxation is allowed, the 
> expanded search space be changed from all nodes to the remaining nodes. The 
> remaining nodes are equal to all nodes minus the potential nodes.
> The judging conditions become:
>  # rr.getRelaxLocality()
>  # !ResourceRequest.isAnyLocation(rr.getResourceName())
>  # bestContainers != null
>  # bestContainers.numAMContainers > 0
> If my understanding is off, please correct me. Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9005) FairScheduler maybe preempt the AM container

2018-11-16 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690178#comment-16690178
 ] 

Yufei Gu commented on YARN-9005:


Hi [~jiwq], sorry for the late response. Please clarify what performance issue 
you want to fix when you create the new issue. Thanks.

> FairScheduler maybe preempt the AM container
> 
>
> Key: YARN-9005
> URL: https://issues.apache.org/jira/browse/YARN-9005
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9005.001.patch, YARN-9005.002.patch
>
>
> In the worst case, FS preempts the AM container, because the return value of 
> FSPreemptionThread#identifyContainersToPreempt contains the AM container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9005) FairScheduler maybe preempt the AM container

2018-11-09 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682243#comment-16682243
 ] 

Yufei Gu commented on YARN-9005:


It is by design that AM containers can be preempted. YARN-5830 reduced the 
chance of preempting AM containers, but FS still preempts them if that is the 
only option.

> FairScheduler maybe preempt the AM container
> 
>
> Key: YARN-9005
> URL: https://issues.apache.org/jira/browse/YARN-9005
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9005.001.patch
>
>
> In the worst case, FS preempts the AM container, because the return value of 
> FSPreemptionThread#identifyContainersToPreempt contains the AM container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9005) FairScheduler maybe preempt the AM container

2018-11-09 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-9005:
---
Component/s: scheduler preemption
 fairscheduler

> FairScheduler maybe preempt the AM container
> 
>
> Key: YARN-9005
> URL: https://issues.apache.org/jira/browse/YARN-9005
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler preemption
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-9005.001.patch
>
>
> In the worst case, FS preempts the AM container, because the return value of 
> FSPreemptionThread#identifyContainersToPreempt contains the AM container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM

2018-11-07 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678577#comment-16678577
 ] 

Yufei Gu commented on YARN-8978:


[~qiuliang988], not sure if you still need this jira, but you shouldn't mark it 
as "fixed". Please mark it as "invalid/won't fix" if you don't need it.

> For fair scheduler, application with higher priority should also get priority 
> resources for running AM
> --
>
> Key: YARN-8978
> URL: https://issues.apache.org/jira/browse/YARN-8978
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: qiuliang
>Priority: Major
> Attachments: YARN-8978.001.patch
>
>
> In order to allow important applications to run earlier, we used priority 
> scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. 
> Consider this situation: two applications (with different priorities) are in 
> the same queue and both are accepted. Both applications are demanding and 
> hungry when dispatched to the queue. Next, the weight ratio is calculated. 
> Since the used resources of both applications are 0, the weight ratio is also 
> 0, so the priority has no effect in this case. Low-priority applications may 
> get resources to run their AM earlier than high-priority applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8969) Change the return type to generic type of AbstractYarnScheduler#getNodeTracker

2018-11-05 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675958#comment-16675958
 ] 

Yufei Gu commented on YARN-8969:


[~eepayne], it is probably fine in this case. {{AbstractYarnScheduler}} is a 
@Private @Unstable class.
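For reference, the change presumably amounts to carrying the scheduler's node 
type parameter through the getter (a sketch, assuming {{N}} is the node type 
parameter of {{AbstractYarnScheduler}}):
{code:java}
// Inside AbstractYarnScheduler<T extends SchedulerApplicationAttempt,
//                              N extends SchedulerNode>: returning the
// tracker with its type parameter keeps callers' generics intact, e.g.
// List<N> nodes = scheduler.getNodeTracker().getNodesByResourceName(name);
public ClusterNodeTracker<N> getNodeTracker() {
  return nodeTracker;
}
{code}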

> Change the return type to generic type of AbstractYarnScheduler#getNodeTracker
> --
>
> Key: YARN-8969
> URL: https://issues.apache.org/jira/browse/YARN-8969
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-8969.001.patch
>
>
> There are warnings like:
> {quote}Unchecked assignment: 'java.util.List' to 
> 'java.util.List'.
>  Reason: 'scheduler.getNodeTracker()' has raw type, so result of 
> getNodesByResourceName is erased{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy

2018-09-27 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631207#comment-16631207
 ] 

Yufei Gu commented on YARN-8792:


[~HCOONa], I've added you as a contributor, so that you can assign these jiras 
to yourself.

> Revisit FairScheduler QueuePlacementPolicy 
> ---
>
> Key: YARN-8792
> URL: https://issues.apache.org/jira/browse/YARN-8792
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
>
> Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. There 
> are several problems:
>  # The termination of the responsibility chain should be bound to the 
> assignment result instead of the rule.
>  # It should provide a reason when rejecting a request.
>  # Still need more useful rules:
>  ## RejectNonLeafQueue
>  ## RejectDefaultQueue
>  ## RejectUsers
>  ## RejectQueues
>  ## DefaultByUser






[jira] [Commented] (YARN-7794) SLSRunner is not loading timeline service jars causing failure

2018-09-06 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606480#comment-16606480
 ] 

Yufei Gu commented on YARN-7794:


[~jhung], the patch looks good to me.

> SLSRunner is not loading timeline service jars causing failure
> --
>
> Key: YARN-7794
> URL: https://issues.apache.org/jira/browse/YARN-7794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.1.0
>Reporter: Sunil Govindan
>Assignee: Yufei Gu
>Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: YARN-7794-branch-2.001.patch, YARN-7794.001.patch
>
>
> {code:java}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 13 more
> Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:443)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641){code}
> We are getting this error while running SLS. The timelineservice jars 
> under share/hadoop/yarn are not loaded into the SLS JVM (verified from the 
> SLSRunner classpath).
> cc/ [~rohithsharma]






[jira] [Commented] (YARN-8632) Threads in SLS quit without logging exception

2018-08-24 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592032#comment-16592032
 ] 

Yufei Gu commented on YARN-8632:


+1 for the patch v4. Committed to trunk. Thanks [~luxianghao] for the patch.

> Threads in SLS quit without logging exception 
> --
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, 
> YARN-8632.002.patch, YARN-8632.003.patch, YARN-8632.004.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Updated] (YARN-8632) Threads in SLS quit without logging exception

2018-08-24 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-8632:
---
Summary: Threads in SLS quit without logging exception   (was: No data in 
file realtimetrack.json after running SchedulerLoadSimulator)

> Threads in SLS quit without logging exception 
> --
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, 
> YARN-8632.002.patch, YARN-8632.003.patch, YARN-8632.004.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-23 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590570#comment-16590570
 ] 

Yufei Gu commented on YARN-8632:


[~luxianghao], thanks for the patch. Nice finding. +1 for the patch v3. Will 
commit later. Do you need a patch for 2.7?

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, 
> YARN-8632.002.patch, YARN-8632.003.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-20 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586344#comment-16586344
 ] 

Yufei Gu commented on YARN-8632:


For that, we need to call "setUncaughtExceptionHandler" on the thread and 
provide a handler; catching every exception in {{run()}} isn't enough.
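A minimal sketch of what I mean, in generic Java rather than the SLS code 
itself; the handler logs anything that escapes {{run()}}, which a try/catch 
around the body alone would miss:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingThreadDemo {
  private static final Logger LOG =
      LoggerFactory.getLogger(LoggingThreadDemo.class);

  public static void main(String[] args) {
    Thread worker = new Thread(() -> {
      throw new IllegalStateException("simulated failure");
    }, "metrics-logger");
    // Log anything that escapes run() instead of letting the thread die silently.
    worker.setUncaughtExceptionHandler(
        (t, e) -> LOG.error("Thread {} exited unexpectedly", t.getName(), e));
    worker.start();
  }
}
{code}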

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, 
> YARN-8632.002.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Commented] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2018-08-16 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583318#comment-16583318
 ] 

Yufei Gu commented on YARN-5139:


[~zhuqi], done.

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: Explanantions of Global Scheduling (YARN-5139) 
> Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> YARN-5139.000.patch, wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, 
> wip-3.YARN-5139.patch, wip-4.YARN-5139.patch, wip-5.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>   Go to parentQueue
>     Go to leafQueue
>       for application in leafQueue.applications:
>         for resource-request in application.resource-requests:
>           try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionservers and Storm workers on the same host), we may need to consider 
> moving the YARN scheduler towards global scheduling.






[jira] [Assigned] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2018-08-14 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-8655:
--

Assignee: Zhaohui Xin

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is fair-share starved, it is added to appsToProcess. 
> After that, app1 is taken, but appBeingProcessed is not yet updated to app1. 
> At that moment, app1 becomes min-share starved, so the app is added to 
> appsToProcess again, because appBeingProcessed is null and appsToProcess no 
> longer contains it.
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
>   if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
>     appsToProcess.add(app);
>   }
> }
>
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  
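A minimal sketch of one way to close that window (an illustration only, not 
the actual YARN-8655 patch): make the membership check and the 
appBeingProcessed hand-off happen under the same monitor, so an app can never 
be re-queued between being taken and being marked as in progress.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// "App" is a type parameter standing in for FSAppAttempt, to keep the
// sketch self-contained.
class StarvedApps<App> {
  private final Queue<App> appsToProcess = new ArrayDeque<>();
  private App appBeingProcessed;

  synchronized void addStarvedApp(App app) {
    if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
      appsToProcess.add(app);
      notifyAll();
    }
  }

  synchronized App take() throws InterruptedException {
    while (appsToProcess.isEmpty()) {
      wait(); // releases the monitor, so addStarvedApp can still run
    }
    // Remove and mark in-progress atomically: there is no window where the
    // app is neither queued nor recorded as appBeingProcessed.
    appBeingProcessed = appsToProcess.remove();
    return appBeingProcessed;
  }

  synchronized void doneProcessing() {
    appBeingProcessed = null; // allow the app to be marked starved again
  }
}
{code}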






[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2018-08-14 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580139#comment-16580139
 ] 

Yufei Gu commented on YARN-8655:


Hi [~uranus],  added you to the contributor list and assigned this to you.

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is fair-share starved, it is added to appsToProcess. 
> After that, app1 is taken, but appBeingProcessed is not yet updated to app1. 
> At that moment, app1 becomes min-share starved, so the app is added to 
> appsToProcess again, because appBeingProcessed is null and appsToProcess no 
> longer contains it.
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
>   if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
>     appsToProcess.add(app);
>   }
> }
>
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  






[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-13 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578945#comment-16578945
 ] 

Yufei Gu commented on YARN-8632:


Thanks for the patch. Some comments:
# It is not good practice to catch runtime exceptions. Normally we should let 
the program exit if a runtime exception happens. Create a new exception if you 
think it is necessary, and throw and catch it explicitly.
# Use {{LOG.info("message", e)}} instead of {{e.printStackTrace();}}.
# {{(SchedulerWrapper)scheduler;}} needs a space before "scheduler".
# Create a unit test if possible.
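
To illustrate points 1 and 2 with a generic sketch (not the SLS code itself; 
the MetricsSource interface here is a hypothetical stand-in): catch only the 
specific checked exception, log it through the logger so the stack trace lands 
in the log files, and let genuine runtime exceptions propagate.

{code:java}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MetricsStep {
  private static final Logger LOG = LoggerFactory.getLogger(MetricsStep.class);

  // Hypothetical stand-in for the SLS web component discussed here.
  interface MetricsSource {
    String generateRealTimeTrackingMetrics() throws IOException;
  }

  void logMetrics(MetricsSource web) {
    try {
      String metrics = web.generateRealTimeTrackingMetrics();
      LOG.info("metrics: {}", metrics);
    } catch (IOException e) {
      // Logged with the exception attached; e.printStackTrace() would
      // bypass the logging framework entirely.
      LOG.info("Failed to generate tracking metrics", e);
    } // Runtime exceptions deliberately not caught here.
  }
}
{code}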

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Comment Edited] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576875#comment-16576875
 ] 

Yufei Gu edited comment on YARN-8632 at 8/10/18 9:28 PM:
-

Your patch doesn't apply to trunk. You said the bug is in trunk as well; can 
you provide a patch for trunk?
Which version does your patch target? 2.7.2?


was (Author: yufeigu):
Your patch doesn't apply to trunk? You said the bug is in trunk as well, can 
you provide a patch for the trunk?
Which version does your patch target? 2.7.2?

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576875#comment-16576875
 ] 

Yufei Gu commented on YARN-8632:


Your patch doesn't apply to trunk. You said the bug is in trunk as well; can 
you provide a patch for trunk?
Which version does your patch target? 2.7.2?

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Assigned] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-8632:
--

Assignee: Xianghao Lu

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-10 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1657#comment-1657
 ] 

Yufei Gu commented on YARN-8632:


Added you to the contributor list and assigned this to you. Will review later.

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread will 
> exit because of an NPE; the reason is that "wrapper.getQueueSet()" is still 
> null when executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> in the try section to prevent the MetricsLogRunnable thread from exiting with 
> an unexpected exception.
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has 
> the second problem, and I have made a patch to solve it.






[jira] [Commented] (YARN-8639) Sort queue and apps in fair scheduler using a separate thread

2018-08-10 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575926#comment-16575926
 ] 

Yufei Gu commented on YARN-8639:


Close it, or take this chance to do a bit of testing. Either way works for me.

> Sort queue and apps in fair scheduler using a separate thread
> 
>
> Key: YARN-8639
> URL: https://issues.apache.org/jira/browse/YARN-8639
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: wan kun
>Priority: Minor
>
> If the fair scheduler has many queues and each queue has many active 
> applications, then on each assignContainer call we need to sort all the 
> queues and all the applications in each queue. For a large system, this may 
> cost too much time, so we can sort the queues and applications 
> asynchronously in a separate thread.
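A sketch of the proposed idea under simplifying assumptions (plain Java, not a 
patch): a background thread periodically publishes a sorted snapshot, so the 
allocation path reads a pre-sorted list instead of sorting inline.

{code:java}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class AsyncSortedView<T> {
  private final List<T> live = new CopyOnWriteArrayList<>();
  private volatile List<T> sortedSnapshot = Collections.emptyList();
  private final ScheduledExecutorService sorter =
      Executors.newSingleThreadScheduledExecutor();

  AsyncSortedView(Comparator<T> order, long periodMs) {
    // Re-sort in the background; readers only ever see a complete snapshot.
    sorter.scheduleAtFixedRate(
        () -> sortedSnapshot =
            live.stream().sorted(order).collect(Collectors.toList()),
        0, periodMs, TimeUnit.MILLISECONDS);
  }

  void add(T item) { live.add(item); }

  // What an assignContainer-style path would iterate instead of sorting inline.
  List<T> sorted() { return sortedSnapshot; }
}
{code}

The trade-off is staleness: allocations may briefly use an out-of-date order, 
which is the price of taking sorting off the critical path.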






[jira] [Commented] (YARN-8639) Sort queue and apps in fair scheduler using a separate thread

2018-08-09 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575302#comment-16575302
 ] 

Yufei Gu commented on YARN-8639:


We need to quantify this a little before we make any non-trivial change: how 
many sub-queues/applications are considered too many and cause a performance 
issue? The result won't only justify why we need the change but also provide a 
guideline for tuning the queue settings.

> Sort queue and apps in fair scheduler using a separate thread
> 
>
> Key: YARN-8639
> URL: https://issues.apache.org/jira/browse/YARN-8639
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: wan kun
>Priority: Minor
>
> If the fair scheduler has many queues and each queue has many active 
> applications, then on each assignContainer call we need to sort all the 
> queues and all the applications in each queue. For a large system, this may 
> cost too much time, so we can sort the queues and applications 
> asynchronously in a separate thread.






[jira] [Commented] (YARN-6636) Fair Scheduler: respect node labels at resource request level

2018-08-07 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572303#comment-16572303
 ] 

Yufei Gu commented on YARN-6636:


There are multiple ways to approach node labeling in Fair Scheduler, and the 
community doesn't have consensus. The approach YARN-2497 took heavily involves 
queue management and fair share calculations. Whether node labeling should 
affect queue management is decided by whether we want fairness on node labels. 
Node labeling partitions the cluster resources. My take is that we generally 
still need fairness on each partition, which is materialized by queues and 
fair share. However, some particular cases only require node labeling to act 
like data locality, which doesn't need fairness.

> Fair Scheduler: respect node labels at resource request level
> -
>
> Key: YARN-6636
> URL: https://issues.apache.org/jira/browse/YARN-6636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>Priority: Major
>
> This ticket is to track changes to fair scheduler to respect node labels at 
> resource request level. When the client sets labels at resource request 
> level, the scheduler must schedule those containers only on those nodes with 
> that label. 






[jira] [Commented] (YARN-8495) Priority scheduling support in FairShare scheduler

2018-07-05 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533902#comment-16533902
 ] 

Yufei Gu commented on YARN-8495:


Thanks [~Dillon.] for filing this. The implementation conflicts with 
preemption inside a queue, which assumes each application splits the resources 
evenly, so that the scheduler will kill containers from applications that use 
more resources than their fair share. Consider a case where a low-priority app 
A uses less than its fair share and a high-priority app B uses more than its 
fair share: priority scheduling will assign containers to B, while preemption 
will kill B's containers and intend to give them to A. A livelock happens.

> Priority scheduling support in FairShare scheduler
> --
>
> Key: YARN-8495
> URL: https://issues.apache.org/jira/browse/YARN-8495
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: fairscheduler
>Reporter: Dillon Zhang
>Priority: Major
> Attachments: YARN-8495.001.patch
>
>
>  In a production environment, priority scheduling is of vital importance to 
> us, as we have lots of queues for different departments that create 
> applications, but some of them are not as important as others, so we must 
> guarantee that the important ones can provide service.
> Based on the priority of the application, Fair Scheduler should be able to 
> give preference to applications while scheduling.
> The comparator applicationComparator can be changed as below.
> 1. Check for application priority. If priority is available, then return the 
> highest-priority job.
> 2. Otherwise continue with existing logic such as Fair Share comparison, App 
> ID comparison, and then TimeStamp comparison.






[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-06-28 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526924#comment-16526924
 ] 

Yufei Gu commented on YARN-8468:


Sounds good to me. Thanks [~mrbillau].

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets the default maximum container size 
> for all queues, and the per-queue maximum is set with the 
> "maxContainerResources" queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.
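
A hedged sketch of the lookup the list describes; the getMaxContainerAllocation 
accessor is an assumption for illustration, not the committed API. The 
per-queue cap falls back to the scheduler-wide maximum when a queue does not 
set one:

{code:java}
// Sketch only, inside FairScheduler: FSQueue#getMaxContainerAllocation is a
// hypothetical accessor for the proposed "maxContainerResources" setting.
public Resource getMaximumResourceCapability(String queueName) {
  FSQueue queue = queueManager.getQueue(queueName);
  Resource queueMax = (queue == null) ? null : queue.getMaxContainerAllocation();
  if (queueMax == null) {
    // Fall back to the scheduler-wide yarn.scheduler.maximum-allocation-* cap.
    return getMaximumResourceCapability();
  }
  // A queue can only tighten, never widen, the scheduler-wide cap.
  return Resources.componentwiseMin(queueMax, getMaximumResourceCapability());
}
{code}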






[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-06-28 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526869#comment-16526869
 ] 

Yufei Gu commented on YARN-8468:


[~bsteinbach] since you filed this jira and provided the patch, you have the 
responsibility to justify the motivation. However, I am OK with this feature.

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets the default maximum container size 
> for all queues, and the per-queue maximum is set with the 
> "maxContainerResources" queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.






[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-06-27 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525773#comment-16525773
 ] 

Yufei Gu commented on YARN-8468:


It seems a benign feature in terms of how it impacts the existing 
functionality. In that sense, +0 for the feature. 

I'm more curious about the motivation. Can you elaborate more on it, 
[~bsteinbach]?

[~szegedim], I think [~bsteinbach] proposed "maxContainerResources" as a queue 
property.

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets the default maximum container size 
> for all queues, and the per-queue maximum is set with the 
> "maxContainerResources" queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.






[jira] [Commented] (YARN-8184) Too many metrics if containerLocalizer/ResourceLocalizationService uses ReadWriteDiskValidator

2018-06-22 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520773#comment-16520773
 ] 

Yufei Gu commented on YARN-8184:


Committed to trunk. Thanks for the review, [~haibochen].

> Too many metrics if containerLocalizer/ResourceLocalizationService uses 
> ReadWriteDiskValidator
> --
>
> Key: YARN-8184
> URL: https://issues.apache.org/jira/browse/YARN-8184
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8184.001.patch, YARN-8184.002.patch
>
>
> ContainerLocalizer or ResourceLocalizationService will use the 
> ReadWriteDiskValidator as its disk validator when it downloads files if we 
> configure yarn.nodemanager.disk-validator to ReadWriteDiskValidator's name. 
> In that case, ReadWriteDiskValidator will create a metric item for each 
> directory localized, which results in too many metrics. We should let 
> ContainerLocalizer only use the basic disk validator.






[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-13 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511503#comment-16511503
 ] 

Yufei Gu commented on YARN-8394:


Hi [~cheersyang],

Let me clarify a little bit. The code logic should be:
{code:java}
if "yarn.scheduler.capacity.node-locality-delay" is -1:
  disable "yarn.scheduler.capacity.rack-locality-additional-delay"
{code}
That way, a user doesn't need to set it manually, as the doc you added 
suggests. Moreover, if that code logic had been there, we could just say that 
if you disable yarn.scheduler.capacity.node-locality-delay, you disable 
yarn.scheduler.capacity.rack-locality-additional-delay as well.
{quote}
Note, this feature should be disabled if YARN is deployed separately with the 
file system, as locality is meaningless. This can be done by setting 
`yarn.scheduler.capacity.node-locality-delay` to `-1`, in this case, request's 
locality constraint is ignored.
{quote}
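
A sketch of that code logic in plain Java, assuming Hadoop's Configuration 
API; the default values here are illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;

class LocalityDelaySketch {
  static final String NODE_DELAY =
      "yarn.scheduler.capacity.node-locality-delay";
  static final String RACK_DELAY =
      "yarn.scheduler.capacity.rack-locality-additional-delay";

  static int effectiveRackDelay(Configuration conf) {
    // If the node-locality delay is disabled (-1), the additional rack
    // delay is ignored as well, so users need not unset it themselves.
    if (conf.getInt(NODE_DELAY, 40) == -1) {
      return -1;
    }
    return conf.getInt(RACK_DELAY, -1);
  }
}
{code}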

> Improve data locality documentation for Capacity Scheduler
> --
>
> Key: YARN-8394
> URL: https://issues.apache.org/jira/browse/YARN-8394
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8394.001.patch, YARN-8394.002.patch
>
>
> YARN-6344 introduces a new parameter 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in 
> capacity-scheduler.xml, we need to add some documentation in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters are separating storage and 
> computation where file system is always remote, in such cases we need to 
> introduce how to compromise data locality in CS otherwise MR jobs are 
> suffering.






[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-12 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510625#comment-16510625
 ] 

Yufei Gu commented on YARN-8394:


LGTM, can you file a jira for the code change?

> Improve data locality documentation for Capacity Scheduler
> --
>
> Key: YARN-8394
> URL: https://issues.apache.org/jira/browse/YARN-8394
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8394.001.patch, YARN-8394.002.patch
>
>
> YARN-6344 introduces a new parameter 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in 
> capacity-scheduler.xml, we need to add some documentation in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters are separating storage and 
> computation where file system is always remote, in such cases we need to 
> introduce how to compromise data locality in CS otherwise MR jobs are 
> suffering.






[jira] [Commented] (YARN-8406) Do the improvement to the FSLeafQueue about calculating fair share for apps

2018-06-08 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506571#comment-16506571
 ] 

Yufei Gu commented on YARN-8406:


You can probably move your patch and test result to YARN-7467 since I've closed 
this one. It doesn't make sense to work on it here.

> Do the improvement to the FSLeafQueue about calculating fair share for apps
> ---
>
> Key: YARN-8406
> URL: https://issues.apache.org/jira/browse/YARN-8406
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Critical
>  Labels: patch
> Attachments: YARN-7467-001.patch, test.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I want to help with the improvement that FSLeafQueue unnecessarily calls 
> ComputeFairShares.computeShare().






[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-08 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505822#comment-16505822
 ] 

Yufei Gu commented on YARN-8394:


Sounds good to me.

> Improve data locality documentation for Capacity Scheduler
> --
>
> Key: YARN-8394
> URL: https://issues.apache.org/jira/browse/YARN-8394
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8394.001.patch
>
>
> YARN-6344 introduces a new parameter 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in 
> capacity-scheduler.xml, we need to add some documentation in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters are separating storage and 
> computation where file system is always remote, in such cases we need to 
> introduce how to compromise data locality in CS otherwise MR jobs are 
> suffering.






[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-07 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505726#comment-16505726
 ] 

Yufei Gu commented on YARN-8394:


bq. This can be done by setting `yarn.scheduler.capacity.node-locality-delay` 
to `-1`
This should be done in code instead of making the user do it after reading the 
doc. Sounds like another jira if it isn't there yet.

> Improve data locality documentation for Capacity Scheduler
> --
>
> Key: YARN-8394
> URL: https://issues.apache.org/jira/browse/YARN-8394
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8394.001.patch
>
>
> YARN-6344 introduces a new parameter 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in 
> capacity-scheduler.xml, we need to add some documentation in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters are separating storage and 
> computation where file system is always remote, in such cases we need to 
> introduce how to compromise data locality in CS otherwise MR jobs are 
> suffering.






[jira] [Commented] (YARN-8406) Do the improvement to the FSLeafQueue about calculating fair share for apps

2018-06-07 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504915#comment-16504915
 ] 

Yufei Gu commented on YARN-8406:


[~zhuqi], there is no need to create this subtask. You can ask [~templedf] if 
you can take YARN-7467.

> Do the improvement to the FSLeafQueue about calculating fair share for apps
> ---
>
> Key: YARN-8406
> URL: https://issues.apache.org/jira/browse/YARN-8406
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Critical
>  Labels: patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I want to help with the improvement that FSLeafQueue unnecessarily calls 
> ComputeFairShares.computeShare().






[jira] [Resolved] (YARN-8406) Do the improvement to the FSLeafQueue about calculating fair share for apps

2018-06-07 Thread Yufei Gu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu resolved YARN-8406.

Resolution: Duplicate

> Do the improvement to the FSLeafQueue about calculating fair share for apps
> ---
>
> Key: YARN-8406
> URL: https://issues.apache.org/jira/browse/YARN-8406
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Critical
>  Labels: patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I want to help with the improvement that FSLeafQueue unnecessarily calls 
> ComputeFairShares.computeShare().






[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-06 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503552#comment-16503552
 ] 

Yufei Gu commented on YARN-8394:


Makes sense to me, assuming the Cloud solution still uses CS/FS as the 
scheduler. I guess some simple settings to let containers run on any node will 
solve the issue. Besides, the trend is no YARN in Cloud solutions, which makes 
the "delay logic" totally irrelevant. 

> Improve data locality documentation for Capacity Scheduler
> --
>
> Key: YARN-8394
> URL: https://issues.apache.org/jira/browse/YARN-8394
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8394.001.patch
>
>
> YARN-6344 introduces a new parameter 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in 
> capacity-scheduler.xml, we need to add some documentation in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters are separating storage and 
> computation where file system is always remote, in such cases we need to 
> introduce how to compromise data locality in CS otherwise MR jobs are 
> suffering.






[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-05 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502479#comment-16502479
 ] 

Yufei Gu commented on YARN-8394:


Hi [~cheersyang], thanks for filing this. Can you elaborate on this?
bq. we need to introduce how to compromise data locality in CS otherwise MR 
jobs are suffering.

> Improve data locality documentation for Capacity Scheduler
> --
>
> Key: YARN-8394
> URL: https://issues.apache.org/jira/browse/YARN-8394
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Weiwei Yang
>Priority: Major
>
> YARN-6344 introduces a new parameter 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in 
> capacity-scheduler.xml, we need to add some documentation in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters are separating storage and 
> computation where file system is always remote, in such cases we need to 
> introduce how to compromise data locality in CS otherwise MR jobs are 
> suffering.






[jira] [Commented] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

2018-06-05 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502076#comment-16502076
 ] 

Yufei Gu commented on YARN-5139:


[~zhuqi], you are welcome to contribute to Fair Scheduler. It's not a trivial 
effort to bring global scheduling to FS even with these jiras in. I strongly 
believe it is the right direction though. Let me know if you need any help.

> [Umbrella] Move YARN scheduler towards global scheduler
> ---
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: Explanantions of Global Scheduling (YARN-5139) 
> Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, 
> YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, 
> YARN-5139.000.patch, wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, 
> wip-3.YARN-5139.patch, wip-4.YARN-5139.patch, wip-5.YARN-5139.patch
>
>
> Existing YARN scheduler is based on node heartbeat. This can lead to 
> sub-optimal decisions because scheduler can only look at one node at the time 
> when scheduling resources.
> Pseudo code of existing scheduling logic looks like:
> {code}
> for node in allNodes:
>   Go to parentQueue
>     Go to leafQueue
>       for application in leafQueue.applications:
>         for resource-request in application.resource-requests:
>           try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node 
> constraints (give me "a && b || c") or anti-affinity (do not allocate HBase 
> regionservers and Storm workers on the same host), we may need to consider 
> moving the YARN scheduler towards global scheduling.





