[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-02-11 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034902#comment-17034902
 ] 

Aihua Xu commented on YARN-6492:


[~maniraj...@gmail.com] Do you have an update on this Jira?

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object, which captures metrics either in the default 
> partition or across all partitions. (After YARN-6467 it will be the default 
> partition.)
> Having per-partition metrics would be very useful.
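
One way to picture the request is a queue that keeps a separate set of counters per partition instead of a single shared object. The sketch below is hypothetical: `PartitionMetricsSketch`, `Counters`, and the method names are illustrative and are not the YARN `QueueMetrics` API, and the empty string standing for the default partition is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one counter set per (queue, partition) pair.
final class PartitionMetricsSketch {
    // Assumption: the default partition is represented by the empty string.
    static final String DEFAULT_PARTITION = "";

    static final class Counters {
        final AtomicLong allocatedMB = new AtomicLong();
    }

    // Keyed by "queue|partition" so each partition gets its own counters.
    private final Map<String, Counters> byQueueAndPartition = new ConcurrentHashMap<>();

    Counters forPartition(String queue, String partition) {
        return byQueueAndPartition.computeIfAbsent(queue + "|" + partition, k -> new Counters());
    }

    void allocate(String queue, String partition, long mb) {
        forPartition(queue, partition).allocatedMB.addAndGet(mb);
    }

    long allocatedMB(String queue, String partition) {
        return forPartition(queue, partition).allocatedMB.get();
    }
}
```

With this shape, allocations in partition `x` never show up in the default partition's numbers, which is the separation the issue asks for.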



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10126) Use threadPool to handle async scheduling threads

2020-02-10 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-10126:

Parent: YARN-5139
Issue Type: Sub-task  (was: Improvement)

> Use threadPool to handle async scheduling threads
> -
>
> Key: YARN-10126
> URL: https://issues.apache.org/jira/browse/YARN-10126
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.9.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> Currently, async scheduling launches individual threads to handle scheduling 
> requests. If any of these threads hits an error, it exits and no replacement 
> thread is launched. Eventually all the threads die and no new jobs get 
> scheduled.






[jira] [Created] (YARN-10126) Use threadPool to handle async scheduling threads

2020-02-10 Thread Aihua Xu (Jira)
Aihua Xu created YARN-10126:
---

 Summary: Use threadPool to handle async scheduling threads
 Key: YARN-10126
 URL: https://issues.apache.org/jira/browse/YARN-10126
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Affects Versions: 2.9.1
Reporter: Aihua Xu
Assignee: Aihua Xu


Currently, async scheduling launches individual threads to handle scheduling 
requests. If any of these threads hits an error, it exits and no replacement 
thread is launched. Eventually all the threads die and no new jobs get 
scheduled.
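
A minimal sketch of the thread-pool idea, assuming each scheduling pass can be expressed as a `Runnable`. The class `ResilientSchedulingPool` and its methods are hypothetical and are not taken from the CapacityScheduler code; the point is only that a pass which throws does not end the loop, because the next pass is re-queued on the pool in a `finally` block.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the CapacityScheduler implementation.
final class ResilientSchedulingPool {
    private final ExecutorService pool;

    ResilientSchedulingPool(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Runs one scheduling pass, then re-queues the next pass even if this one threw.
    void runRepeatedly(Runnable schedulingPass) {
        pool.execute(() -> {
            try {
                schedulingPass.run();
            } catch (RuntimeException e) {
                // A real scheduler would log the failure here.
            } finally {
                try {
                    runRepeatedly(schedulingPass);
                } catch (RejectedExecutionException stopped) {
                    // The pool has been shut down, so stop re-queueing.
                }
            }
        });
    }

    void shutdown() {
        pool.shutdownNow();
    }

    // Demo: the first two passes fail, yet later passes still run.
    static boolean survivesFailures() {
        AtomicInteger attempts = new AtomicInteger();
        CountDownLatch successes = new CountDownLatch(3);
        ResilientSchedulingPool scheduler = new ResilientSchedulingPool(2);
        scheduler.runRepeatedly(() -> {
            if (attempts.incrementAndGet() <= 2) {
                throw new IllegalStateException("simulated scheduling failure");
            }
            successes.countDown();
        });
        try {
            return successes.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdown();
        }
    }
}
```

The sketch re-queues immediately for brevity; a real scheduler would presumably pace its passes with a delay or a wait on pending work.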








[jira] [Commented] (YARN-7649) RMContainer state transition exception after container update

2020-02-06 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031994#comment-17031994
 ] 

Aihua Xu commented on YARN-7649:


[~asuresh] Any update on this task?

> RMContainer state transition exception after container update
> -
>
> Key: YARN-7649
> URL: https://issues.apache.org/jira/browse/YARN-7649
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0
>Reporter: Weiwei Yang
>Assignee: Arun Suresh
>Priority: Major
>
> I've seen this in a cluster deployment as well as in unit tests. Running 
> {{TestAMRMClient#testAMRMClientWithContainerPromotion}} reproduces it: the 
> test case doesn't fail, but the following error message shows up in the 
> log:
> {noformat}
> 2017-12-13 19:41:31,817 ERROR rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(480)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> RELEASED at ALLOCATED
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:478)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:675)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1586)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:155)
>   at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>   at java.lang.Thread.run(Thread.java:748)
> 2017-12-13 19:41:31,817 ERROR rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(481)) - Invalid event RELEASED on container 
> container_1513165290804_0001_01_03
> {noformat}
> This seems to be related to YARN-6251.
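
The error above is what a table-driven state machine reports when an event arrives in a state that has no transition registered for it. The sketch below illustrates that mechanism only; it is hypothetical, much simpler than YARN's `StateMachineFactory`, and its transition table is made up for the example rather than copied from `RMContainerImpl`.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of a table-driven state machine.
final class TransitionTableSketch {
    enum State { ALLOCATED, ACQUIRED, RELEASED }
    enum Event { ACQUIRED, RELEASED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current;

    TransitionTableSketch(State initial) {
        this.current = initial;
    }

    // Registers "from --event--> to" as a legal transition.
    void allow(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    // Applies the event, or fails if no transition is registered for (current, event).
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }
}
```

If only `ALLOCATED --ACQUIRED--> ACQUIRED` and `ACQUIRED --RELEASED--> RELEASED` are registered, sending `RELEASED` while still in `ALLOCATED` fails with the same kind of message as in the log.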






[jira] [Commented] (YARN-10015) Correct the sample command in SLS README file

2020-01-27 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024679#comment-17024679
 ] 

Aihua Xu commented on YARN-10015:
-

[~yufeigu] Can you help review and commit the patch? It's a simple doc change. 
Thanks.

> Correct the sample command in SLS README file
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: YARN-10015.patch
>
>
> The sample command in SLS README {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}} contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json
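
A quick way to catch this kind of problem is to scan an argument for non-ASCII characters before running it. The helper below is a hypothetical sketch, not part of SLS; it assumes the stray character is an en dash (U+2013) or em dash (U+2014) followed by a hyphen, as in the command quoted above.

```java
// Hypothetical helper for spotting and repairing pasted "smart" dashes.
final class DashCheck {
    // Returns the index of the first non-ASCII character, or -1 if there is none.
    static int firstNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    // Rewrites a leading en dash (U+2013) or em dash (U+2014) plus hyphen as "--".
    static String normalizeOptionDashes(String arg) {
        return arg.replaceFirst("^[\\u2013\\u2014]-", "--");
    }
}
```

Applied to the first argument of the README command, `firstNonAscii` reports index 0 and `normalizeOptionDashes` yields `--input-rumen=sample-data/2jobs2min-rumen-jh.json`.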






[jira] [Updated] (YARN-10016) NPE is thrown when accessing SLS web portal

2019-12-06 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-10016:

Parent: YARN-5065
Issue Type: Sub-task  (was: Bug)

> NPE is thrown when accessing SLS web portal
> ---
>
> Key: YARN-10016
> URL: https://issues.apache.org/jira/browse/YARN-10016
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> The following NPE is thrown when running SLS and accessing 
> http://$HOST:10001/simulate
> {noformat}
> java.lang.NullPointerException
>   at 
> org.eclipse.jetty.server.ResourceService.doGet(ResourceService.java:235)
>   at 
> org.eclipse.jetty.server.handler.ResourceHandler.handle(ResourceHandler.java:256)
>   at org.apache.hadoop.yarn.sls.web.SLSWebApp$1.handle(SLSWebApp.java:159)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at org.eclipse.jetty.server.Server.handle(Server.java:494)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}






[jira] [Updated] (YARN-10015) Correct the sample command in SLS README file

2019-12-06 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-10015:

Parent: YARN-5065
Issue Type: Sub-task  (was: Bug)

> Correct the sample command in SLS README file
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: YARN-10015.patch
>
>
> The sample command in SLS README {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}} contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Created] (YARN-10016) NPE is thrown when accessing SLS web portal

2019-12-06 Thread Aihua Xu (Jira)
Aihua Xu created YARN-10016:
---

 Summary: NPE is thrown when accessing SLS web portal
 Key: YARN-10016
 URL: https://issues.apache.org/jira/browse/YARN-10016
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Aihua Xu
Assignee: Aihua Xu


The following NPE is thrown when running SLS and accessing 
http://$HOST:10001/simulate

{noformat}
java.lang.NullPointerException
at 
org.eclipse.jetty.server.ResourceService.doGet(ResourceService.java:235)
at 
org.eclipse.jetty.server.handler.ResourceHandler.handle(ResourceHandler.java:256)
at org.apache.hadoop.yarn.sls.web.SLSWebApp$1.handle(SLSWebApp.java:159)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:494)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
at java.lang.Thread.run(Thread.java:748)
{noformat}






[jira] [Updated] (YARN-10015) Correct the sample command in SLS README file

2019-12-06 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-10015:

Summary: Correct the sample command in SLS README file  (was: Correct SLS 
README sample command)

> Correct the sample command in SLS README file
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: YARN-10015.patch
>
>
> The sample command in SLS README {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}} contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Commented] (YARN-10015) Correct SLS README sample command

2019-12-06 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990007#comment-16990007
 ] 

Aihua Xu commented on YARN-10015:
-

It's a simple fix: just replace it with a normal ASCII hyphen.

> Correct SLS README sample command
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: YARN-10015.patch
>
>
> The sample command in SLS README {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}} contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Updated] (YARN-10015) Correct SLS README sample command

2019-12-06 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-10015:

Attachment: YARN-10015.patch

> Correct SLS README sample command
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: YARN-10015.patch
>
>
> The sample command in SLS README {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}} contains a dash from a different encoding. The 
> command fails with the following error: 
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Created] (YARN-10015) Correct SLS README sample command

2019-12-06 Thread Aihua Xu (Jira)
Aihua Xu created YARN-10015:
---

 Summary: Correct SLS README sample command
 Key: YARN-10015
 URL: https://issues.apache.org/jira/browse/YARN-10015
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Aihua Xu
Assignee: Aihua Xu


The sample command in SLS README {{bin/slsrun.sh 
—-input-rumen=sample-data/2jobs2min-rumen-jh.json —-output-dir=sample-output}} 
contains a dash from a different encoding. The command fails with the 
following error: 

ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2019-09-25 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938133#comment-16938133
 ] 

Aihua Xu commented on YARN-9615:


This seems to be an important feature, since we sometimes see a large 
queue. The generic approach looks promising and could be adopted for other 
queues as well. 

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9615.poc.patch, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.
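
Per-event-type counts and processing times could be gathered by timing each handler call and accumulating the results under the event type. The sketch below is hypothetical: `EventTypeMetricsSketch` and its methods are illustrative names, event types are plain strings for simplicity, and nothing here is taken from the attached proof-of-concept patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: count and total handling time per event type.
final class EventTypeMetricsSketch {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

    // Runs the handler and records one event plus its elapsed time, even if it throws.
    void timed(String eventType, Runnable handler) {
        long start = System.nanoTime();
        try {
            handler.run();
        } finally {
            counts.computeIfAbsent(eventType, t -> new LongAdder()).increment();
            nanos.computeIfAbsent(eventType, t -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    long count(String eventType) {
        LongAdder c = counts.get(eventType);
        return c == null ? 0 : c.sum();
    }

    long totalNanos(String eventType) {
        LongAdder n = nanos.get(eventType);
        return n == null ? 0 : n.sum();
    }
}
```

A dispatcher would wrap each handler invocation in `timed(...)`; dividing `totalNanos` by `count` then gives the average processing time for that event type.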






[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties

2019-09-18 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932598#comment-16932598
 ] 

Aihua Xu commented on YARN-2255:


Thanks [~cheersyang]

> YARN Audit logging not added to log4j.properties
> 
>
> Key: YARN-2255
> URL: https://issues.apache.org/jira/browse/YARN-2255
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Aihua Xu
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-2255.1.patch, YARN-2255.patch
>
>
> The log4j.properties file that is part of the Hadoop package doesn't have YARN 
> audit logging tied to it. This leads to audit logs being written to the 
> normal log files. Audit logs should be written to a separate log file.






[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties

2019-09-12 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928711#comment-16928711
 ] 

Aihua Xu commented on YARN-2255:


[~cheersyang] Thanks Weiwei. Yes. I tested locally and it works well. Here is 
the sample output from the test. Can you help commit the change? 

aihuaxu-C02WW0RCHTDG:logs aihuaxu$ cat rm-audit.log
2019-09-12 09:31:18,871 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  IP=127.0.0.1  OPERATION=Submit Application Request  TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1568248909028_0001  QUEUENAME=default
2019-09-12 09:31:19,480 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  OPERATION=AM Allocated Container  TARGET=SchedulerApp  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_01  RESOURCE=  QUEUENAME=default
2019-09-12 09:31:31,191 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  IP=127.0.0.1  OPERATION=Register App Master  TARGET=ApplicationMasterService  RESULT=SUCCESS  APPID=application_1568248909028_0001  APPATTEMPTID=appattempt_1568248909028_0001_01
2019-09-12 09:31:31,480 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  OPERATION=AM Allocated Container  TARGET=SchedulerApp  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_02  RESOURCE=  QUEUENAME=default
2019-09-12 09:31:32,489 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  OPERATION=AM Allocated Container  TARGET=SchedulerApp  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_03  RESOURCE=  QUEUENAME=default
2019-09-12 09:31:44,326 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  OPERATION=AM Released Container  TARGET=SchedulerApp  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_03  RESOURCE=  QUEUENAME=default
2019-09-12 09:31:44,331 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  OPERATION=AM Released Container  TARGET=SchedulerApp  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_02  RESOURCE=  QUEUENAME=default
2019-09-12 09:31:44,788 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  OPERATION=AM Released Container  TARGET=SchedulerApp  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_01  RESOURCE=  QUEUENAME=default
2019-09-12 09:31:44,813 INFO resourcemanager.RMAuditLogger: USER=aihuaxu  OPERATION=Application Finished - Succeeded  TARGET=RMAppManager  RESULT=SUCCESS  APPID=application_1568248909028_0001
aihuaxu-C02WW0RCHTDG:logs aihuaxu$ cat nm-audit.log
2019-09-12 09:31:20,263 INFO nodemanager.NMAuditLogger: USER=appattempt_1568248909028_0001_01  IP=127.0.0.1  OPERATION=Start Container Request  TARGET=ContainerManageImpl  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_01
2019-09-12 09:31:31,626 INFO nodemanager.NMAuditLogger: USER=appattempt_1568248909028_0001_01  IP=127.0.0.1  OPERATION=Start Container Request  TARGET=ContainerManageImpl  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_02
2019-09-12 09:31:32,787 INFO nodemanager.NMAuditLogger: USER=appattempt_1568248909028_0001_01  IP=127.0.0.1  OPERATION=Start Container Request  TARGET=ContainerManageImpl  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_03
2019-09-12 09:31:44,305 INFO nodemanager.NMAuditLogger: USER=aihuaxu  OPERATION=Container Finished - Succeeded  TARGET=ContainerImpl  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_03
2019-09-12 09:31:44,311 INFO nodemanager.NMAuditLogger: USER=aihuaxu  OPERATION=Container Finished - Succeeded  TARGET=ContainerImpl  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_02
2019-09-12 09:31:44,777 INFO nodemanager.NMAuditLogger: USER=aihuaxu  OPERATION=Container Finished - Succeeded  TARGET=ContainerImpl  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_01
2019-09-12 09:31:44,828 INFO nodemanager.NMAuditLogger: USER=appattempt_1568248909028_0001_01  IP=127.0.0.1  OPERATION=Stop Container Request  TARGET=ContainerManageImpl  RESULT=SUCCESS  APPID=application_1568248909028_0001  CONTAINERID=container_1568248909028_0001_01_01
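
Each audit record is a series of `KEY=VALUE` pairs. The sketch below shows how such a record could be split back into fields. It assumes the pairs are tab-separated in the real log file (the tabs appear to have been lost in this archive), which matters because values such as `Submit Application Request` contain spaces. `AuditRecordParser` is a hypothetical helper, not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading one audit record payload.
final class AuditRecordParser {
    // Assumption: pairs are separated by tabs and each pair is KEY=VALUE.
    static Map<String, String> parse(String record) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : record.split("\t")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // Text before the first '=' is the key; the rest is the value.
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }
}
```

The input is the part of the line after the logger name and colon, for example `USER=aihuaxu`, `IP=127.0.0.1`, and so on, joined by tabs.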

> YARN Audit logging not added to log4j.properties
> 

[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties

2019-09-10 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927239#comment-16927239
 ] 

Aihua Xu commented on YARN-2255:


Assigned to myself.

[~wangda], [~cheersyang] Can you help review the change? It's a logging 
configuration change to be consistent with the HDFS audit log. Thanks.

> YARN Audit logging not added to log4j.properties
> 
>
> Key: YARN-2255
> URL: https://issues.apache.org/jira/browse/YARN-2255
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-2255.1.patch, YARN-2255.patch
>
>
> The log4j.properties file that is part of the Hadoop package doesn't have YARN 
> audit logging tied to it. This leads to audit logs being written to the 
> normal log files. Audit logs should be written to a separate log file.






[jira] [Assigned] (YARN-2255) YARN Audit logging not added to log4j.properties

2019-09-10 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned YARN-2255:
--

Assignee: Aihua Xu  (was: Ying Zhang)

> YARN Audit logging not added to log4j.properties
> 
>
> Key: YARN-2255
> URL: https://issues.apache.org/jira/browse/YARN-2255
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-2255.1.patch, YARN-2255.patch
>
>
> The log4j.properties file that is part of the Hadoop package doesn't have YARN 
> audit logging tied to it. This leads to audit logs being written to the 
> normal log files. Audit logs should be written to a separate log file.






[jira] [Updated] (YARN-2255) YARN Audit logging not added to log4j.properties

2019-09-10 Thread Aihua Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-2255:
---
Attachment: YARN-2255.1.patch

> YARN Audit logging not added to log4j.properties
> 
>
> Key: YARN-2255
> URL: https://issues.apache.org/jira/browse/YARN-2255
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Ying Zhang
>Priority: Major
> Attachments: YARN-2255.1.patch, YARN-2255.patch
>
>
> The log4j.properties file that is part of the Hadoop package doesn't have YARN 
> audit logging tied to it. This leads to audit logs being written to the 
> normal log files. Audit logs should be written to a separate log file.






[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties

2019-09-10 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927114#comment-16927114
 ] 

Aihua Xu commented on YARN-2255:


[~Ying Zhang] I think it's a good idea to have a separate audit log file, as 
HDFS does. I can rebase and try to get it committed if you are not working on 
it. 

> YARN Audit logging not added to log4j.properties
> 
>
> Key: YARN-2255
> URL: https://issues.apache.org/jira/browse/YARN-2255
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Ying Zhang
>Priority: Major
> Attachments: YARN-2255.patch
>
>
> The log4j.properties file that is part of the Hadoop package doesn't have YARN 
> audit logging tied to it. This leads to audit logs being written to the 
> normal log files. Audit logs should be written to a separate log file.






[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties

2019-09-10 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927090#comment-16927090
 ] 

Aihua Xu commented on YARN-2255:


[~Ying Zhang] I'm wondering why this was never committed. 

> YARN Audit logging not added to log4j.properties
> 
>
> Key: YARN-2255
> URL: https://issues.apache.org/jira/browse/YARN-2255
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Ying Zhang
>Priority: Major
> Attachments: YARN-2255.patch
>
>
> The log4j.properties file that is part of the Hadoop package doesn't have YARN 
> audit logging tied to it. This leads to audit logs being written to the 
> normal log files. Audit logs should be written to a separate log file.






[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2019-06-27 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874316#comment-16874316
 ] 

Aihua Xu commented on YARN-6629:


[~cheersyang] for this particular issue, since it's already in 2.10, I think we 
don't need additional backport since 2.10 will be the next release on branch-2, 
is that correct?

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.1.0, 2.10.0
>
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, 
> YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new 
> NPE. Log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
>     at org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
>     at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
>     at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
>     at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce this error, in chronological order:
> 1. The AM started and requested 1 container with schedulerRequestKey#1: 
> ApplicationMasterService#allocate --> CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> (added schedulerRequestKey#1 to schedulerKeyToPlacementSets)
> 2. The scheduler allocated 1 container for this request and accepted the proposal.
> 3. The AM removed this request: 
> ApplicationMasterService#allocate --> CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> (removed schedulerRequestKey#1 from schedulerKeyToPlacementSets)
> 4. The scheduler applied this proposal: 
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> An NPE is thrown when calling 
> schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, 
> type, node);
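
The sequence in the four steps above can be illustrated with a minimal Java sketch. The classes and method names below are hypothetical stand-ins, not the real AppSchedulingInfo and not necessarily how the attached patches fix the problem. They only show that looking up a key that was removed in the meantime returns null, and that a null check turns the crash into a rejected proposal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a per-key placement set (not the YARN class). */
class PlacementSetSketch {
    int allocatedContainers = 0;

    void allocate(String node) {
        allocatedContainers++;
    }
}

/** Hypothetical, heavily simplified model of the key-to-placement-set map. */
class AppSchedulingInfoSketch {
    private final Map<Integer, PlacementSetSketch> schedulerKeyToPlacementSets =
        new ConcurrentHashMap<>();

    /** Step 1: the AM adds a request for the given scheduler key. */
    void addRequest(int schedulerKey) {
        schedulerKeyToPlacementSets.putIfAbsent(schedulerKey, new PlacementSetSketch());
    }

    /** Step 3: the AM removes the request before the proposal is applied. */
    void removeRequest(int schedulerKey) {
        schedulerKeyToPlacementSets.remove(schedulerKey);
    }

    /** Step 4 without a guard: throws NullPointerException if the key is gone. */
    void allocateUnguarded(int schedulerKey, String node) {
        schedulerKeyToPlacementSets.get(schedulerKey).allocate(node);
    }

    /** Step 4 with a guard: reports failure instead of throwing. */
    boolean allocateGuarded(int schedulerKey, String node) {
        PlacementSetSketch placementSet = schedulerKeyToPlacementSets.get(schedulerKey);
        if (placementSet == null) {
            return false; // the request was removed; the proposal is stale
        }
        placementSet.allocate(node);
        return true;
    }
}
```

In this sketch the caller of allocateGuarded would treat a false return value as a rejected proposal rather than letting the exception reach the event dispatcher thread.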




[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2019-06-25 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872902#comment-16872902
 ] 

Aihua Xu commented on YARN-6629:


Thanks. I just noticed that it's not included in 2.9.2 but it is in 2.10. 







[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2019-06-25 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872701#comment-16872701
 ] 

Aihua Xu commented on YARN-6629:


Can we also backport this change to branch-2? It's critical since it's causing 
ResourceManager to crash.







[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.

2019-06-25 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872665#comment-16872665
 ] 

Aihua Xu commented on YARN-8193:


Can we also push this patch to the 2.9.x branch? It's causing the RM to crash.

> YARN RM hangs abruptly (stops allocating resources) when running successive 
> applications.
> -
>
> Key: YARN-8193
> URL: https://issues.apache.org/jira/browse/YARN-8193
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Zian Chen
> Assignee: Zian Chen
> Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8193-branch-2-001.patch, 
> YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running massive queries successively, at some point the RM just hangs and 
> stops allocating resources. At the point where the RM hangs, YARN throws a 
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node 
> health issue; the RM didn't report any node being unhealthy). There is no fixed 
> trigger for this (query or operation).
> This problem goes away on restarting the ResourceManager. No NM restart is 
> required. 
>  
>  






[jira] [Created] (YARN-9478) Add timeout for renew delegation thread pool

2019-04-12 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9478:
--

 Summary: Add timeout for renew delegation thread pool
 Key: YARN-9478
 URL: https://issues.apache.org/jira/browse/YARN-9478
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Aihua Xu
Assignee: Aihua Xu


By default, YARN creates a thread pool with 50 threads to handle all token 
renewals for the running jobs. Currently there is no timeout for these threads, 
so if one application is slow to renew its token, YARN can eventually reach a 
state where all the threads are busy renewing tokens for applications of that 
type and the whole YARN cluster can't handle new applications. 

I propose adding a timeout to the threads in the thread pool so that they get 
killed after a certain time.  
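
As an illustration of the proposal, here is a minimal Java sketch that bounds a renewal task with a timeout using the standard java.util.concurrent API. The class and method names are hypothetical, and this is not the change made in YARN; it only shows the general technique of waiting on a Future with a deadline and cancelling the task when the deadline passes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; the names here are assumptions for illustration only. */
class RenewalTimeoutSketch {

    /**
     * Submits the renewal task to the pool and waits at most timeoutMs for it.
     * Returns true if the task finished in time, false if it was cancelled.
     */
    static boolean renewWithTimeout(ExecutorService pool, Callable<Void> renewal,
                                    long timeoutMs) {
        Future<Void> future = pool.submit(renewal);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupt the worker so the pool thread can be reused.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status and give up on the task.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Renewal failed", e.getCause());
        }
    }
}
```

Note that cancel(true) only interrupts the worker thread. A renewal call that ignores interruption (for example, one blocked in non-interruptible I/O) would still occupy its thread, so a real implementation would also need timeouts on the underlying RPC.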






[jira] [Commented] (YARN-9463) Add queueName info when failing with queue capacity sanity check

2019-04-10 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814573#comment-16814573
 ] 

Aihua Xu commented on YARN-9463:


Thanks a lot [~cheersyang] for your quick code review and submission.

> Add queueName info when failing with queue capacity sanity check
> 
>
> Key: YARN-9463
> URL: https://issues.apache.org/jira/browse/YARN-9463
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 2.9.1
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Trivial
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9463.1.patch
>
>
> In the queue sanity check in CSQueueUtils.java, we throw "Illegal queue 
> capacity setting, (abs-capacity=0.00160782) > 
> (abs-maximum-capacity=0.0016027201). When label=[]". It would be better to add 
> the queue name so the admin can identify the problematic queue.






[jira] [Commented] (YARN-9463) Add queueName info when failing with queue capacity sanity check

2019-04-09 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813645#comment-16813645
 ] 

Aihua Xu commented on YARN-9463:


Simple fix: the error will print out queue info as well now. 







[jira] [Updated] (YARN-9463) Add queueName info when failing with queue capacity sanity check

2019-04-09 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9463:
---
Attachment: YARN-9463.1.patch







[jira] [Created] (YARN-9463) Add queueName info when failing with queue capacity sanity check

2019-04-08 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9463:
--

 Summary: Add queueName info when failing with queue capacity 
sanity check
 Key: YARN-9463
 URL: https://issues.apache.org/jira/browse/YARN-9463
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Affects Versions: 2.9.1
Reporter: Aihua Xu
Assignee: Aihua Xu


In the queue sanity check in CSQueueUtils.java, we throw "Illegal queue 
capacity setting, (abs-capacity=0.00160782) > 
(abs-maximum-capacity=0.0016027201). When label=[]". It would be better to add 
the queue name so the admin can identify the problematic queue.
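
A minimal Java sketch of the idea follows. The class name, method signature, and the exact position of the queue name in the message are assumptions for illustration; this is not the code from CSQueueUtils.java or from the attached patch. It only shows the sanity check carrying the queue name in its error message.

```java
/** Hypothetical helper illustrating a capacity sanity check that names the queue. */
class QueueCapacityCheckSketch {

    /**
     * Throws IllegalArgumentException if the absolute capacity exceeds the
     * absolute maximum capacity, and includes the queue name in the message.
     */
    static void validate(String queueName, String label,
                         float absCapacity, float absMaxCapacity) {
        if (absCapacity > absMaxCapacity) {
            throw new IllegalArgumentException(
                "Illegal queue capacity setting, (abs-capacity=" + absCapacity
                + ") > (abs-maximum-capacity=" + absMaxCapacity
                + "). When label=[" + label + "], queue=" + queueName);
        }
    }
}
```

With the queue name in the message, an admin with many queues can go straight to the offending entry in the scheduler configuration instead of searching for matching capacity values.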






[jira] [Resolved] (YARN-9297) Renaming RM could cause application to crash

2019-02-12 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu resolved YARN-9297.

Resolution: Duplicate

> Renaming RM could cause application to crash
> 
>
> Key: YARN-9297
> URL: https://issues.apache.org/jira/browse/YARN-9297
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: security
> Affects Versions: 2.6.0
> Reporter: Aihua Xu
> Priority: Major
>
> In this line, we throw an UnknownHostException when any RM host can't be 
> resolved to an IP address. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448
> There are cases where one RM needs to be renamed or mapped to a different IP 
> address; this then crashes the application even though the other RMs are 
> running fine. 






[jira] [Commented] (YARN-9297) Renaming RM could cause application to crash

2019-02-12 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766618#comment-16766618
 ] 

Aihua Xu commented on YARN-9297:


Yes. You are right. I will resolve as dup. Thanks [~jojochuang]







[jira] [Created] (YARN-9297) Renaming RM could cause application to crash

2019-02-12 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9297:
--

 Summary: Renaming RM could cause application to crash
 Key: YARN-9297
 URL: https://issues.apache.org/jira/browse/YARN-9297
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: security
Affects Versions: 2.6.0
Reporter: Aihua Xu


In this line, we throw an UnknownHostException when any RM host can't be 
resolved to an IP address. 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448

There are cases where one RM needs to be renamed or mapped to a different IP 
address; this then crashes the application even though the other RMs are 
running fine. 
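
To make the failure mode concrete, here is a minimal Java sketch of a more tolerant resolution loop. Everything in it is hypothetical (the HostResolverSketch interface, the class, and the method names); it is not the code in SecurityUtil.java and not the resolution adopted upstream. It only shows skipping RM hosts that fail to resolve and failing when none of them resolve.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical resolver abstraction so the lookup can be swapped or faked. */
interface HostResolverSketch {
    InetAddress resolve(String host) throws UnknownHostException;
}

/** Hypothetical helper; not part of Hadoop. */
class RmResolutionSketch {

    /**
     * Resolves every RM host that can be resolved and skips the rest.
     * Fails only if no RM host resolves at all.
     */
    static List<InetAddress> resolveAvailable(List<String> rmHosts,
                                              HostResolverSketch resolver) {
        List<InetAddress> resolved = new ArrayList<>();
        for (String host : rmHosts) {
            try {
                resolved.add(resolver.resolve(host));
            } catch (UnknownHostException e) {
                // One unresolvable RM should not take the application down.
            }
        }
        if (resolved.isEmpty()) {
            throw new IllegalStateException("No RM host could be resolved: " + rmHosts);
        }
        return resolved;
    }
}
```

In production code the resolver argument could simply be InetAddress::getByName. The injected interface is there so the behavior can be exercised without real DNS lookups.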






[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-31 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757740#comment-16757740
 ] 

Aihua Xu commented on YARN-9200:


[~sunilg] Since you worked on absolute resource configuration, I'd like to hear 
from you as well. Thanks.

> Enable resource configuration of queue capacity for different resources 
> independently
> -
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 3.1.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Attachments: YARN-9200.draft
>
>
> In the capacity scheduler, two kinds of resource allocation are currently 
> supported: 1. percentage allocation for child queues - the child queue gets a 
> defined percentage of the resources for all the resource types; 2. absolute 
> values (YARN-5881) - each resource is configured with an absolute value.
> Right now we can't mix these cases, and it would also be very confusing 
> to mix them in one cluster. The second case is actually targeted more toward 
> cloud environments. 
> In a non-cloud environment, the ability to configure each resource 
> independently is also useful, but a percentage is preferable to an absolute 
> value. One thought here is to add a percentage configuration for each resource 
> type on the queue. That would allow us to configure memory-bound or CPU-bound 
> queues. We can also stay backward compatible: each resource type just gets 
> the same percentage if no percentage is configured for the individual resource 
> type.






[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-29 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755286#comment-16755286
 ] 

Aihua Xu commented on YARN-9200:


[~leftnoteasy], [~rohithsharma] and [~cheersyang] I'm trying to add support 
for a separate percentage value for each resource (i.e., an array instead of a 
single float value for each capacity). It seems there are many more changes 
than I originally thought, especially the changes in QueueCapacities and the 
related classes. Before I move forward with such massive changes, I have 
attached the draft and want to check with you folks whether there is a better 
way. 

What I have done and am planning to do is: CapacitySchedulerConfiguration 
supports both "45" and "memory=80,vCores=20" and internally keeps an array, one 
value per resource; QueueCapacities maps each label to an array of 
capacities.
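
To illustrate the two value formats, here is a minimal Java sketch of a parser that accepts either "45" or "memory=80,vCores=20". The class and method names are hypothetical, and the sketch uses a map rather than the array mentioned above; it is not the code in the attached draft. A single number is applied to every resource type, which matches the backward-compatible behavior described in the issue.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser; not part of CapacitySchedulerConfiguration. */
class CapacityValueParserSketch {

    /**
     * Parses a capacity value into a percentage per resource type.
     * "45" applies 45 to every type in resourceTypes;
     * "memory=80,vCores=20" assigns each listed type its own percentage.
     */
    static Map<String, Float> parse(String value, List<String> resourceTypes) {
        Map<String, Float> result = new LinkedHashMap<>();
        String trimmed = value.trim();
        if (!trimmed.contains("=")) {
            // Legacy form: one percentage shared by all resource types.
            float percentage = Float.parseFloat(trimmed);
            for (String type : resourceTypes) {
                result.put(type, percentage);
            }
            return result;
        }
        // Per-resource form: comma-separated name=percentage pairs.
        for (String part : trimmed.split(",")) {
            String[] nameAndValue = part.split("=", 2);
            if (nameAndValue.length != 2) {
                throw new IllegalArgumentException("Malformed capacity entry: " + part);
            }
            result.put(nameAndValue[0].trim(), Float.parseFloat(nameAndValue[1].trim()));
        }
        return result;
    }
}
```

How to treat a resource type that is missing from the per-resource form (reject it, or fall back to some default) is left open here, since the discussion above does not settle it.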

Your feedback is appreciated.  








[jira] [Updated] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-29 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9200:
---
Attachment: YARN-9200.draft







[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-24 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751380#comment-16751380
 ] 

Aihua Xu commented on YARN-9116:


Thanks [~cheersyang] and [~leftnoteasy] for your help.

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 2.7.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, 
> YARN-9116.4.patch, YARN-9116.5.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration, 
> which is intended to support larger containers on dedicated queues 
> (a larger maximum-allocation-mb/maximum-allocation-vcores for such queues). 
> However, to achieve a larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with the desired values on the queues, 
> since a queue configuration can't be larger than the cluster configuration. 
> There are many queues in the system, and if we forget to configure such values 
> when adding a new queue, that queue gets the default 120G/256, which typically 
> is not what we want.  
> We can come up with a queue-default configuration (set to a normal queue 
> configuration like 16G/8), so that leaf queues get such values by default.
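
The inheritance idea named in the issue title can be sketched in Java as a lookup that walks from a queue up through its parents. The class, the method, and the dotted queue-path convention used as map keys are assumptions for illustration; this is not the code from the attached patches. A queue without its own setting takes the nearest ancestor's value, and the cluster-wide maximum is both the last fallback and an upper bound.

```java
import java.util.Map;

/** Hypothetical lookup helper; not the CapacityScheduler implementation. */
class MaxAllocationInheritanceSketch {

    /**
     * Returns the effective maximum allocation (in MB) for a queue path such as
     * "root.a.b": the value configured on the queue itself or on its nearest
     * configured ancestor, capped by the cluster maximum.
     */
    static int effectiveMaxAllocationMb(String queuePath,
                                        Map<String, Integer> perQueueMaxMb,
                                        int clusterMaxMb) {
        String path = queuePath;
        while (path != null) {
            Integer configured = perQueueMaxMb.get(path);
            if (configured != null) {
                // A queue value can't be larger than the cluster value.
                return Math.min(configured, clusterMaxMb);
            }
            int lastDot = path.lastIndexOf('.');
            path = lastDot < 0 ? null : path.substring(0, lastDot);
        }
        return clusterMaxMb;
    }
}
```

Under this scheme, setting 16G once on the root queue gives every new leaf queue that value by default, while a dedicated queue can still override it with a larger setting up to the cluster maximum.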






[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-24 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751359#comment-16751359
 ] 

Aihua Xu commented on YARN-9116:


The post-commit is randomly failing with the following exception. There are 
already a couple of INFRA jiras for this: INFRA-13506, INFRA-17015. 

Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.3.0-SNAPSHOT:protoc (compile-protoc) 
on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 
protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, 
> YARN-9116.4.patch, YARN-9116.5.patch
>
>
> YARN-1582 added support for a per-queue maximum-allocation-mb configuration, 
> which targets larger containers on dedicated queues (a larger 
> maximum-allocation-mb/maximum-allocation-vcores for such queues). 
> To allow larger containers, however, we need to increase the global 
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then 
> override those values with the desired ones on each queue, since a queue 
> configuration can't be larger than the cluster configuration. There are many 
> queues in the system, and if we forget to configure these values when adding 
> a new queue, that queue gets the default 120G/256, which typically is not 
> what we want.  
> We can come up with a queue-default configuration (set to a normal queue 
> configuration like 16G/8), so that leaf queues get these values by default.
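
The inheritance idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions: it is not the YARN-9116 patch, it does not use the real CapacityScheduler classes, and the class and method names (QueueMaxAllocationSketch, setQueueMaxMb, effectiveMaxMb) are hypothetical. A queue without its own maximum-allocation-mb takes the value of its nearest ancestor that has one, and falls back to the cluster-wide value otherwise.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: this is not the YARN-9116 patch and does not use
// the real CapacityScheduler classes. All names here are hypothetical.
final class QueueMaxAllocationSketch {
    // Explicitly configured maximum-allocation-mb values, keyed by queue path.
    private final Map<String, Long> explicitMaxMb = new HashMap<>();
    private final long clusterMaxMb;

    QueueMaxAllocationSketch(long clusterMaxMb) {
        this.clusterMaxMb = clusterMaxMb;
    }

    void setQueueMaxMb(String queuePath, long maxMb) {
        explicitMaxMb.put(queuePath, maxMb);
    }

    // Walk from the queue up towards the root; the nearest explicit value wins.
    // If no ancestor sets a value, fall back to the cluster-wide maximum.
    long effectiveMaxMb(String queuePath) {
        String path = queuePath;
        while (path != null) {
            Long configured = explicitMaxMb.get(path);
            if (configured != null) {
                return configured;
            }
            int lastDot = path.lastIndexOf('.');
            path = lastDot < 0 ? null : path.substring(0, lastDot);
        }
        return clusterMaxMb;
    }

    public static void main(String[] args) {
        QueueMaxAllocationSketch conf = new QueueMaxAllocationSketch(120L * 1024);
        conf.setQueueMaxMb("root", 16L * 1024);         // queue default: 16G
        conf.setQueueMaxMb("root.large", 120L * 1024);  // dedicated queue: 120G
        System.out.println(conf.effectiveMaxMb("root.adhoc"));
        System.out.println(conf.effectiveMaxMb("root.large.nightly"));
    }
}
```

With this shape, a newly added leaf queue picks up the 16G default set on its parent without any extra configuration, and only the dedicated queues need an explicit larger value. Capping the result at the cluster maximum is left out of the sketch.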



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-23 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750490#comment-16750490
 ] 

Aihua Xu commented on YARN-9116:


Thanks [~cheersyang] for your valuable comment.

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, 
> YARN-9116.4.patch, YARN-9116.5.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-23 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Attachment: YARN-9116.5.patch

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, 
> YARN-9116.4.patch, YARN-9116.5.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-23 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750291#comment-16750291
 ] 

Aihua Xu commented on YARN-9116:


patch-5: minor changes to address the checkstyle issue Weiwei pointed out. The 
failing UT passes locally and is not related to this patch. 

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, 
> YARN-9116.4.patch, YARN-9116.5.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-23 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750275#comment-16750275
 ] 

Aihua Xu commented on YARN-9200:


[~leftnoteasy] and [~rohithsharma] I will spend some time looking at this 
change. Let me know if you have already worked on it. I couldn't find a similar 
jira, btw.

> Enable resource configuration of queue capacity for different resources 
> independently
> -
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> The capacity scheduler currently supports two kinds of resource allocation: 
> 1. percentage allocation for child queues - the child queue gets a defined 
> percentage of the resources for all resource types; 2. absolute values 
> (YARN-5881) - each resource is configured with an absolute value.
> Right now we can't mix these cases together, and it would also be very 
> confusing to mix them in one cluster. The second case actually targets 
> cloud environments more. 
> In a non-cloud environment, the ability to configure each resource 
> independently is also useful, but a percentage is preferable to an absolute 
> value. One thought here is to add a percentage configuration for each 
> resource type on the queue. That would allow us to configure memory-bound or 
> CPU-bound queues. We can also stay backward compatible: each resource type 
> just gets the same percentage if no percentage is configured for an 
> individual resource type.
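
A rough sketch of the per-resource percentage idea described above, including the backward-compatible fallback. The issue does not specify property names or an implementation, so everything here is an assumption: the class name, the queueShare method, and the use of plain resource-name strings such as "memory-mb" and "vcores" as keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not an existing CapacityScheduler API.
final class PerResourceCapacitySketch {
    // Computes a child queue's share of each parent resource.
    // perResourcePercent may override the queue-wide percentage for individual
    // resource types; any type without an override uses defaultPercent, which
    // matches today's single-percentage behavior.
    static Map<String, Long> queueShare(Map<String, Long> parentResources,
                                        double defaultPercent,
                                        Map<String, Double> perResourcePercent) {
        Map<String, Long> share = new HashMap<>();
        for (Map.Entry<String, Long> entry : parentResources.entrySet()) {
            double percent =
                perResourcePercent.getOrDefault(entry.getKey(), defaultPercent);
            share.put(entry.getKey(),
                (long) Math.floor(entry.getValue() * percent / 100.0));
        }
        return share;
    }

    public static void main(String[] args) {
        Map<String, Long> parent = new HashMap<>();
        parent.put("memory-mb", 102400L);
        parent.put("vcores", 100L);

        // A memory-bound queue: 50% of memory, but only the default 30% of vcores.
        Map<String, Double> overrides = new HashMap<>();
        overrides.put("memory-mb", 50.0);

        System.out.println(queueShare(parent, 30.0, overrides));
    }
}
```

Passing an empty override map reproduces the current behavior, where every resource type gets the same percentage.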






[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-22 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749056#comment-16749056
 ] 

Aihua Xu commented on YARN-9116:


patch-4: addresses the comments from [~cheersyang]. I didn't try to correct 
checkstyle issues that are not related to the patch.

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, 
> YARN-9116.4.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-22 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Attachment: YARN-9116.4.patch

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, 
> YARN-9116.4.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-22 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748929#comment-16748929
 ] 

Aihua Xu commented on YARN-9116:


[~cheersyang] Thanks for the feedback. Those make sense. I will upload a new 
patch.

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Updated] (YARN-9211) The yarn resourcemanager project keeps failing with " There was a timeout or other error in the fork" error

2019-01-18 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9211:
---
Description: 
Recently I noticed that the build keeps failing in the resourcemanager project 
with " There was a timeout or other error in the fork". 

Here is part of the log; I don't see any UT failures. 

{noformat}
[WARNING] Tests run: 2445, Failures: 0, Errors: 0, Skipped: 7
[INFO] 
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:32 h
[INFO] Finished at: 2019-01-18T11:30:20+00:00
[INFO] Final Memory: 23M/773M
[INFO] 
[WARNING] The requested profile "parallel-tests" could not be activated because 
it does not exist.
[WARNING] The requested profile "native" could not be activated because it does 
not exist.
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist.
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-yarn-server-resourcemanager: There was a timeout or other error 
in the fork -> [Help 1]
{noformat}

Here is a job https://builds.apache.org/job/PreCommit-YARN-Build/23107/console

  was:
Recently notice that the build keeps failing in resourcemanager project with " 
There was a timeout or other error in the fork". 

Here is the part of the log, but I don't see any UT failures. 

{noformat}
[WARNING] Tests run: 2445, Failures: 0, Errors: 0, Skipped: 7
[INFO] 
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:32 h
[INFO] Finished at: 2019-01-18T11:30:20+00:00
[INFO] Final Memory: 23M/773M
[INFO] 
[WARNING] The requested profile "parallel-tests" could not be activated because 
it does not exist.
[WARNING] The requested profile "native" could not be activated because it does 
not exist.
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist.
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-yarn-server-resourcemanager: There was a timeout or other error 
in the fork -> [Help 1]
{noformat}


> The yarn resourcemanager project keeps failing with " There was a timeout or 
> other error in the fork" error
> ---
>
> Key: YARN-9211
> URL: https://issues.apache.org/jira/browse/YARN-9211
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.1.2
>Reporter: Aihua Xu
>Priority: Major
>
> Recently notice that the build keeps failing in resourcemanager project with 
> " There was a timeout or other error in the fork". 
> Here is the part of the log, but I don't see any UT failures. 
> {noformat}
> [WARNING] Tests run: 2445, Failures: 0, Errors: 0, Skipped: 7
> [INFO] 
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 01:32 h
> [INFO] Finished at: 2019-01-18T11:30:20+00:00
> [INFO] Final Memory: 23M/773M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) 
> on project hadoop-yarn-server-resourcemanager: There was a timeout or other 
> error in the fork -> [Help 1]
> {noformat}
> Here is a job https://builds.apache.org/job/PreCommit-YARN-Build/23107/console






[jira] [Created] (YARN-9211) The yarn resourcemanager project keeps failing with " There was a timeout or other error in the fork" error

2019-01-18 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9211:
--

 Summary: The yarn resourcemanager project keeps failing with " 
There was a timeout or other error in the fork" error
 Key: YARN-9211
 URL: https://issues.apache.org/jira/browse/YARN-9211
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Affects Versions: 3.1.2
Reporter: Aihua Xu


Recently I noticed that the build keeps failing in the resourcemanager project 
with " There was a timeout or other error in the fork". 

Here is part of the log; I don't see any UT failures. 

{noformat}
[WARNING] Tests run: 2445, Failures: 0, Errors: 0, Skipped: 7
[INFO] 
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:32 h
[INFO] Finished at: 2019-01-18T11:30:20+00:00
[INFO] Final Memory: 23M/773M
[INFO] 
[WARNING] The requested profile "parallel-tests" could not be activated because 
it does not exist.
[WARNING] The requested profile "native" could not be activated because it does 
not exist.
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist.
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-yarn-server-resourcemanager: There was a timeout or other error 
in the fork -> [Help 1]
{noformat}






[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-18 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746530#comment-16746530
 ] 

Aihua Xu commented on YARN-9116:


[~cheersyang] The timeout issue seems unrelated, since I'm seeing the failure 
in other builds, such as 
https://builds.apache.org/job/PreCommit-YARN-Build/23108/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
or 
https://builds.apache.org/job/PreCommit-YARN-Build/23107/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt

Can you take another look at the latest patch? I will file a separate jira to 
fix the build issue. I haven't yet found out why it keeps failing. Let me 
know if you have any thoughts.


> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance

2019-01-17 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Summary: Capacity Scheduler: implements queue level maximum-allocation 
inheritance  (was: Capacity Scheduler: add the default maximum-allocation-mb 
and maximum-allocation-vcores for the queues)

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> -
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-17 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Attachment: YARN-9116.3.patch

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-16 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744537#comment-16744537
 ] 

Aihua Xu commented on YARN-9116:


I checked the UT failure but don't see any explicit test case failures; the 
build only indicates a timeout. [~cheersyang] Can you tell from the log which 
test is causing the timeout? Otherwise, I can attach a new patch to test it out.

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-16 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744457#comment-16744457
 ] 

Aihua Xu commented on YARN-9116:


[~cheersyang] I will take a look at the UT failure. 

I was debating whether I should make it backward compatible for that fairly 
new feature. I will add that. Another thing: in this patch, I just silently call 
{{maximumAllocation = Resources.componentwiseMin(queueMax, clusterMax);}} to 
get the queue-level maximum-allocation. Should we instead fail with an 
exception if queueMax > clusterMax and let the admin fix the configuration? 
That seems to be the previous behavior.
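
To make the two options in this comment concrete, here is a small self-contained sketch. It deliberately does not use Hadoop's Resource or Resources classes: the Res type and both methods are hypothetical stand-ins, and the strict variant is only one possible reading of "fail with some exception".

```java
// Illustrative sketch only; Res is a hypothetical stand-in for a resource
// vector and is not Hadoop's Resource class.
final class MaxAllocationCheckSketch {
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // Option 1: silently clamp each component to the cluster-level maximum.
    static Res clampToCluster(Res queueMax, Res clusterMax) {
        return new Res(Math.min(queueMax.memoryMb, clusterMax.memoryMb),
                       Math.min(queueMax.vcores, clusterMax.vcores));
    }

    // Option 2: reject the configuration so the admin has to fix it.
    static Res requireWithinCluster(Res queueMax, Res clusterMax) {
        if (queueMax.memoryMb > clusterMax.memoryMb
                || queueMax.vcores > clusterMax.vcores) {
            throw new IllegalArgumentException(
                "Queue maximum allocation exceeds the cluster maximum allocation");
        }
        return queueMax;
    }

    public static void main(String[] args) {
        Res clusterMax = new Res(120L * 1024, 256);
        Res queueMax = new Res(200L * 1024, 8);  // memory is misconfigured

        Res clamped = clampToCluster(queueMax, clusterMax);
        System.out.println(clamped.memoryMb + " MB / " + clamped.vcores + " vcores");

        try {
            requireWithinCluster(queueMax, clusterMax);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Clamping keeps the scheduler running with a misconfigured queue but hides the mistake; rejecting surfaces it immediately at the cost of a failed configuration load.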

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-15 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743284#comment-16743284
 ] 

Aihua Xu commented on YARN-9200:


Thanks [~leftnoteasy] Good to know it's the right direction. :)  
[~rohithsharma] Let me know if you are actively working on this.

> Enable resource configuration of queue capacity for different resources 
> independently
> -
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> In capacity scheduler, currently two resource allocations are supported. 1. 
> percentage allocation for child queues -  the child queue gets a defined 
> percentage of the resources for all the resource types; 2. absolute values  
> (YARN-5881) - each resource is configured an absolute values.
> Right now we can't mix these case together and it would also very confusing 
> to mix them in one cluster. The second case actually is more targeting toward 
> cloud env. 
> In a non-cloud env, the ability to configure each resource independently is 
> also useful, but percentage is preferable over absolute value. One thought 
> here is to add the percentage configuration for each resource type on the 
> queue. That would allow us to configure memory bounded queues, or CPU bounded 
> queues. We can also keep backward compatible: each resource type just gets 
> the same percentage if no percentage is configured for individual resource 
> type.






[jira] [Updated] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-14 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9200:
---
Description: 
In the capacity scheduler, two kinds of resource allocation are currently
supported: 1. percentage allocation for child queues - the child queue gets a
defined percentage of the resources for all resource types; 2. absolute values
(YARN-5881) - each resource is configured with an absolute value.

Right now we can't mix these two cases, and it would also be very confusing to
mix them in one cluster. The second case is targeted more toward cloud
environments.

In a non-cloud environment, the ability to configure each resource
independently is also useful, but percentages are preferable to absolute
values. One thought here is to add a percentage configuration for each
resource type on the queue. That would allow us to configure memory-bound or
CPU-bound queues. We can also stay backward compatible: each resource type
just gets the same percentage if no percentage is configured for an individual
resource type.
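
To make the proposal concrete, here is a minimal sketch of how a per-resource percentage with a fallback could be resolved. It is not taken from any patch: the function name, the parameter names, and the sample numbers are all assumptions made for illustration, and the real scheduler may compute capacities quite differently.

```python
from typing import Dict, Optional


def queue_capacity(parent: Dict[str, float],
                   default_pct: float,
                   per_resource_pct: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return a child queue's share of each resource type in `parent`.

    `default_pct` is the single percentage used today for all resource types.
    `per_resource_pct` holds the hypothetical per-resource overrides; any
    resource type missing from it falls back to `default_pct`.
    """
    overrides = per_resource_pct or {}
    return {
        name: total * overrides.get(name, default_pct) / 100.0
        for name, total in parent.items()
    }


# Illustrative parent capacity: 100 GB of memory and 200 vcores.
parent = {"memory-mb": 102400, "vcores": 200}

# No per-resource setting: every resource type gets the same 50%.
uniform = queue_capacity(parent, default_pct=50)

# A memory-bound queue: 80% of memory, still 50% of vcores.
memory_heavy = queue_capacity(parent, default_pct=50,
                              per_resource_pct={"memory-mb": 80})
```

With no overrides the result matches the existing single-percentage behavior, which is the backward-compatible case described above.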



  was:
In the capacity scheduler, two kinds of resource allocation are currently
supported: 1. percentage allocation for child queues - the child queue gets a
defined percentage of the resources for all resource types; 2. absolute values
(YARN-5881) - each resource is configured with an absolute value.

Right now we can't mix these two cases, and it would also be very confusing to
mix them in one cluster. The second case is targeted more toward cloud
environments.

In a non-cloud environment, the ability to configure each resource
independently is also useful, but percentages are preferable to absolute
values. One thought here is to add a percentage configuration for each
resource type on the queue. That would allow us to configure memory-bound or
CPU-bound queues.




> Enable resource configuration of queue capacity for different resources 
> independently
> -
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Aihua Xu
>Priority: Major
>
> In the capacity scheduler, two kinds of resource allocation are currently
> supported: 1. percentage allocation for child queues - the child queue gets a
> defined percentage of the resources for all resource types; 2. absolute
> values (YARN-5881) - each resource is configured with an absolute value.
> Right now we can't mix these two cases, and it would also be very confusing
> to mix them in one cluster. The second case is targeted more toward cloud
> environments.
> In a non-cloud environment, the ability to configure each resource
> independently is also useful, but percentages are preferable to absolute
> values. One thought here is to add a percentage configuration for each
> resource type on the queue. That would allow us to configure memory-bound or
> CPU-bound queues. We can also stay backward compatible: each resource type
> just gets the same percentage if no percentage is configured for an
> individual resource type.






[jira] [Assigned] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-14 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned YARN-9200:
--

Assignee: Aihua Xu

> Enable resource configuration of queue capacity for different resources 
> independently
> -
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> In the capacity scheduler, two kinds of resource allocation are currently
> supported: 1. percentage allocation for child queues - the child queue gets a
> defined percentage of the resources for all resource types; 2. absolute
> values (YARN-5881) - each resource is configured with an absolute value.
> Right now we can't mix these two cases, and it would also be very confusing
> to mix them in one cluster. The second case is targeted more toward cloud
> environments.
> In a non-cloud environment, the ability to configure each resource
> independently is also useful, but percentages are preferable to absolute
> values. One thought here is to add a percentage configuration for each
> resource type on the queue. That would allow us to configure memory-bound or
> CPU-bound queues. We can also stay backward compatible: each resource type
> just gets the same percentage if no percentage is configured for an
> individual resource type.






[jira] [Updated] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-14 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9200:
---
Description: 
In the capacity scheduler, two kinds of resource allocation are currently
supported: 1. percentage allocation for child queues - the child queue gets a
defined percentage of the resources for all resource types; 2. absolute values
(YARN-5881) - each resource is configured with an absolute value.

Right now we can't mix these two cases, and it would also be very confusing to
mix them in one cluster. The second case is targeted more toward cloud
environments.

In a non-cloud environment, the ability to configure each resource
independently is also useful, but percentages are preferable to absolute
values. One thought here is to add a percentage configuration for each
resource type on the queue. That would allow us to configure memory-bound or
CPU-bound queues.



  was:
In the capacity scheduler, two kinds of resource allocation are currently
supported: 1. percentage allocation for child queues - the child queue gets a
defined percentage of the resources for all resource types; 2. absolute values
(YARN-5881) - each resource is configured with an absolute value.

Right now we can't mix these two cases, and it would also be very confusing to
mix them in one cluster. The second case is targeted more toward cloud
environments.

In a non-cloud environment, the ability to configure each resource
independently is also useful, but in such an environment, percentages are
preferable to absolute values. One thought here is to add a percentage
configuration for each resource type on the queue. That would allow us to
configure memory-bound or CPU-bound queues.




> Enable resource configuration of queue capacity for different resources 
> independently
> -
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Aihua Xu
>Priority: Major
>
> In the capacity scheduler, two kinds of resource allocation are currently
> supported: 1. percentage allocation for child queues - the child queue gets a
> defined percentage of the resources for all resource types; 2. absolute
> values (YARN-5881) - each resource is configured with an absolute value.
> Right now we can't mix these two cases, and it would also be very confusing
> to mix them in one cluster. The second case is targeted more toward cloud
> environments.
> In a non-cloud environment, the ability to configure each resource
> independently is also useful, but percentages are preferable to absolute
> values. One thought here is to add a percentage configuration for each
> resource type on the queue. That would allow us to configure memory-bound or
> CPU-bound queues.






[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-14 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742626#comment-16742626
 ] 

Aihua Xu commented on YARN-9200:


[~wangda], [~sunilg], [~cheersyang] I'd like to hear your thoughts. Would this
improvement work?

> Enable resource configuration of queue capacity for different resources 
> independently
> -
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Aihua Xu
>Priority: Major
>
> In the capacity scheduler, two kinds of resource allocation are currently
> supported: 1. percentage allocation for child queues - the child queue gets a
> defined percentage of the resources for all resource types; 2. absolute
> values (YARN-5881) - each resource is configured with an absolute value.
> Right now we can't mix these two cases, and it would also be very confusing
> to mix them in one cluster. The second case is targeted more toward cloud
> environments.
> In a non-cloud environment, the ability to configure each resource
> independently is also useful, but in such an environment, percentages are
> preferable to absolute values. One thought here is to add a percentage
> configuration for each resource type on the queue. That would allow us to
> configure memory-bound or CPU-bound queues.






[jira] [Created] (YARN-9200) Enable resource configuration of queue capacity for different resources independently

2019-01-14 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9200:
--

 Summary: Enable resource configuration of queue capacity for 
different resources independently
 Key: YARN-9200
 URL: https://issues.apache.org/jira/browse/YARN-9200
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Affects Versions: 3.1.0
Reporter: Aihua Xu


In the capacity scheduler, two kinds of resource allocation are currently
supported: 1. percentage allocation for child queues - the child queue gets a
defined percentage of the resources for all resource types; 2. absolute values
(YARN-5881) - each resource is configured with an absolute value.

Right now we can't mix these two cases, and it would also be very confusing to
mix them in one cluster. The second case is targeted more toward cloud
environments.

In a non-cloud environment, the ability to configure each resource
independently is also useful, but in such an environment, percentages are
preferable to absolute values. One thought here is to add a percentage
configuration for each resource type on the queue. That would allow us to
configure memory-bound or CPU-bound queues.








[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-14 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Attachment: YARN-9116.2.patch

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-11 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Attachment: (was: YARN-9116.2.patch)

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-11 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Attachment: YARN-9116.2.patch

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-09 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738441#comment-16738441
 ] 

Aihua Xu commented on YARN-9116:


Thanks [~leftnoteasy] and [~cheersyang] for the suggestion. Just to clarify the
behavior: a queue's maximum-allocation will not exceed the global setting
(yarn.scheduler.capacity.maximum-allocation-mb), to keep compatibility; among
the queues, a child inherits the setting from its parent, and a child queue can
override the parent queue's value with a larger or smaller setting while still
respecting the global setting.

That sounds reasonable to me. I will work on support for
maximum-allocation-mb/vcores on parent queues first and follow up on the
general resource types.
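
The resolution order described above can be sketched as follows. This is only an illustration under stated assumptions, not the patch's code: the function name and the sample queue names are hypothetical, and capping an oversized value with min() is an assumption (an implementation could equally reject such a configuration at startup).

```python
from typing import Dict


def effective_max_allocation_mb(queue: str,
                                explicit: Dict[str, int],
                                global_max_mb: int) -> int:
    """Resolve a queue's maximum-allocation-mb by walking up its path.

    `explicit` maps full queue paths (e.g. "root.large") to values that were
    set explicitly. The nearest explicit setting wins, whether it is larger or
    smaller than the parent's, but the result never exceeds `global_max_mb`.
    """
    path = queue
    while path:
        if path in explicit:
            # Assumption: an oversized value is capped rather than rejected.
            return min(explicit[path], global_max_mb)
        # "root.large.etl" -> "root.large" -> "root" -> "" (loop ends)
        path = path.rpartition(".")[0]
    # Nothing set anywhere on the path: fall back to the global setting.
    return global_max_mb


# Hypothetical configuration: 120G global, 16G on root, 80G on root.large.
GLOBAL_MAX_MB = 120 * 1024
explicit = {"root": 16 * 1024, "root.large": 80 * 1024}

inherited = effective_max_allocation_mb("root.default.adhoc", explicit, GLOBAL_MAX_MB)
overridden = effective_max_allocation_mb("root.large.etl", explicit, GLOBAL_MAX_MB)
```

Here root.default.adhoc inherits 16G from root, while root.large.etl picks up the larger 80G from its parent, and neither can go above the 120G global value.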









> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-08 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737351#comment-16737351
 ] 

Aihua Xu commented on YARN-9116:


[~cheersyang] As I understand it, currently you can set maximum-allocation-mb
only on leaf queues, not on intermediate parent queues.

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-07 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736083#comment-16736083
 ] 

Aihua Xu commented on YARN-9116:


[~cheersyang] In the example above, we are still introducing an incompatible
change, since the root is set to 16G
({{yarn.scheduler.capacity.root.maximum-allocation-mb=16G}}) while the child
queue root.large is set to 80G (a larger value). Do you think that change is
OK? What you are proposing is that the child can override the parent's value
(larger or smaller) but won't exceed the global value, correct?


> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-04 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734369#comment-16734369
 ] 

Aihua Xu commented on YARN-9116:


[~cheersyang] That was my initial idea (see YARN-9055): that we can override
the parent setting. But it introduces an incompatibility, since it has always
been assumed that a child queue can't have larger settings than its parents.
Some clients, such as Spark, check the top-level settings and fail immediately
if the resource request can't be satisfied.
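
The compatibility concern can be illustrated with a small sketch of a client-side fail-fast check. Everything here is hypothetical (the names, the exception type, and the 16G/80G values, which just echo the numbers used in this thread); it is not how Spark or any other client actually implements its check.

```python
class ResourceRequestError(ValueError):
    """Raised when a request exceeds the maximum the client was told about."""


def verify_request(requested_mb: int, advertised_max_mb: int) -> None:
    """Fail fast if the request cannot fit within the advertised maximum."""
    if requested_mb > advertised_max_mb:
        raise ResourceRequestError(
            f"requested {requested_mb} MB exceeds maximum {advertised_max_mb} MB"
        )


# Hypothetical values: the top-level setting is 16G, root.large allows 80G.
TOP_LEVEL_MAX_MB = 16 * 1024
LARGE_QUEUE_MAX_MB = 80 * 1024


def client_accepts(requested_mb: int) -> bool:
    """Mimic a client that only looks at the top-level setting."""
    try:
        verify_request(requested_mb, TOP_LEVEL_MAX_MB)
        return True
    except ResourceRequestError:
        return False
```

A 64G request would fit in root.large, but a client that validates only against the top-level 16G value rejects it before it ever reaches the scheduler, which is the incompatibility being discussed.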

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2019-01-02 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732283#comment-16732283
 ] 

Aihua Xu commented on YARN-9116:


Thanks [~cheersyang] for the comment. Happy new year.

So you are suggesting the following, is that correct? Actually, even with such
inheritance, that would require many queue-level configurations unless we
introduce a new property. Even after we implement the inheritance mechanism, we
have to set the global value to 120G/256vCores (the maximum value allowed in
the cluster), then override all the top queues to 16G/16vCores and set the
larger-container top queue to 120G/256vCores. I feel the current approach is
simpler and more straightforward. Let me know if you think the inheritance
implementation is still needed, but it seems we do need to add additional
configuration.

{noformat}
Queue level max inherits the value from its parent if it is not explicitly set
If queue level max is set explicitly, then it is honored without considering 
its parents
{noformat}


> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Comment Edited] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-19 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725466#comment-16725466
 ] 

Aihua Xu edited comment on YARN-9116 at 12/20/18 12:41 AM:
---

Patch-1: this patch adds simple logic that gives queues default memory/vcore
values if no configuration is set for them. A new configuration,
"yarn.scheduler.capacity.default-queue-maximum-allocation", is added to set the
queue default for maximum allocation.

I didn't implement queue inheritance, since I feel this keeps the configuration
simpler. Let me know if it's needed and I can do that in a follow-up.
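
As a rough illustration of the lookup this patch description implies, the sketch below resolves a queue's maximum allocation from a per-queue value, then a queue-wide default, then the global setting. The function and parameter names are hypothetical and the numbers reuse the 120G and 16G figures from the issue description; the actual patch may structure this differently.

```python
from typing import Dict, Optional


def max_allocation_mb(queue: str,
                      per_queue: Dict[str, int],
                      queue_default_mb: Optional[int],
                      global_max_mb: int) -> int:
    """Pick the maximum allocation for `queue`.

    Order of precedence: an explicit per-queue value, then the queue default
    (the new setting proposed in this patch), then the global maximum. The
    result is capped at the global maximum, since a queue setting can't be
    larger than the cluster setting.
    """
    if queue in per_queue:
        value = per_queue[queue]
    elif queue_default_mb is not None:
        value = queue_default_mb
    else:
        value = global_max_mb
    return min(value, global_max_mb)


GLOBAL_MAX_MB = 120 * 1024     # raised so that large containers are possible
QUEUE_DEFAULT_MB = 16 * 1024   # what an ordinary queue should get
per_queue = {"root.large": 120 * 1024}

# A newly added queue with no explicit setting gets the 16G default ...
with_default = max_allocation_mb("root.new", per_queue, QUEUE_DEFAULT_MB, GLOBAL_MAX_MB)
# ... whereas without the queue default it would silently get 120G.
without_default = max_allocation_mb("root.new", per_queue, None, GLOBAL_MAX_MB)
```

The second call shows the problem the issue describes: forgetting to configure a new queue leaves it with the large global value.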




was (Author: aihuaxu):
Patch-1: this patch adds simple logic that gives queues default memory/vcore
values if no configuration is set for them. A new configuration,
"yarn.scheduler.capacity.default-queue-maximum-allocation", is added to set the
queue default for maximum allocation in the configuration.



> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-19 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Attachment: YARN-9116.1.patch

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9116.1.patch
>
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration,
> which is intended to support larger containers on dedicated queues (a larger
> maximum-allocation-mb/maximum-allocation-vcores for such queues). However,
> to achieve a larger container configuration, we need to increase the global
> maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then
> override those configurations with the desired values on the queues, since a
> queue configuration can't be larger than the cluster configuration. There
> are many queues in the system, and if we forget to configure such values
> when adding a new queue, that queue gets the default 120G/256, which
> typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue
> configuration like 16G/8), so that leaf queues get such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-14 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721603#comment-16721603
 ] 

Aihua Xu commented on YARN-9116:


[~cheersyang] Yes, I agree with you. I will implement it toward that goal.

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Comment Edited] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-13 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720919#comment-16720919
 ] 

Aihua Xu edited comment on YARN-9116 at 12/14/18 4:52 AM:
--

[~cheersyang] As I understand from the code, the logic to inherit from the 
parent is not implemented yet, so we need to set such properties on the leaf 
queues. Inheriting from the parent is another approach I'm also considering. 

Also, we may not want a child queue to have a larger value than its parent 
queue (following the current behavior, where a queue value cannot be larger 
than the global value). In that case we may set the default not on the root 
queue but on the children of the root queue.  
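
To make the inheritance idea concrete, here is a minimal sketch of how a queue could pick up the value of its nearest configured ancestor. It is an illustration only: the class, the method and the plain Map standing in for the scheduler configuration are all hypothetical, and nothing here is existing CapacityScheduler code (as noted above, inheritance is not implemented yet). The sketch also does not enforce that a child stays at or below its parent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not part of YARN.
class InheritedLimitSketch {

  /**
   * Walks up the dotted queue path (root.prod.etl -> root.prod -> root) and
   * returns the first configured maximum-allocation-mb it finds. Falls back
   * to the cluster maximum when no ancestor is configured.
   */
  static long resolveMaxAllocationMb(String queuePath,
      Map<String, Long> configured, long clusterMaxMb) {
    String path = queuePath;
    while (path != null) {
      Long value = configured.get(path);
      if (value != null) {
        return value;
      }
      int dot = path.lastIndexOf('.');
      path = dot < 0 ? null : path.substring(0, dot);
    }
    return clusterMaxMb;
  }

  public static void main(String[] args) {
    long clusterMaxMb = 120L * 1024;
    // Values are set on children of root rather than on root itself.
    Map<String, Long> configured = new HashMap<>();
    configured.put("root.prod", 16L * 1024);
    configured.put("root.large", 120L * 1024);

    System.out.println(resolveMaxAllocationMb("root.prod.etl", configured, clusterMaxMb));
    System.out.println(resolveMaxAllocationMb("root.large", configured, clusterMaxMb));
    System.out.println(resolveMaxAllocationMb("root.other", configured, clusterMaxMb));
  }
}
```

In this sketch a queue under root with no configured ancestor still falls back to the cluster maximum, which is the case a queue-default would be meant to cover.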


was (Author: aihuaxu):
[~cheersyang] As I understand from the code, the logic to inherit from the 
parent is not implemented yet. We need to set such properties on the leaf 
queues. That is another approach I'm also thinking of, but it needs more work. 

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-13 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720919#comment-16720919
 ] 

Aihua Xu commented on YARN-9116:


[~cheersyang] As I understand from the code, the logic to inherit from the 
parent is not implemented yet. We need to set such properties on the leaf 
queues. That is another approach I'm also thinking of, but it needs more work. 

> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-13 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720436#comment-16720436
 ] 

Aihua Xu commented on YARN-9116:


Thanks [~leftnoteasy]. I will look into what you mentioned above. It sounds like a 
good way to handle different resource types. Originally I was only 
thinking of memory and vCores.

[~cheersyang] What I'm trying to achieve is to configure a larger-container 
queue. As I understand the implementation of YARN-1582, we have to take 
the following steps:
# Configure the global maximum-allocation to 120G/256vCores
# Configure regular queues to 16G/16vCores or other desired values
# Configure the larger-container queue to 120G/256vCores

The queue-default I'm talking about would simply be set to 16G/16vCores in 
this case. Without such a default value, you have to set the value on every 
queue. It is only a default, and you can set a different value on any queue 
that needs one. 
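
The steps above, together with the proposed queue-default, can be sketched roughly as follows. This is only an illustration under stated assumptions: the Limit class, the effectiveLimit method and the queue names are hypothetical and are not YARN classes or configuration keys. The size check mirrors the existing rule that a queue value cannot exceed the cluster value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are YARN classes or properties.
class QueueDefaultSketch {

  /** A memory/vcores pair, in MB and virtual cores. */
  static final class Limit {
    final long memoryMb;
    final int vcores;

    Limit(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  /**
   * Returns the explicit per-queue limit if one is configured, otherwise the
   * queue-default. Values above the cluster maximum are rejected.
   */
  static Limit effectiveLimit(String queue, Map<String, Limit> perQueue,
      Limit queueDefault, Limit clusterMax) {
    Limit chosen = perQueue.getOrDefault(queue, queueDefault);
    if (chosen.memoryMb > clusterMax.memoryMb
        || chosen.vcores > clusterMax.vcores) {
      throw new IllegalArgumentException(
          "Queue maximum allocation cannot be larger than the cluster setting"
          + " for queue " + queue);
    }
    return chosen;
  }

  public static void main(String[] args) {
    Limit clusterMax = new Limit(120L * 1024, 256);  // step 1: global 120G/256
    Limit queueDefault = new Limit(16L * 1024, 16);  // proposed default 16G/16
    Map<String, Limit> perQueue = new HashMap<>();
    perQueue.put("root.large", new Limit(120L * 1024, 256));  // step 3

    // A queue added without any explicit setting gets the queue-default.
    Limit adhoc = effectiveLimit("root.adhoc", perQueue, queueDefault, clusterMax);
    Limit large = effectiveLimit("root.large", perQueue, queueDefault, clusterMax);
    System.out.println("root.adhoc -> " + adhoc.memoryMb + " MB / " + adhoc.vcores);
    System.out.println("root.large -> " + large.memoryMb + " MB / " + large.vcores);
  }
}
```

With the default in place, step 2 no longer has to be repeated for every regular queue; only the larger-container queue needs an explicit entry.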



> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-12 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Description: 
YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
which is targeting to support larger container features on dedicated queues 
(larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While 
to achieve larger container configuration, we need to increase the global 
maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then 
override those configurations with desired values on the queues since queue 
configuration can't be larger than cluster configuration. There are many queues 
in the system and if we forget to configure such values when adding a new 
queue, then such queue gets default 120G/256 which typically is not what we 
want.  

We can come up with a queue-default configuration (set to normal queue 
configuration like 16G/8), so the leaf queues gets such values by default.





  was:
YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
which is targeting to support larger container features on dedicated queues 
(larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While 
to achieve larger container configuration, we need to increase the global 
maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then 
override those configurations with desired values on the queues since queue 
configuration can't be larger than cluster configuration. If we forget to 
configure such values when adding a new queue, then such queue gets default 
120G/256 which typically is not what we want.  

We can come up with a queue-default configuration (set to normal queue 
configuration like 16G/8), so the leaf queues gets such values by default.






> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. There are 
> many queues in the system and if we forget to configure such values when 
> adding a new queue, then such queue gets default 120G/256 which typically is 
> not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-12 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9116:
---
Description: 
YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
which is targeting to support larger container features on dedicated queues 
(larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While 
to achieve larger container configuration, we need to increase the global 
maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then 
override those configurations with desired values on the queues since queue 
configuration can't be larger than cluster configuration. If we forget to 
configure such values when adding a new queue, then such queue gets default 
120G/256 which typically is not what we want.  

We can come up with a queue-default configuration (set to normal queue 
configuration like 16G/8), so the leaf queues gets such values by default.





  was:
YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
which is targeting to support larger container features on dedicated queues 
(larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While 
to achieve larger container configuration, we need to increase the global 
maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then 
override those configurations with desired values on the queues since queue 
configuration can't be larger than cluster configuration. If we forget to 
configure such values when adding a new queue, then such queue gets default 
120G/256 which typically is not what we want.  

We can come up with a top-queue-default configuration (set to normal queue 
configuration like 16G/8), so top queue and its children gets such values by 
default.






> Capacity Scheduler: add the default maximum-allocation-mb and 
> maximum-allocation-vcores for the queues
> --
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue 
> which is targeting to support larger container features on dedicated queues 
> (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . 
> While to achieve larger container configuration, we need to increase the 
> global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and 
> then override those configurations with desired values on the queues since 
> queue configuration can't be larger than cluster configuration. If we forget 
> to configure such values when adding a new queue, then such queue gets 
> default 120G/256 which typically is not what we want.  
> We can come up with a queue-default configuration (set to normal queue 
> configuration like 16G/8), so the leaf queues gets such values by default.






[jira] [Updated] (YARN-9115) Capacity Scheduler: larger container configuration improvement on the queue level

2018-12-12 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9115:
---
Description: We are trying to use the feature introduced by YARN-1582 to 
configure one larger-container queue, but we are seeing some issues and 
inconveniences. This jira tracks the tasks for the improvement. 

> Capacity Scheduler: larger container configuration improvement on the queue 
> level
> -
>
> Key: YARN-9115
> URL: https://issues.apache.org/jira/browse/YARN-9115
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> We are trying to use the feature introduced by YARN-1582 to configure one 
> larger-container queue, but we are seeing some issues and inconveniences. 
> This jira tracks the tasks for the improvement. 






[jira] [Commented] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration

2018-12-12 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719521#comment-16719521
 ] 

Aihua Xu commented on YARN-9055:


We have the assumption that queue configuration won't be greater than cluster 
configuration, and some clients rely on it to fail early by comparing the 
requested resources against the cluster configuration (e.g. Spark: 
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L177).

We can't simply remove this check.
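
To illustrate the concern, here is a rough sketch of the kind of early client-side check described above. It is not copied from Spark; the class name, method name, parameters and message text are hypothetical. The point is only that a client validating against the cluster maximum would reject a request that a queue with a larger limit could otherwise have satisfied.

```java
// Hypothetical sketch of a client-side check; not actual Spark or YARN code.
class ClientSideCheckSketch {

  /**
   * Fails early if the requested container size (plus overhead) exceeds the
   * cluster-wide maximum allocation reported to the client.
   */
  static void verifyRequest(long requestedMb, long overheadMb, long clusterMaxMb) {
    long total = requestedMb + overheadMb;
    if (total > clusterMaxMb) {
      throw new IllegalArgumentException(
          "Required memory (" + total + " MB) is above the cluster maximum ("
          + clusterMaxMb + " MB)");
    }
  }

  public static void main(String[] args) {
    long clusterMaxMb = 16L * 1024;  // cluster default left at 16G

    // Suppose a queue were allowed to exceed the cluster value (say 120G).
    // The client still compares against the cluster maximum and gives up
    // before the request ever reaches that queue.
    try {
      verifyRequest(64L * 1024, 1024, clusterMaxMb);
      System.out.println("request accepted");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected by client: " + e.getMessage());
    }
  }
}
```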

> Capacity Scheduler: allow larger queue level maximum-allocation-mb to 
> override the cluster configuration
> 
>
> Key: YARN-9055
> URL: https://issues.apache.org/jira/browse/YARN-9055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9055.1.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue. 
> That feature gives the flexibility to give different memory requirements for 
> different queues. Such patch adds the limitation that the queue level 
> configuration can't exceed the cluster level default configuration, but I 
> feel it may make more sense to remove such limitation to allow any overrides 
> since 
> # Such configuration is controlled by the admin so it shouldn't get abused; 
> # It's common that typical queues require standard size containers while some 
> job (queues) have requirements for larger containers. With current 
> limitation, we have to set larger configuration on the cluster setting which 
> will cause resource abuse unless we override them on all the queues.
> We can remove such limitation in CapacitySchedulerConfiguration.java so the 
> cluster setting provides the default value and queue setting can override it. 
> {noformat}
> if (maxAllocationMbPerQueue > clusterMax.getMemorySize()
>     || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) {
>   throw new IllegalArgumentException(
>       "Queue maximum allocation cannot be larger than the cluster setting"
>       + " for queue " + queue
>       + " max allocation per queue: " + result
>       + " cluster setting: " + clusterMax);
> }
> {noformat}
> Let me know if it makes sense.






[jira] [Created] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues

2018-12-12 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9116:
--

 Summary: Capacity Scheduler: add the default maximum-allocation-mb 
and maximum-allocation-vcores for the queues
 Key: YARN-9116
 URL: https://issues.apache.org/jira/browse/YARN-9116
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Affects Versions: 2.7.0
Reporter: Aihua Xu
Assignee: Aihua Xu


YARN-1582 adds support for a per-queue maximum-allocation-mb configuration, 
which is intended to support larger containers on dedicated queues 
(a larger maximum-allocation-mb/maximum-allocation-vcores for such queues). To 
configure larger containers, however, we need to increase the global 
maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then 
override those settings with the desired values on the queues, since a queue 
configuration can't be larger than the cluster configuration. If we forget to 
configure such values when adding a new queue, that queue gets the default 
120G/256, which typically is not what we want.  

We can come up with a top-queue-default configuration (set to a normal queue 
configuration like 16G/8), so that the top queue and its children get such 
values by default.










[jira] [Created] (YARN-9115) Capacity Scheduler: larger container configuration improvement on the queue level

2018-12-12 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9115:
--

 Summary: Capacity Scheduler: larger container configuration 
improvement on the queue level
 Key: YARN-9115
 URL: https://issues.apache.org/jira/browse/YARN-9115
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Affects Versions: 2.7.0
Reporter: Aihua Xu
Assignee: Aihua Xu









[jira] [Updated] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration

2018-12-12 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9055:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-9115

> Capacity Scheduler: allow larger queue level maximum-allocation-mb to 
> override the cluster configuration
> 
>
> Key: YARN-9055
> URL: https://issues.apache.org/jira/browse/YARN-9055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9055.1.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue. 
> That feature gives the flexibility to give different memory requirements for 
> different queues. Such patch adds the limitation that the queue level 
> configuration can't exceed the cluster level default configuration, but I 
> feel it may make more sense to remove such limitation to allow any overrides 
> since 
> # Such configuration is controlled by the admin so it shouldn't get abused; 
> # It's common that typical queues require standard size containers while some 
> job (queues) have requirements for larger containers. With current 
> limitation, we have to set larger configuration on the cluster setting which 
> will cause resource abuse unless we override them on all the queues.
> We can remove such limitation in CapacitySchedulerConfiguration.java so the 
> cluster setting provides the default value and queue setting can override it. 
> {noformat}
> if (maxAllocationMbPerQueue > clusterMax.getMemorySize()
>     || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) {
>   throw new IllegalArgumentException(
>       "Queue maximum allocation cannot be larger than the cluster setting"
>       + " for queue " + queue
>       + " max allocation per queue: " + result
>       + " cluster setting: " + clusterMax);
> }
> {noformat}
> Let me know if it makes sense.






[jira] [Commented] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration

2018-11-27 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701228#comment-16701228
 ] 

Aihua Xu commented on YARN-9055:


Thanks [~tgraves] for the comment. It will definitely introduce different 
behavior. 

In YARN-1582, we were trying to address the issue that some 
applications may request larger containers. How would you achieve that with 
minimal configuration? What I can think of is that you have to increase the 
cluster configuration and then override it on every queue that doesn't require 
larger containers. 

> Capacity Scheduler: allow larger queue level maximum-allocation-mb to 
> override the cluster configuration
> 
>
> Key: YARN-9055
> URL: https://issues.apache.org/jira/browse/YARN-9055
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9055.1.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue. 
> That feature gives the flexibility to give different memory requirements for 
> different queues. Such patch adds the limitation that the queue level 
> configuration can't exceed the cluster level default configuration, but I 
> feel it may make more sense to remove such limitation to allow any overrides 
> since 
> # Such configuration is controlled by the admin so it shouldn't get abused; 
> # It's common that typical queues require standard size containers while some 
> job (queues) have requirements for larger containers. With current 
> limitation, we have to set larger configuration on the cluster setting which 
> will cause resource abuse unless we override them on all the queues.
> We can remove such limitation in CapacitySchedulerConfiguration.java so the 
> cluster setting provides the default value and queue setting can override it. 
> {noformat}
> if (maxAllocationMbPerQueue > clusterMax.getMemorySize()
>     || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) {
>   throw new IllegalArgumentException(
>       "Queue maximum allocation cannot be larger than the cluster setting"
>       + " for queue " + queue
>       + " max allocation per queue: " + result
>       + " cluster setting: " + clusterMax);
> }
> {noformat}
> Let me know if it makes sense.






[jira] [Updated] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration

2018-11-26 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated YARN-9055:
---
Attachment: YARN-9055.1.patch

> Capacity Scheduler: allow larger queue level maximum-allocation-mb to 
> override the cluster configuration
> 
>
> Key: YARN-9055
> URL: https://issues.apache.org/jira/browse/YARN-9055
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: YARN-9055.1.patch
>
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue. 
> That feature gives the flexibility to give different memory requirements for 
> different queues. Such patch adds the limitation that the queue level 
> configuration can't exceed the cluster level default configuration, but I 
> feel it may make more sense to remove such limitation to allow any overrides 
> since 
> # Such configuration is controlled by the admin so it shouldn't get abused; 
> # It's common that typical queues require standard size containers while some 
> job (queues) have requirements for larger containers. With current 
> limitation, we have to set larger configuration on the cluster setting which 
> will cause resource abuse unless we override them on all the queues.
> We can remove such limitation in CapacitySchedulerConfiguration.java so the 
> cluster setting provides the default value and queue setting can override it. 
> {noformat}
> if (maxAllocationMbPerQueue > clusterMax.getMemorySize()
>     || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) {
>   throw new IllegalArgumentException(
>       "Queue maximum allocation cannot be larger than the cluster setting"
>       + " for queue " + queue
>       + " max allocation per queue: " + result
>       + " cluster setting: " + clusterMax);
> }
> {noformat}
> Let me know if it makes sense.






[jira] [Commented] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration

2018-11-26 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699460#comment-16699460
 ] 

Aihua Xu commented on YARN-9055:


[~jlowe], [~leftnoteasy] and [~tgraves], do you have any opinions on this? 

> Capacity Scheduler: allow larger queue level maximum-allocation-mb to 
> override the cluster configuration
> 
>
> Key: YARN-9055
> URL: https://issues.apache.org/jira/browse/YARN-9055
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> YARN-1582 adds the support of maximum-allocation-mb configuration per queue. 
> That feature gives the flexibility to give different memory requirements for 
> different queues. Such patch adds the limitation that the queue level 
> configuration can't exceed the cluster level default configuration, but I 
> feel it may make more sense to remove such limitation to allow any overrides 
> since 
> # Such configuration is controlled by the admin so it shouldn't get abused; 
> # It's common that typical queues require standard size containers while some 
> job (queues) have requirements for larger containers. With current 
> limitation, we have to set larger configuration on the cluster setting which 
> will cause resource abuse unless we override them on all the queues.
> We can remove such limitation in CapacitySchedulerConfiguration.java so the 
> cluster setting provides the default value and queue setting can override it. 
> {noformat}
> if (maxAllocationMbPerQueue > clusterMax.getMemorySize()
>     || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) {
>   throw new IllegalArgumentException(
>       "Queue maximum allocation cannot be larger than the cluster setting"
>       + " for queue " + queue
>       + " max allocation per queue: " + result
>       + " cluster setting: " + clusterMax);
> }
> {noformat}
> Let me know if it makes sense.






[jira] [Created] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration

2018-11-26 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9055:
--

 Summary: Capacity Scheduler: allow larger queue level 
maximum-allocation-mb to override the cluster configuration
 Key: YARN-9055
 URL: https://issues.apache.org/jira/browse/YARN-9055
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.7.0
Reporter: Aihua Xu
Assignee: Aihua Xu


YARN-1582 added support for a per-queue maximum-allocation-mb configuration. 
That feature makes it possible to set different memory limits for different 
queues. The patch also added the limitation that the queue-level 
configuration can't exceed the cluster-level default configuration, but I feel 
it makes more sense to remove this limitation and allow any override, since: 
# This configuration is controlled by the admin, so it shouldn't get abused; 
# It's common that typical queues need standard-size containers while some 
jobs (queues) need larger containers. With the current limitation, we have to 
raise the cluster setting, which invites resource abuse unless we override it 
on all the queues.

We can remove this limitation in CapacitySchedulerConfiguration.java so that 
the cluster setting provides the default value and a queue setting can 
override it. 

{noformat}
if (maxAllocationMbPerQueue > clusterMax.getMemorySize()
    || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) {
  throw new IllegalArgumentException(
      "Queue maximum allocation cannot be larger than the cluster setting"
      + " for queue " + queue
      + " max allocation per queue: " + result
      + " cluster setting: " + clusterMax);
}
{noformat}

Let me know if it makes sense.
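
To make the proposed lookup order concrete, here is a minimal, self-contained sketch of "cluster setting as default, queue setting as override". It is an illustration only and not the actual Hadoop code: the class QueueMaxAllocationSketch, its methods, and the queue names are all hypothetical, and it covers only memory (not vcores).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scheduler configuration; NOT Hadoop's API.
final class QueueMaxAllocationSketch {
    // Cluster-wide maximum allocation in MB, used as the default.
    private final long clusterMaxMb;
    // Optional per-queue overrides, keyed by queue path.
    private final Map<String, Long> queueMaxMb = new HashMap<>();

    QueueMaxAllocationSketch(long clusterMaxMb) {
        this.clusterMaxMb = clusterMaxMb;
    }

    // Records a per-queue maximum. Assumption: no upper bound is enforced
    // against the cluster value, which is the change the proposal asks for.
    void setQueueMaxMb(String queue, long mb) {
        if (mb <= 0) {
            throw new IllegalArgumentException(
                "Queue maximum allocation must be positive for queue " + queue);
        }
        queueMaxMb.put(queue, mb);
    }

    // The queue setting wins when present; otherwise the cluster default applies.
    long effectiveMaxMb(String queue) {
        return queueMaxMb.getOrDefault(queue, clusterMaxMb);
    }

    public static void main(String[] args) {
        QueueMaxAllocationSketch conf = new QueueMaxAllocationSketch(8192);
        conf.setQueueMaxMb("root.bigmem", 32768);
        // No override configured: falls back to the cluster default.
        System.out.println(conf.effectiveMaxMb("root.default"));
        // Override configured: larger than the cluster default, and accepted.
        System.out.println(conf.effectiveMaxMb("root.bigmem"));
    }
}
```

With this ordering, only the queues that need large containers carry a larger value, and every other queue keeps the standard cluster default without any per-queue configuration.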






