[ https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136503#comment-17136503 ]
Adam Antal commented on YARN-9930:
----------------------------------

I was trying to do a meaningful review, but got stuck on a few questions. Apologies if they are silly.

I am a little nervous about this case:
bq. Limit max-parallel-apps to 4, submit 4 apps, then refresh it to 2. Result: running apps were still running, but new apps stayed in Accepted state. From that point on, only 2 apps were allowed to run at the same time.

So AFAIU it is perfectly normal for a queue to be above its limit after the configuration has been changed. Doesn't this case need special attention in your algorithm when you recursively update the parents to search for queues where new apps could be submitted?

I compared your implementation with the max apps one, and it is a bit different. You use a separate {{CSMaxRunningAppsEnforcer}} instance in the scheduler, which is optimized for guessing which queues to check to see whether their limits now allow more apps to run. The existing implementation for max apps (which considers both running and pending ones) calls {{OrderingPolicy#getNumSchedulableEntities()}} and compares it to the limit inside {{LeafQueue}}.

From the algorithm you described above I assume that your solution is more effective, but it seems to me that calling these {{OrderingPolicy}} methods in {{LeafQueue#validateSubmitApplication}} already does something similar, only from the queue's perspective, while your solution is fundamentally implemented inside the scheduler. I'd prefer your solution as it's clearer, but since we already have the existing logic, the question arises: why do we need a separate enforcer object? Couldn't it be implemented similarly? Or am I missing something here?

Nit:
- {{abstract int getNumRunnableApps();}} would be better placed in the {{CSQueue}} interface instead of the {{AbstractCSQueue}} abstract class.
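To make the refresh concern concrete, here is a minimal, self-contained sketch of the scenario. It is not the code from the patch: the {{Queue}} class and the {{canRun}}, {{submit}}, {{track}} and {{finish}} helpers are hypothetical names invented for illustration, and the sketch only re-checks the leaf queue whose app finished instead of searching sibling queues. The point it shows is that the limit check has to use {{>=}} rather than {{==}} at every level of the hierarchy, because after a refresh a queue can legitimately sit above its limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration only; this is not the YARN-9930 implementation. */
class MaxParallelAppsSketch {

  /** Simplified queue node that only tracks a limit, a counter and pending apps. */
  static final class Queue {
    final String name;
    final Queue parent;                                // null for the root queue
    int maxParallelApps;                               // may be lowered by a config refresh
    int runnableApps;                                  // running apps in this subtree
    final Deque<String> pending = new ArrayDeque<>();  // used by leaf queues only

    Queue(String name, Queue parent, int maxParallelApps) {
      this.name = name;
      this.parent = parent;
      this.maxParallelApps = maxParallelApps;
    }
  }

  /** True only if the leaf and every ancestor are strictly below their limits. */
  static boolean canRun(Queue leaf) {
    for (Queue q = leaf; q != null; q = q.parent) {
      // ">=" and not "==": after a refresh the counter may already exceed the limit.
      if (q.runnableApps >= q.maxParallelApps) {
        return false;
      }
    }
    return true;
  }

  /** Adjusts the runnable counter on the leaf and on all of its ancestors. */
  static void track(Queue leaf, int delta) {
    for (Queue q = leaf; q != null; q = q.parent) {
      q.runnableApps += delta;
    }
  }

  /** Starts the app if the limits allow it, otherwise parks it as pending. */
  static boolean submit(Queue leaf, String app) {
    if (canRun(leaf)) {
      track(leaf, +1);
      return true;
    }
    leaf.pending.add(app);
    return false;
  }

  /** Called when a running app finishes; returns the pending apps that were started. */
  static List<String> finish(Queue leaf) {
    track(leaf, -1);
    List<String> started = new ArrayList<>();
    while (!leaf.pending.isEmpty() && canRun(leaf)) {
      started.add(leaf.pending.poll());
      track(leaf, +1);
    }
    return started;
  }

  public static void main(String[] args) {
    Queue root = new Queue("root", null, 100);
    Queue leaf = new Queue("root.a", root, 4);
    for (int i = 1; i <= 4; i++) {
      submit(leaf, "app" + i);                 // all four apps start
    }
    leaf.maxParallelApps = 2;                  // simulated refresh: limit 4 -> 2
    System.out.println("app5 started: " + submit(leaf, "app5"));
    for (int i = 1; i <= 3; i++) {
      System.out.println("finish #" + i + " started: " + finish(leaf));
    }
  }
}
```

In this sketch the four running apps are left alone after the limit drops to 2, the fifth app stays pending, and it is only started once the third running app has finished and the counter has fallen below the new limit.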
> Support max running app logic for CapacityScheduler
> ---------------------------------------------------
>
>                 Key: YARN-9930
>                 URL: https://issues.apache.org/jira/browse/YARN-9930
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, capacityscheduler
>   Affects Versions: 3.1.0, 3.1.1
>            Reporter: zhoukang
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-9930-001.patch, YARN-9930-002.patch,
> YARN-9930-003.patch, YARN-9930-004.patch, YARN-9930-POC01.patch,
> YARN-9930-POC02.patch, YARN-9930-POC03.patch, YARN-9930-POC04.patch,
> YARN-9930-POC05.patch, screenshot-1.png
>
>
> In FairScheduler, there has limitation for max running which will let
> application pending.
> But in CapacityScheduler there has no feature like max running app. Only got
> max app, and jobs will be rejected directly on client.
> This jira i want to implement this semantic for CapacityScheduler.