[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755070#comment-15755070
 ] 

Sunil G commented on YARN-5889:
-------------------------------

In general, I also agree with the direction we are heading in.

A few points from my end:
- For the preemption calculation, one of the main problems is the 
*free resources* in the queue even when some users are over-utilizing their 
resource quota (these users could become active/non-active). The preemption 
module will be handling {free_resources + to_be_preempted_resources}, so it 
needs to think more like the scheduler.
- The point above will be a big factor in deciding when preemption needs to 
kick in. It could be when the free/used ratio becomes very small, OR it could 
be when there is a lot of violation from a few users who hold more resources 
than MULP (minimum-user-limit-percent) allows but have become non-active.
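
To make the two trigger conditions concrete, here is a rough sketch. All 
names and thresholds are hypothetical (nothing here is taken from the YARN 
code base), and resources are reduced to a single long for brevity, whereas 
the scheduler works with multi-dimensional resources.

```java
import java.util.List;

// Hypothetical sketch only; class, method and parameter names are assumptions.
final class PreemptionTriggerSketch {

    /** Snapshot of one user's usage inside a queue. */
    static final class UserUsage {
        final String name;
        final long used;       // resources currently held by the user
        final long userLimit;  // limit derived from MULP (assumed precomputed)
        final boolean active;  // whether the user still has pending requests

        UserUsage(String name, long used, long userLimit, boolean active) {
            this.name = name;
            this.used = used;
            this.userLimit = userLimit;
            this.active = active;
        }
    }

    /** Resources the preemption module can reason about: free plus reclaimable. */
    static long allocatable(long freeResources, long toBePreempted) {
        return freeResources + toBePreempted;
    }

    /**
     * Returns true when either condition holds:
     * 1. the free/used ratio drops below minFreeToUsedRatio, or
     * 2. non-active users together hold more than maxNonActiveExcess
     *    above their user limits.
     */
    static boolean shouldTrigger(long queueFree, long queueUsed,
                                 List<UserUsage> users,
                                 double minFreeToUsedRatio,
                                 long maxNonActiveExcess) {
        boolean freeIsScarce =
            queueUsed > 0 && (double) queueFree / queueUsed < minFreeToUsedRatio;

        long nonActiveExcess = 0;
        for (UserUsage u : users) {
            if (!u.active && u.used > u.userLimit) {
                nonActiveExcess += u.used - u.userLimit;
            }
        }
        return freeIsScarce || nonActiveExcess > maxNonActiveExcess;
    }
}
```

The two thresholds are placeholders. Where they would come from (queue 
configuration or the preemption policy) is left open here.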

As far as I understand, we will still have a pre-computed user-limit model, 
but this cache will be recomputed on any event that changes resources for 
non-active users. In a busier cluster with short-lived apps we may 
recalculate more often, but I think the preemption module will have better 
accuracy.
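
A minimal sketch of such an event-driven cache follows. The class and method 
names are hypothetical, and the limit formula is a simplified assumption 
(the larger of an equal share among active users and the MULP share of the 
queue). It ignores user-limit-factor and other details of the real 
calculation.

```java
// Hypothetical sketch only; not the scheduler's actual user-limit code.
final class CachedUserLimitSketch {
    private final long queueCapacity;
    private final int minUserLimitPercent; // MULP
    private int activeUsers;
    private boolean stale = true;   // set by events, cleared on recompute
    private long cachedLimit;
    private int recomputations;     // only here to show how often we recompute

    CachedUserLimitSketch(long queueCapacity, int minUserLimitPercent,
                          int activeUsers) {
        this.queueCapacity = queueCapacity;
        this.minUserLimitPercent = minUserLimitPercent;
        this.activeUsers = activeUsers;
    }

    /** Called on any resource or activity change; only marks the cache stale. */
    synchronized void onUserResourceEvent(int newActiveUsers) {
        this.activeUsers = newActiveUsers;
        this.stale = true;
    }

    /** Returns the cached limit, recomputing only if an event invalidated it. */
    synchronized long getUserLimit() {
        if (stale) {
            long equalShare = queueCapacity / Math.max(1, activeUsers);
            long guaranteed = queueCapacity * minUserLimitPercent / 100;
            cachedLimit = Math.max(equalShare, guaranteed);
            stale = false;
            recomputations++;
        }
        return cachedLimit;
    }

    synchronized int getRecomputations() {
        return recomputations;
    }
}
```

With this shape, repeated reads between events hit the cache. The cost moves 
to clusters where events arrive very frequently, which is the extra 
recalculation mentioned above.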

On this note, could I update the patch with the approach mentioned above? I 
think free resources also need to be part of the preemption trigger. For the 
user-limit calculation, I will make changes in {{ActiveUsersManager}} to keep 
track of non-active users as well, with a state that reflects changes in 
their resources.
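
Roughly, the tracking I have in mind could look like the sketch below. It is 
a standalone illustration with hypothetical names and does not mirror the 
existing manager class's API. Usage is again a single long for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not the existing YARN class.
final class UserTrackerSketch {

    private static final class UserState {
        int activeApps;          // > 0 means the user is active
        long used;               // resources currently held
        boolean resourceChanged; // set when usage changes, cleared when consumed
    }

    private final Map<String, UserState> users = new HashMap<>();

    private UserState state(String user) {
        return users.computeIfAbsent(user, u -> new UserState());
    }

    synchronized void activateApplication(String user) {
        state(user).activeApps++;
    }

    synchronized void deactivateApplication(String user) {
        UserState s = state(user);
        if (s.activeApps > 0) {
            s.activeApps--;
        }
    }

    /** Records a usage change (delta may be negative) and flags the user. */
    synchronized void updateUsage(String user, long delta) {
        UserState s = state(user);
        s.used += delta;
        s.resourceChanged = true;
    }

    synchronized boolean isActive(String user) {
        UserState s = users.get(user);
        return s != null && s.activeApps > 0;
    }

    /** Total resources still held by users with no active applications. */
    synchronized long nonActiveUsage() {
        long total = 0;
        for (UserState s : users.values()) {
            if (s.activeApps == 0) {
                total += s.used;
            }
        }
        return total;
    }

    /** Returns true if any user's usage changed since the last call. */
    synchronized boolean consumeChanges() {
        boolean changed = false;
        for (UserState s : users.values()) {
            if (s.resourceChanged) {
                changed = true;
                s.resourceChanged = false;
            }
        }
        return changed;
    }
}
```

A caller would poll consumeChanges() (or be notified) to decide whether the 
pre-computed user-limit has to be refreshed.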

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
