[
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755070#comment-15755070
]
Sunil G commented on YARN-5889:
-------------------------------
Generally, I also agree with the direction we are heading in.
A few points from my end:
- For the preemption calculation, one of the main problems could be the
*free resources* in the queue even while some users are over-utilizing their
resource quota (these users could be active or non-active). The preemption
module will have to handle {free_resources + to_be_preempted_resources}, so it
needs to think more like the scheduler.
- The above point will be a big factor in deciding when preemption needs to
kick in. It could be when free/used becomes very small, OR it could be when
there is a lot of violation from a few users who hold more resources than MULP
but have become non-active.
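
A minimal sketch of the trigger condition described in the two points above.
This is not code from the patch. The class, method, parameter names and both
thresholds are hypothetical, and resources are reduced to a single memory
figure in MB for brevity.

```java
// Hypothetical sketch only; none of these names exist in YARN.
final class PreemptionTriggerSketch {

    private PreemptionTriggerSketch() {
    }

    /**
     * Decides whether preemption should kick in for a queue.
     *
     * @param freeMb          free memory in the queue
     * @param usedMb          used memory in the queue
     * @param overLimitIdleMb memory held above the user limit by users that
     *                        have become non-active
     * @param minFreeRatio    assumed threshold for free/used
     * @param maxViolationMb  assumed threshold for tolerated violation
     */
    static boolean shouldTriggerPreemption(long freeMb, long usedMb,
            long overLimitIdleMb, double minFreeRatio, long maxViolationMb) {
        // Condition 1: free/used has become very small.
        boolean lowFree = usedMb > 0
                && ((double) freeMb / usedMb) < minFreeRatio;
        // Condition 2: non-active users hold a lot above their limit.
        boolean heavyViolation = overLimitIdleMb > maxViolationMb;
        return lowFree || heavyViolation;
    }
}
```

Either condition alone is enough to trigger here, which matches the "OR" in
the second point. How the two thresholds would be chosen is left open.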
As far as I understand, we will still have a pre-computed user-limit model, but
this cache will be recomputed on any event that changes the resources of
non-active users. I think that in a busy cluster with short-lived apps we may
recalculate more often, but the preemption module will have better accuracy.
On this note, could I update the patch with the approach mentioned above? I
think free resources also need to be part of the preemption trigger. For the
user-limit calculation, I will make changes in {{ActiveUserManager}} to keep
track of non-active users as well, with a state that reflects changes in their
resources.
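
A rough sketch of the idea above, assuming a cache that is invalidated only
when a user's active state flips or when a non-active user's resources change.
It does not reflect the real {{ActiveUserManager}} API. Every name below is
hypothetical, and the limit formula is a placeholder, not the
CapacityScheduler's actual user-limit computation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not the real YARN classes or formula.
final class UserLimitCacheSketch {

    private static final class UserState {
        boolean active;
        long usedMb;
    }

    private final Map<String, UserState> users = new HashMap<>();
    private final long queueCapacityMb;
    private boolean stale = true;
    private long cachedUserLimitMb;
    private int recomputeCount;

    UserLimitCacheSketch(long queueCapacityMb) {
        this.queueCapacityMb = queueCapacityMb;
    }

    // A user becoming active or non-active invalidates the cached limit.
    synchronized void setActive(String user, boolean active) {
        UserState s = users.computeIfAbsent(user, u -> new UserState());
        if (s.active != active) {
            s.active = active;
            stale = true;
        }
    }

    // Only resource changes of non-active users invalidate the cache.
    synchronized void updateUsage(String user, long usedMb) {
        UserState s = users.computeIfAbsent(user, u -> new UserState());
        if (s.usedMb != usedMb) {
            s.usedMb = usedMb;
            if (!s.active) {
                stale = true;
            }
        }
    }

    // Returns the cached limit, recomputing it only if it is stale.
    synchronized long getUserLimitMb() {
        if (stale) {
            long heldByNonActive = 0;
            int activeCount = 0;
            for (UserState s : users.values()) {
                if (s.active) {
                    activeCount++;
                } else {
                    heldByNonActive += s.usedMb;
                }
            }
            // Placeholder formula: share what non-active users do not hold.
            long available = Math.max(0L, queueCapacityMb - heldByNonActive);
            cachedUserLimitMb =
                    activeCount == 0 ? available : available / activeCount;
            recomputeCount++;
            stale = false;
        }
        return cachedUserLimitMb;
    }

    synchronized int getRecomputeCount() {
        return recomputeCount;
    }
}
```

With this shape, the allocation path only reads the cached value, and the
recomputation cost moves to the events that change it. In a cluster with many
short-lived apps those events are frequent, so recomputation happens more
often.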
> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Sunil G
> Assignee: Sunil G
> Attachments: YARN-5889.v0.patch, YARN-5889.v1.patch,
> YARN-5889.v2.patch
>
>
> Currently, user-limit is computed during every heartbeat allocation cycle with
> a write lock. To improve performance, this ticket focuses on moving the
> user-limit calculation out of the heartbeat allocation flow.