[
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604144#comment-15604144
]
Sunil G edited comment on YARN-5773 at 10/25/16 4:46 AM:
---------------------------------------------------------
*Issues in Recovery of apps:*
1. activateApplications works under a write lock.
2. If one application is found to exceed the AM resource limit, instead of
breaking out of the loop, we continue and scan the complete set of apps in
pendingOrderingPolicy. We may need to iterate over all apps because the apps
belong to different partitions and pendingOrderingPolicy does not provide any
ordering of apps by partition.
3. As mentioned by [~bibinchundatt], each time an app fails to get activated
because it hits the upper AM resource limit, one INFO log is emitted (because
*amLimit* is 0). During recovery, this is costly.
[~leftnoteasy] and [~rohithsharma]
bq.If a given app's AM resource amount > AM headroom, should we skip the AM and
activate following app which AM resource amount <= AM headroom?
bq.But one point to be considered is for each Node registration, head room
changes. So, user head room changes as new node registered. This need to be
taken care.
Currently activateApplications is invoked whenever there is a change in
cluster resource. So any change in cluster resource will ensure a call to
activateApplications, and we can recalculate this headroom there. I am not very
sure about the suggested map. Will this check come before the existing AM
resource percentage check for the queue/partition (not user based), or will it
replace that check?
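To make the skip-instead-of-break point concrete, here is a minimal sketch of
activating apps against per-partition AM headroom. The AppInfo type, the
scalar resource, and the headroom map are illustrative assumptions, not the
actual YARN classes: the point is only that, since pendingOrderingPolicy mixes
partitions, an app that overflows one partition's limit cannot end the loop.

```java
import java.util.*;

// Hypothetical sketch, not YARN API: skip (don't break on) apps whose AM
// demand exceeds the per-partition AM headroom, because a later app in the
// pending order may belong to a different partition that still has room.
public class ActivateSketch {
    static final class AppInfo {
        final String id;
        final String partition;
        final int amResource; // simplified scalar resource
        AppInfo(String id, String partition, int amResource) {
            this.id = id;
            this.partition = partition;
            this.amResource = amResource;
        }
    }

    // Returns the ids of apps that can be activated, consuming each
    // partition's AM headroom as apps activate.
    static List<String> activate(List<AppInfo> pending,
                                 Map<String, Integer> headroom) {
        List<String> activated = new ArrayList<>();
        for (AppInfo app : pending) {
            int room = headroom.getOrDefault(app.partition, 0);
            if (app.amResource > room) {
                continue; // skip only this app; cannot break here
            }
            headroom.put(app.partition, room - app.amResource);
            activated.add(app.id);
        }
        return activated;
    }

    public static void main(String[] args) {
        List<AppInfo> pending = Arrays.asList(
            new AppInfo("app1", "x", 4),  // overflows partition x's headroom
            new AppInfo("app2", "x", 2),  // still fits in x after app1 is skipped
            new AppInfo("app3", "y", 1)); // different partition, also fits
        Map<String, Integer> headroom = new HashMap<>();
        headroom.put("x", 3);
        headroom.put("y", 2);
        System.out.println(activate(pending, headroom)); // [app2, app3]
    }
}
```

Breaking at app1 would have wrongly stranded app2 and app3 in the pending
list, which is why the loop must visit every app.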
> RM recovery too slow due to LeafQueue#activateApplication()
> -----------------------------------------------------------
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is
> invoked, resulting in the AM limit check being done even before node
> managers are registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for
> {{10K}} applications that is roughly {{50000000}} iterations, causing the
> time taken for the RM to become active to exceed 10 min.
> Since NM resources are not yet added back during recovery, we should skip
> {{activateApplication()}}
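The quadratic cost described above can be sketched as follows: if each of the
N recovered apps triggers a full scan of all apps submitted so far, the total
work is 1 + 2 + ... + N = N(N+1)/2 iterations. This is a back-of-the-envelope
model of the behavior, not the actual scheduler code:

```java
// Sketch of the recovery cost: app k's recovery scans the k apps
// recovered so far, so the total is the triangular number N(N+1)/2.
public class RecoveryCost {
    static long totalIterations(long n) {
        long total = 0;
        for (long recovered = 1; recovered <= n; recovered++) {
            total += recovered; // one scan over the apps recovered so far
        }
        return total; // equals n * (n + 1) / 2
    }

    public static void main(String[] args) {
        // For 10K apps this is ~5 * 10^7 iterations, matching the
        // roughly-50000000 figure in the description.
        System.out.println(totalIterations(10_000)); // 50005000
    }
}
```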
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]