[
https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307191#comment-17307191
]
Michael Zeoli edited comment on YARN-6538 at 3/23/21, 4:03 PM:
---------------------------------------------------------------
Eric - thanks for the response, and apologies for the absence. We have not yet
been able to reproduce this outside of our particular pipeline, though we
stopped trying in earnest once our platform vendor indicated they were able to
reproduce it with a purpose-built MR job (we are currently working the issue
with them). I will try to get details.
Essentially what we see is a single job (in lq1) with several thousand pending
containers taking the entire cluster (expected, via dynamic allocation). When a
second job enters lq2, it fails to receive executors despite lq2 having a
guaranteed minimum capacity of 17% (roughly 4.5 vcores: 28 * 0.95 * 0.17). On
occasion it also fails to receive an AM. If a third job enters lq3 at this
point, it also fails to receive executors. The later jobs continue to starve
until the first job begins shedding resources as its pending container count
falls to zero.
YARN resources (4 NMs, so 280 GiB / 28 vcores of total YARN resources)
* yarn.nodemanager.resource.cpu-vcores = 7
* yarn.scheduler.maximum-allocation-vcores = 7
* yarn.nodemanager.resource.memory-mb = 70 GiB
* yarn.scheduler.maximum-allocation-mb = 40 GiB
Queue configuration (note that only lq1, lq2 and lq3 are used in the current
tests; a sketch of the corresponding CapacityScheduler properties follows the
list)
* root.default cap = 5%
* root.tek cap = 95%
* root.tek.lq1, .lq2, .lq3, .lq4 cap = 17% each
* root.tek.lq5, .lq6 cap = 16% each
For all lqN (leaf queues):
* Minimum User Limit = 25%
* User Limit Factor = 100 (intentionally set high to allow user to exceed
queue capacity when idle capacity exists)
* max cap = 100%
* max AM res limit = 20%
* inter / intra queue preemption: Enabled
* ordering policy = Fair
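For reference, the per-queue settings above map roughly onto the standard
CapacityScheduler properties below (a minimal sketch, not our exact
capacity-scheduler.xml; lq2 through lq6 carry the same per-queue settings with
their own capacities):
{code}
yarn.scheduler.capacity.root.queues = default,tek
yarn.scheduler.capacity.root.default.capacity = 5
yarn.scheduler.capacity.root.tek.capacity = 95
yarn.scheduler.capacity.root.tek.queues = lq1,lq2,lq3,lq4,lq5,lq6
yarn.scheduler.capacity.root.tek.lq1.capacity = 17
yarn.scheduler.capacity.root.tek.lq1.maximum-capacity = 100
yarn.scheduler.capacity.root.tek.lq1.minimum-user-limit-percent = 25
yarn.scheduler.capacity.root.tek.lq1.user-limit-factor = 100
yarn.scheduler.capacity.root.tek.lq1.maximum-am-resource-percent = 0.2
yarn.scheduler.capacity.root.tek.lq1.ordering-policy = fair
{code}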
Spark config (this is our default Spark config, though some of the Spark jobs
in the pipelines we're testing set executor memory and overhead memory higher
to support more memory-intensive work; our work is memory-constrained, and
additional cores per executor have never yielded better throughput). An
executor-sizing sketch follows the list.
* spark.executor.cores=1
* spark.executor.memory=5G
* spark.driver.memory=4G
* spark.driver.maxResultSize=2G
* spark.executor.memoryOverhead=1024
* spark.dynamicAllocation.enabled = true
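As a rough sizing note (my own back-of-the-envelope arithmetic, not measured):
with 1 core per executor and 5G + 1024M overhead, each executor container asks
for roughly 6 GiB and 1 vcore, so the cluster runs out of vcores (28 executors)
well before it runs out of memory (~46 executors), which looks like the
memory-rich / vcore-poor pattern described in the issue summary below. A small
illustrative sketch (hypothetical class, not part of our pipeline):
{code}
// Back-of-the-envelope executor sizing under the defaults above; the exact
// container size also depends on yarn.scheduler.minimum-allocation-mb rounding.
public class ExecutorSizingSketch {
  public static void main(String[] args) {
    int clusterVcores = 28;          // 4 NMs * 7 vcores each
    long clusterMemGiB = 280;        // 4 NMs * 70 GiB each
    int vcoresPerExecutor = 1;       // spark.executor.cores
    long memPerExecutorGiB = 5 + 1;  // spark.executor.memory + spark.executor.memoryOverhead

    System.out.println("max executors by vcores = " + (clusterVcores / vcoresPerExecutor)); // 28
    System.out.println("max executors by memory ~= " + (clusterMemGiB / memPerExecutorGiB)); // 46
  }
}
{code}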
> Inter Queue preemption is not happening when DRF is configured
> --------------------------------------------------------------
>
> Key: YARN-6538
> URL: https://issues.apache.org/jira/browse/YARN-6538
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, scheduler preemption
> Affects Versions: 2.8.0
> Reporter: Sunil G
> Assignee: Sunil G
> Priority: Major
>
> Cluster capacity of <memory:3TB, vCores:168>. Here memory is plentiful while
> vcores are comparatively scarce, so if applications have enough demand, vcores
> can be exhausted first. Inter-queue preemption should ideally kick in once
> vcores are over-utilized; however, preemption is not happening.
> Analysis:
> In {{AbstractPreemptableResourceCalculator.computeFixpointAllocation}},
> {code}
> // assign all cluster resources until no more demand, or no resources are
> // left
> while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant,
> unassigned, Resources.none())) {
> {code}
> will loop even when vcores are 0 (because memory is still positive). Hence
> idealAssigned ends up with extra vcores, which causes the no-preemption cases.
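A minimal standalone sketch (not project test code) of the comparison the
analysis above points at, assuming Hadoop's public Resources /
DominantResourceCalculator APIs:
{code}
// With DRF, the dominant share of <memory-left, vCores:0> is still positive, so the
// while-condition above (Resources.greaterThan(rc, totGuarant, unassigned, none()))
// stays true and the fix-point loop keeps running on memory alone, letting
// idealAssigned accumulate vcores that are not actually there to preempt for.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfFixpointSketch {
  public static void main(String[] args) {
    ResourceCalculator rc = new DominantResourceCalculator();
    // Cluster guarantee from the description: <memory:3TB, vCores:168>
    Resource totGuarant = Resource.newInstance(3L * 1024 * 1024, 168);
    // Unassigned resources: plenty of memory left, zero vcores left
    Resource unassigned = Resource.newInstance(512L * 1024, 0);

    // Prints true, so computeFixpointAllocation would keep iterating here.
    System.out.println(Resources.greaterThan(rc, totGuarant, unassigned, Resources.none()));
  }
}
{code}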