gu-chi created YARN-4481:
----------------------------
Summary: negative pending resource of queues leads to applications
staying in accepted status indefinitely
Key: YARN-4481
URL: https://issues.apache.org/jira/browse/YARN-4481
Project: Hadoop YARN
Issue Type: Bug
Components: capacity scheduler
Affects Versions: 2.7.2
Reporter: gu-chi
Priority: Critical
I hit a scenario where the pending resource of a queue is negative with the
capacity scheduler. JMX shows:
{noformat}
"PendingMB" : -4096,
"PendingVCores" : -1,
"PendingContainers" : -1,
{noformat}
Full JMX information is attached.
This is not just a JMX display issue: the actual pending resource of the queue
is also negative, as shown by this debug log:
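To check a saved JMX dump for the same symptom, a minimal sketch follows. It assumes the dump is the usual JSON document with a top-level "beans" array; the helper name {{find_negative_pending}} and the sample bean name are hypothetical, chosen only for illustration.

```python
import json


def find_negative_pending(jmx_text):
    """Return (bean name, metric, value) for every negative Pending* metric."""
    doc = json.loads(jmx_text)
    hits = []
    for bean in doc.get("beans", []):
        for key, value in bean.items():
            # Only numeric attributes are of interest; bool is excluded explicitly
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if key.startswith("Pending") and is_number and value < 0:
                hits.append((bean.get("name", "<unnamed>"), key, value))
    return hits


# Hypothetical sample built from the values reported above; the bean name is an assumption
sample = """
{"beans": [{
  "name": "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root",
  "PendingMB": -4096,
  "PendingVCores": -1,
  "PendingContainers": -1,
  "AvailableMB": 0
}]}
"""

for name, metric, value in find_negative_pending(sample):
    print(name, metric, value)
```

Run against the attached dump, this should list each queue bean whose pending counters have gone below zero.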
bq. DEBUG | ResourceManager Event Processor | Skip this queue=root, because it
doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY
node-partition= | ParentQueue.java
This leads to the {{NULL_ASSIGNMENT}}.
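The effect can be illustrated with a simplified model. This is not the actual ParentQueue code: the {{Resource}} class, the function names, and the "any dimension greater than zero" rule are assumptions made only to show why a negative pending value makes the scheduler skip the queue.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    memory_mb: int
    vcores: int


def has_pending_request(pending):
    # Assumed rule: the queue needs more resource only if some dimension is positive
    return pending.memory_mb > 0 or pending.vcores > 0


def assign_containers(queue_name, pending):
    # Hypothetical stand-in for the scheduler's per-queue decision
    if not has_pending_request(pending):
        print("Skip this queue=%s, because it doesn't need more resource" % queue_name)
        return "NULL_ASSIGNMENT"
    return "TRY_ASSIGN"


# Values taken from the JMX output above
print(assign_containers("root", Resource(memory_mb=-4096, vcores=-1)))
```

Under this model, once the pending value of root is negative, every scheduling pass returns {{NULL_ASSIGNMENT}} regardless of how many applications are actually waiting, which matches applications staying in accepted status.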
Background: hundreds of applications were submitted, consuming all cluster
resources, and reservations occurred. While they were running, network faults
were injected with a tool; the injection types were delay, jitter, repeat,
packet loss and disorder. Most of the submitted applications were then killed.
Is anyone else seeing negative pending resource, or does anyone have an idea of
how this can happen?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)