[
https://issues.apache.org/jira/browse/YARN-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949980#comment-16949980
]
Xianghao Lu edited comment on YARN-9896 at 10/12/19 9:36 AM:
-------------------------------------------------------------
I added log statements in my test cluster using the following patch:
{code:java}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
index 18b4ba5..101ef0c 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
@@ -65,6 +65,10 @@
         : 4 * scheduler.getNMHeartbeatInterval()); // 4 heartbeats
     delayBeforeNextStarvationCheck = warnTimeBeforeKill + allocDelay +
         fsConf.getWaitTimeBeforeNextStarvationCheck();
+    LOG.info("nm hearbeat " + fsConf.getNMHeartbeatInterval() + " allocDelay " + allocDelay);
+    LOG.info("warnTimeBeforeKill " + warnTimeBeforeKill
+        + " WaitTimeBeforeNextStarvationCheck " + fsConf.getWaitTimeBeforeNextStarvationCheck()
+        + " delayBeforeNextStarvationCheck " + delayBeforeNextStarvationCheck);
     schedulerReadLock = scheduler.getSchedulerReadLock();
   }
{code}
Log before the fix:
{code:java}
2019-10-12 15:35:04,621 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread: nm hearbeat 0 allocDelay 0
2019-10-12 15:35:04,621 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread: warnTimeBeforeKill 15000 WaitTimeBeforeNextStarvationCheck 10000 delayBeforeNextStarvationCheck 25000
{code}
Log after the fix:
{code:java}
2019-10-12 16:29:20,725 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread: nm hearbeat 1000 allocDelay 4000
2019-10-12 16:29:20,725 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread: warnTimeBeforeKill 15000 WaitTimeBeforeNextStarvationCheck 10000 delayBeforeNextStarvationCheck 29000
{code}
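The numbers in the two log excerpts can be checked with a small standalone sketch. The class and method names are hypothetical and not part of Hadoop. The sketch models only the heartbeat-based branch visible in the diff (allocDelay = 4 heartbeats) and takes the logged values as plain parameters.
{code:java}
// Standalone sketch; StarvationDelaySketch is a hypothetical name, not a Hadoop class.
class StarvationDelaySketch {

  // Models only the branch visible in the diff: allocDelay is 4 heartbeats.
  static long allocDelay(long nmHeartbeatInterval) {
    return 4 * nmHeartbeatInterval;
  }

  // Same sum as in the diff: warnTimeBeforeKill + allocDelay + wait time.
  static long delayBeforeNextStarvationCheck(long warnTimeBeforeKill,
      long nmHeartbeatInterval, long waitTimeBeforeNextStarvationCheck) {
    return warnTimeBeforeKill + allocDelay(nmHeartbeatInterval)
        + waitTimeBeforeNextStarvationCheck;
  }

  public static void main(String[] args) {
    // Inputs taken from the log lines: heartbeat 0 before the fix, 1000 after.
    System.out.println(delayBeforeNextStarvationCheck(15000, 0, 10000));    // 25000
    System.out.println(delayBeforeNextStarvationCheck(15000, 1000, 10000)); // 29000
  }
}
{code}
With a heartbeat interval of 0 the allocDelay term drops out of the sum, which accounts for the 4000 difference between the two logged values.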
> allocDelay in FSPreemptionThread should not be 0
> ------------------------------------------------
>
> Key: YARN-9896
> URL: https://issues.apache.org/jira/browse/YARN-9896
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.9.1
> Reporter: Xianghao Lu
> Assignee: Xianghao Lu
> Priority: Major
> Fix For: 2.9.1
>
> Attachments: YARN-9896.001.patch
>
>
> [allocDelay|https://github.com/apache/hadoop/blob/e30710aea4e6e55e69372929106cf119af06fd0e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java#L63]
> in FSPreemptionThread.java will be 0, because
> [nmHeartbeatInterval|https://github.com/apache/hadoop/blob/e30710aea4e6e55e69372929106cf119af06fd0e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java#L147]
> in AbstractYarnScheduler.java has not been initialized.
> The initialization order
> [here|https://github.com/apache/hadoop/blob/e30710aea4e6e55e69372929106cf119af06fd0e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1379]
> causes this bug.
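The initialization-order problem described in the issue can be illustrated in isolation. The sketch below uses hypothetical stand-in classes (SketchScheduler, SketchPreemptionThread, InitOrderDemo), not the real Hadoop classes, and it does not reproduce the attached patch. It only shows that a value read in a constructor stays at Java's default of 0 if the field it depends on is assigned afterwards.
{code:java}
// Hypothetical stand-ins for illustration only; these are not the Hadoop classes.
class SketchScheduler {
  // A long field defaults to 0 until something assigns it.
  private long nmHeartbeatInterval;

  void initHeartbeatInterval(long interval) {
    nmHeartbeatInterval = interval;
  }

  long getNMHeartbeatInterval() {
    return nmHeartbeatInterval;
  }
}

class SketchPreemptionThread {
  final long allocDelay;

  SketchPreemptionThread(SketchScheduler scheduler) {
    // The interval is read once, at construction time.
    allocDelay = 4 * scheduler.getNMHeartbeatInterval(); // 4 heartbeats
  }
}

class InitOrderDemo {
  // Thread is constructed before the interval is set: allocDelay stays 0.
  static long allocDelayWhenThreadCreatedFirst() {
    SketchScheduler scheduler = new SketchScheduler();
    SketchPreemptionThread thread = new SketchPreemptionThread(scheduler);
    scheduler.initHeartbeatInterval(1000); // too late, value already captured
    return thread.allocDelay;
  }

  // Interval is set before the thread is constructed: allocDelay is 4000.
  static long allocDelayWhenIntervalSetFirst() {
    SketchScheduler scheduler = new SketchScheduler();
    scheduler.initHeartbeatInterval(1000);
    SketchPreemptionThread thread = new SketchPreemptionThread(scheduler);
    return thread.allocDelay;
  }

  public static void main(String[] args) {
    System.out.println(allocDelayWhenThreadCreatedFirst()); // 0
    System.out.println(allocDelayWhenIntervalSetFirst());   // 4000
  }
}
{code}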