[ 
https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309038#comment-17309038
 ] 

Siddharth Ahuja edited comment on YARN-10705 at 3/25/21, 11:18 PM:
-------------------------------------------------------------------

Added a patch to ensure that logging only happens in case of an actual 
container assignment/allocation, not reservation.

Tested this on a single-node cluster, using a distribution built from trunk 
with the patch applied, with the steps below:

* Set {{yarn.resourcemanager.scheduler.class}} to 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler}},
* Started YARN on the single-node cluster, which has 1 NodeManager with 8GB 
available to run containers,
* Enabled DEBUG logging for the FSLeafQueue class:
{code}
bin/yarn daemonlog -setlevel localhost:8088 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue DEBUG
{code}
* Checked for the DEBUG allocation message in the RM logs:
{code}
tail -f rmlogs.log | grep "DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue"
{code}
* Ran the first job requiring 1 AM + 1 non-AM worth 2GB each, so 4GB out of 8GB 
are used up:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=2048 -m 1 -mt 600000
{code}
* Ran a second job requiring 1 AM + 1 non-AM worth 2GB and 4GB respectively. 
The second application starts (i.e. its AM starts), but there is no room for 
the 4GB container yet, so a reservation is made for the 4GB non-AM container:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=4096 -m 1 -mt 600000
{code}
* With the patch applied, only the following 3 lines are present when the 
reservation occurs, which is the expected behaviour:
{code}
2021-03-25 17:54:13,475 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:<memory:2048, vCores:1>
2021-03-25 17:54:20,507 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:<memory:2048, vCores:1>
2021-03-25 17:54:35,558 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:<memory:2048, vCores:1>
{code}
Without the patch, however, the following line was also logged for the 
reservation:
{code}
2021-03-25 17:54:43,589 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:<memory:-1, vCores:0>
{code}
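
The memory arithmetic behind the steps above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, it assumes the NodeManager offers exactly 8192 MB, and it ignores vCores and everything else the scheduler takes into account.

```java
// Hypothetical illustration of the test scenario's memory bookkeeping.
// Not part of the patch; all names here are made up for this sketch.
final class ReservationScenario {
    // Assumption: the single NodeManager offers exactly 8GB to containers.
    static final int NODE_MB = 8 * 1024;

    // Returns the memory left on the node after the given containers run.
    static int freeAfter(int... containerMb) {
        int used = 0;
        for (int mb : containerMb) {
            used += mb;
        }
        return NODE_MB - used;
    }

    // A request can be assigned only if it fits in the free memory.
    static boolean fits(int requestMb, int freeMb) {
        return requestMb <= freeMb;
    }

    public static void main(String[] args) {
        // First job: AM + map at 2GB each; second job: AM at 2GB.
        int free = freeAfter(2048, 2048, 2048);
        System.out.println("free MB: " + free);
        // The second job's 4GB map container is what triggers the reservation.
        System.out.println("4GB container fits: " + fits(4096, free));
    }
}
```

Three 2GB containers leave 2GB free, so the 4GB request cannot be assigned and is reserved instead, which is consistent with the three "Assigned container" lines shown above.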

No new JUnit tests are required: the change only suppresses a log message and 
does not alter functionality, so re-running the existing JUnit tests should 
suffice.





> Misleading DEBUG log for container assignment needs to be removed when the 
> container is actually reserved, not assigned in FairScheduler
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10705
>                 URL: https://issues.apache.org/jira/browse/YARN-10705
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>            Priority: Minor
>         Attachments: YARN-10705.001.patch
>
>
> The following DEBUG messages are logged when a container reservation is made 
> after a node has been offered to the queue in FairScheduler:
> {code}
> 2021-02-10 07:33:55,049 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: application_1610442362681_2607's resource request is reserved.
> 2021-02-10 07:33:55,049 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.pj_dc_pe container:<memory:-1, vCores:0>
> {code}
> The second message appears to indicate a bad container assignment with a 
> <memory:-1, vCores:0> resource allocation, whereas in reality it is a 
> misleading message that should not have been logged in the first place.
> This message comes from [1], after an application attempt with unmet demand 
> is checked for container assignment/reservation.
> If the container for this app attempt is reserved on the node, 
> <memory:-1, vCores:0> is returned from [2].
> From [3]:
> {quote}
>    *     If an assignment was made, returns the resources allocated to the
>    *     container.  If a reservation was made, returns
>    *     FairScheduler.CONTAINER_RESERVED.  If no assignment or reservation
>    *     was made, returns an empty resource.
> {quote}
> We check for the empty resource at [4], but not for 
> FairScheduler.CONTAINER_RESERVED, before logging a message that is 
> specifically about container assignment, which is incorrect.
> Instead of:
> {code}
>       if (!assigned.equals(none())) {
>         LOG.debug("Assigned container in queue:{} container:{}",
>             getName(), assigned);
>         break;
>       }
> {code}
> it should be:
> {code}
>       // check if an assignment or a reservation was made.
>       if (!assigned.equals(none())) {
>         // only log container assignment if there is
>         // an actual assignment, not a reservation.
>         if (!assigned.equals(FairScheduler.CONTAINER_RESERVED)
>                 && LOG.isDebugEnabled()) {
>           LOG.debug("Assigned container in queue:" + getName() + " " +
>                         "container:" + assigned);
>         }
>         break;
>       }
> {code}
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L356
> [2] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L911
> [3] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L842
> [4] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L355
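
The distinction between the three return values described in the issue can be illustrated with a small standalone sketch. It does not use the real YARN classes: {{SketchResource}}, {{AssignmentLogSketch}} and {{messageFor}} are hypothetical stand-ins, and the sentinel values are assumed from the <memory:-1, vCores:0> output in the logs above and from the "empty resource" wording in the quoted Javadoc.

```java
import java.util.Objects;

// Simplified stand-in for a YARN resource; NOT the real Resource class.
final class SketchResource {
    final long memory;
    final int vCores;

    SketchResource(long memory, int vCores) {
        this.memory = memory;
        this.vCores = vCores;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchResource)) {
            return false;
        }
        SketchResource r = (SketchResource) o;
        return memory == r.memory && vCores == r.vCores;
    }

    @Override
    public int hashCode() {
        return Objects.hash(memory, vCores);
    }

    @Override
    public String toString() {
        return "<memory:" + memory + ", vCores:" + vCores + ">";
    }
}

// Hypothetical helper showing when the "Assigned container" message applies.
final class AssignmentLogSketch {
    // Assumption: the empty resource is <memory:0, vCores:0>.
    static final SketchResource NONE = new SketchResource(0, 0);
    // Assumption: mirrors the <memory:-1, vCores:0> value seen in the logs.
    static final SketchResource CONTAINER_RESERVED = new SketchResource(-1, 0);

    // Returns the debug message to log, or null when nothing should be logged.
    static String messageFor(String queue, SketchResource assigned) {
        if (assigned.equals(NONE) || assigned.equals(CONTAINER_RESERVED)) {
            // Nothing happened, or only a reservation was made: stay silent.
            return null;
        }
        return "Assigned container in queue:" + queue
            + " container:" + assigned;
    }

    public static void main(String[] args) {
        System.out.println(messageFor("root.q", new SketchResource(2048, 1)));
        System.out.println(messageFor("root.q", CONTAINER_RESERVED));
    }
}
```

In this sketch only a real assignment produces a message; a reservation and an empty result both yield {{null}}, which corresponds to the behaviour the proposed change is aiming for.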


