[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562622#comment-14562622
 ] 

Rohith commented on YARN-3733:
------------------------------

I verified the RM logs from [~bibinchundatt] offline. The sequence of events 
that occurred is:
# 30 applications were submitted to RM1 concurrently: *pendingApplications=18 
and activeApplications=12*. The active applications moved to the RUNNING state.
# RM1 switched to standby and RM2 transitioned to active. The currently active 
RM is RM2.
# The 30 previously submitted applications started recovering. As part of the 
recovery process, all 30 applications were submitted to the scheduler and all 
of them became active, i.e. *activeApplications=30 and 
pendingApplications=0*, which is not expected to happen.
# The NMs registered with the RM, and the running AMs registered with the RM.
# Since all 30 applications were activated, the scheduler tried to launch the 
ApplicationMaster of every activated application, which occupied the full 
cluster capacity.

Basically, the AM limit check in LeafQueue#activateApplications does not work 
as expected with {{DominantResourceCalculator}}. To confirm this, I wrote a 
simple program that runs the same check with both the default and the dominant 
resource calculator, using the resource values from the debug log below (both 
amLimit and lastClusterResource are <memory:0, vCores:0>). The output of the 
program is:
For DefaultResourceCalculator, the result is false, which prevents the 
application from being activated when the AM resource limit is exceeded.
For DominantResourceCalculator, the result is true, which allows all the 
applications to be activated even though the AM resource limit is exceeded.
{noformat}
2015-05-28 14:00:52,704 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: application AMResource <memory:4096, vCores:1> maxAMResourcePerQueuePercent 0.5 amLimit <memory:0, vCores:0> lastClusterResource <memory:0, vCores:0> amIfStarted <memory:4096, vCores:1>
{noformat}
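Both zero values in this log line are consistent with the event order above: the applications were recovered (step 3) before any NM registered (step 4), so the cluster resource, and with it the AM limit, was still empty when the check ran. A minimal sketch of that arithmetic, assuming a hypothetical simplified formula in which the AM limit is just a fraction of cluster memory (the real LeafQueue also applies queue capacity and normalization). {{AmLimitSketch}}, {{amLimitMb}} and {{fitsUnderLimit}} are illustrative names, not Hadoop APIs.

{code}
// Hypothetical simplified model: the queue's AM limit is assumed to be a
// fixed fraction of cluster memory. Queue capacity and normalization,
// which the real LeafQueue applies, are deliberately left out.
public class AmLimitSketch {

  /** AM limit in MB, assuming the queue owns the whole cluster. */
  static long amLimitMb(long clusterMb, double maxAmPercent) {
    return (long) (clusterMb * maxAmPercent);
  }

  /** Memory-only check, in the spirit of DefaultResourceCalculator. */
  static boolean fitsUnderLimit(long amIfStartedMb, long amLimitMb) {
    return amIfStartedMb <= amLimitMb;
  }

  public static void main(String[] args) {
    // During recovery no NM has registered yet, so the cluster is empty.
    long limitDuringRecovery = amLimitMb(0, 0.5);
    // After both NMs (3072 MB each) have registered.
    long limitAfterNmRegistration = amLimitMb(2 * 3072, 0.5);

    System.out.println("limit during recovery: " + limitDuringRecovery);
    System.out.println("limit after NM registration: "
        + limitAfterNmRegistration);
    System.out.println("4096 MB AM fits during recovery: "
        + fitsUnderLimit(4096, limitDuringRecovery));
  }
}
{code}

Under these assumptions it prints a limit of 0 during recovery, 3072 after registration, and false for the 4096 MB AM, which matches the DefaultResourceCalculator result.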

{code}
package com.test.hadoop;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class TestResourceCalculator {

  public static void main(String[] args) {
    // Default resource calculator (compares memory only)
    ResourceCalculator defaultResourceCalculator =
        new DefaultResourceCalculator();

    // Dominant resource calculator (compares dominant resource shares)
    ResourceCalculator dominantResourceCalculator =
        new DominantResourceCalculator();

    // Values taken from the debug log above
    Resource lastClusterResource = Resource.newInstance(0, 0);
    Resource amIfStarted = Resource.newInstance(4096, 1);
    Resource amLimit = Resource.newInstance(0, 0);

    // Expected false, and actual is false: the AM is not activated
    System.out.println("DefaultResourceCalculator : "
        + Resources.lessThanOrEqual(defaultResourceCalculator,
            lastClusterResource, amIfStarted, amLimit));

    // Expected false, but actual is true: the AM is activated despite the limit
    System.out.println("DominantResourceCalculator : "
        + Resources.lessThanOrEqual(dominantResourceCalculator,
            lastClusterResource, amIfStarted, amLimit));

  }
}

{code}
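Why the dominant calculator answers true can be illustrated with plain float arithmetic. Assuming, as in a dominant-share comparison, that each resource is divided by the cluster total, an empty cluster turns amIfStarted into Infinity (4096f / 0) and amLimit into NaN (0f / 0). Every < or > test against NaN is false, so a comparison that falls through to "equal" reports amIfStarted <= amLimit. This is a hypothetical simplified model, not the Hadoop source; {{DominantShareNaNSketch}} and its methods are illustrative names.

{code}
// Hypothetical simplified model of a dominant-share comparison: each
// resource is divided by the cluster total, as a dominant-share calculator
// does. This is not the actual Hadoop source.
public class DominantShareNaNSketch {

  /** Dominant share: the larger of the memory and vcore fractions. */
  static float dominantShare(int mem, int vcores,
                             int clusterMem, int clusterVcores) {
    return Math.max((float) mem / clusterMem,
        (float) vcores / clusterVcores);
  }

  /** Returns -1, 0 or 1; "neither smaller nor larger" counts as equal. */
  static int compare(float l, float r) {
    if (l < r) {
      return -1;
    } else if (l > r) {
      return 1;
    }
    return 0; // also reached when either side is NaN
  }

  public static void main(String[] args) {
    // Values from the debug log: empty cluster, amLimit <0, 0>.
    float amIfStarted = dominantShare(4096, 1, 0, 0); // 4096f / 0 -> Infinity
    float amLimit = dominantShare(0, 0, 0, 0);        // 0f / 0 -> NaN

    System.out.println("amIfStarted share = " + amIfStarted);
    System.out.println("amLimit share = " + amLimit);
    // A "lessThanOrEqual" derived as compare(...) <= 0
    System.out.println("lessThanOrEqual = "
        + (compare(amIfStarted, amLimit) <= 0));
  }
}
{code}

It prints Infinity and NaN for the two shares and true for the comparison, matching the DominantResourceCalculator output reported above. Comparing the smaller share instead gives the same Infinity versus NaN pair, so a min-share tiebreak would not change the outcome either.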

>  On RM restart AM getting more than maximum possible memory when many  tasks 
> in queue
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-3733
>                 URL: https://issues.apache.org/jira/browse/YARN-3733
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: SUSE 11 SP3, 2 NM, 2 RM
> each NM: 3 GB, 6 vcores
>            Reporter: Bibin A Chundatt
>            Assignee: Rohith
>            Priority: Critical
>
> Steps to reproduce
> =================
> 1. Install HA with 2 RMs and 2 NMs (3072 MB * 2 total cluster memory)
> 2. Change the scheduler minimum allocation to 512 MB, then configure map and 
> reduce sizes to 512 MB
> 3. Configure the capacity scheduler with an AM limit of 0.5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 jobs concurrently
> 5. Switch the active RM
> Actual
> =====
> AMs get allocated for 12 jobs and all 12 start running.
> No other YARN child is launched; *all 12 jobs stay in Running state forever*.
> Expected
> =======
> Only 6 should be running at a time, since the AM limit is 0.5 (3072 MB).
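
The expected figure of 6 follows from simple arithmetic: 0.5 of the 6144 MB cluster is 3072 MB, and 3072 MB / 512 MB = 6 AM containers. The sketch below models activation under that limit. It assumes each AM container is 512 MB (the AM size is not stated above) and that the queue owns the whole cluster; {{ExpectedActiveAms}} and {{activatedCount}} are hypothetical names.

{code}
// Hypothetical model of the expected behaviour: pending applications are
// activated one by one while the total AM memory stays within the limit.
public class ExpectedActiveAms {

  static int activatedCount(long clusterMb, double maxAmPercent,
                            long amMb, int submitted) {
    long amLimitMb = (long) (clusterMb * maxAmPercent);
    long amUsedMb = 0;
    int active = 0;
    for (int i = 0; i < submitted; i++) {
      long amIfStarted = amUsedMb + amMb;
      if (amIfStarted > amLimitMb) {
        break; // this and all later applications stay pending
      }
      amUsedMb = amIfStarted;
      active++;
    }
    return active;
  }

  public static void main(String[] args) {
    // 2 NMs x 3072 MB, AM limit 0.5, 30 jobs, assumed 512 MB AMs
    System.out.println(activatedCount(2 * 3072, 0.5, 512, 30));
  }
}
{code}

With these inputs it prints 6; the remaining 24 applications would stay pending until AM capacity frees up.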



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
