Tamas Domok created YARN-11507:
----------------------------------

             Summary: CSQueue properties are affected by DominantResourceCalculator in a non-intuitive way
                 Key: YARN-11507
                 URL: https://issues.apache.org/jira/browse/YARN-11507
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 3.4.0
            Reporter: Tamas Domok
            Assignee: Tamas Domok


In the following CapacityScheduler queue hierarchy, the queues get different capacity/absoluteCapacity values depending on which resource calculator is used (Default or Dominant). capacity is relative to the parent queue, absoluteCapacity to the whole cluster.

{code}
    conf.put("yarn.scheduler.capacity.resource-calculator",
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");
    conf.put("yarn.scheduler.capacity.legacy-queue-mode.enabled", "true");
    conf.put("yarn.scheduler.capacity.root.queues", "a, b");
    conf.put("yarn.scheduler.capacity.root.a.capacity", "[memory=4096,vcores=8]");
    conf.put("yarn.scheduler.capacity.root.b.capacity", "[memory=12288,vcores=8]");
    conf.put("yarn.scheduler.capacity.root.b.queues", "b1, b2");
    conf.put("yarn.scheduler.capacity.root.b.b1.capacity", "[memory=3072,vcores=6]");
    conf.put("yarn.scheduler.capacity.root.b.b2.capacity", "[memory=9216,vcores=2]");
{code}

{code}
             DefaultResourceCalculator               DominantResourceCalculator
             capacity  absoluteCapacity  maxApps     capacity  absoluteCapacity  maxApps
root.a       0.25      0.25              2500        0.5       0.5               5000
root.b       0.75      0.75                          0.75      0.75
root.b.b1    0.25      0.1875            1875        0.5       0.375             3750
root.b.b2    0.75      0.5625            5625        0.75      0.5625            5625
{code}

Issues: with the DominantResourceCalculator, the capacities of the first-level queues (and even of the second-level queues under root.b) add up to more than 100% (0.5 + 0.75 = 1.25 at both levels). Some properties, like maxApplications, are calculated from the absoluteCapacity, e.g. the sum of max apps is 10000 with the DefaultRC but 14375 with the DominantRC.
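To see where the numbers in the table come from, here is a simplified model of the arithmetic. It assumes that the cluster resource is exactly [memory=16384,vcores=16] (the sum of the root.a and root.b configs), that the DefaultResourceCalculator looks at memory only while the DominantResourceCalculator takes the larger of the memory and vcores shares, and that a leaf queue's maxApplications is (int) (10000 * absoluteCapacity), 10000 being the default yarn.scheduler.capacity.maximum-applications. The class CalculatorComparison and its helpers are hypothetical; this is a sketch of the calculation, not the CapacityScheduler code.

{code:java}
/**
 * Simplified model (assumption, not the CapacityScheduler source) of how the
 * two calculators turn the absolute resource config into absoluteCapacity
 * and maxApplications.
 */
public class CalculatorComparison {
    static final long CLUSTER_MEMORY = 16384; // assumed cluster memory (MB)
    static final long CLUSTER_VCORES = 16;    // assumed cluster vcores
    static final int MAX_SYSTEM_APPS = 10000; // default maximum-applications

    /** Share of the cluster that a queue's configured resource represents. */
    static double absoluteCapacity(long memory, long vcores, boolean dominant) {
        double memShare = (double) memory / CLUSTER_MEMORY;
        double vcoreShare = (double) vcores / CLUSTER_VCORES;
        // Default RC only compares memory; Dominant RC takes the largest share.
        return dominant ? Math.max(memShare, vcoreShare) : memShare;
    }

    static int maxApps(double absoluteCapacity) {
        return (int) (MAX_SYSTEM_APPS * absoluteCapacity);
    }

    /** Sum of maxApplications over the leaves root.a, root.b.b1, root.b.b2. */
    static int totalLeafMaxApps(boolean dominant) {
        return maxApps(absoluteCapacity(4096, 8, dominant))
            + maxApps(absoluteCapacity(3072, 6, dominant))
            + maxApps(absoluteCapacity(9216, 2, dominant));
    }

    public static void main(String[] args) {
        String[] names = {"root.a", "root.b", "root.b.b1", "root.b.b2"};
        long[][] configs = {{4096, 8}, {12288, 8}, {3072, 6}, {9216, 2}};
        int[] parent = {-1, -1, 1, 1}; // index of the parent queue, -1 means root
        for (boolean dominant : new boolean[] {false, true}) {
            System.out.println(dominant ? "DominantResourceCalculator" : "DefaultResourceCalculator");
            for (int i = 0; i < names.length; i++) {
                double abs = absoluteCapacity(configs[i][0], configs[i][1], dominant);
                double parentAbs = parent[i] < 0 ? 1.0
                    : absoluteCapacity(configs[parent[i]][0], configs[parent[i]][1], dominant);
                // capacity is relative to the parent, absoluteCapacity to the cluster
                System.out.printf("  %-10s capacity=%.4f absoluteCapacity=%.4f%n",
                    names[i], abs / parentAbs, abs);
            }
            System.out.println("  sum of leaf maxApps = " + totalLeafMaxApps(dominant));
        }
    }
}
{code}

Running it prints the capacity/absoluteCapacity values of the table for both calculators and leaf maxApps sums of 10000 and 14375. Because the maximum is taken per queue, the dominant shares of sibling queues are not forced to add up to their parent's share.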

I don't see any reason why the ResourceCalculator abstraction should affect the capacity/absoluteCapacity or any other property of the queues in the hierarchy. The cluster resource should be shared amongst the queues based on their configuration for the individual resource types. The effectiveMin/Max resources should be calculated for each queue and for each resource type; they should be the source of truth for the resources available to the queues and be used in later calculations. The absoluteCapacity should be calculated from only one resource type (e.g. memory), or it should be normalised in some way.
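The two options above can be compared with a small sketch. It keeps one share per resource type for each leaf queue (standing in for the per-resource effective resources) and derives a single absoluteCapacity either from memory only or, as one hypothetical normalisation, from the arithmetic mean of the per-type shares; the issue does not prescribe a specific normalisation. The cluster size [memory=16384,vcores=16] and the class PerResourceCapacity are assumptions for illustration.

{code:java}
/**
 * Hypothetical illustration: per-resource-type shares, reduced to a single
 * absoluteCapacity either from memory only or by an assumed normalisation
 * (arithmetic mean of the shares).
 */
public class PerResourceCapacity {
    static final long[] CLUSTER = {16384, 16}; // assumed cluster: memory (MB), vcores

    /** Per-resource-type shares of the cluster, e.g. {memoryShare, vcoreShare}. */
    static double[] shares(long[] configured) {
        double[] result = new double[configured.length];
        for (int i = 0; i < configured.length; i++) {
            result[i] = (double) configured[i] / CLUSTER[i];
        }
        return result;
    }

    /** Option 1: absoluteCapacity from a single resource type (memory, index 0). */
    static double fromMemory(double[] shares) {
        return shares[0];
    }

    /** Option 2 (hypothetical normalisation): mean of the per-type shares. */
    static double normalisedMean(double[] shares) {
        double sum = 0;
        for (double s : shares) {
            sum += s;
        }
        return sum / shares.length;
    }

    public static void main(String[] args) {
        long[][] leaves = {{4096, 8}, {3072, 6}, {9216, 2}}; // root.a, root.b.b1, root.b.b2
        double memTotal = 0;
        double meanTotal = 0;
        for (long[] leaf : leaves) {
            double[] s = shares(leaf);
            memTotal += fromMemory(s);
            meanTotal += normalisedMean(s);
        }
        System.out.println("sum over leaves (memory only): " + memTotal);
        System.out.println("sum over leaves (mean of shares): " + meanTotal);
    }
}
{code}

Both variants sum to 1.0 over the leaves, because each is linear in the per-type shares: when every resource type is fully divided among the queues, the results still add up to the parent. Taking the maximum, as the dominant share does, is not linear, which is how the sums in the table exceed 100%.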

The DominantResourceCalculator is useful when the whole cluster is utilised by apps of multiple users (see this research: https://cs.stanford.edu/~matei/papers/2011/nsdi_drf.pdf), but the queues do not compete with each other for different dominant resources. The cluster resource should simply be shared based on the queue configurations.
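For contrast, here is a minimal sketch of what DRF itself does among competing users, based on the two-user example from the paper linked above: 9 CPUs and 18 GB of memory, user A running tasks of <1 CPU, 4 GB> and user B tasks of <3 CPUs, 1 GB>. It uses progressive filling: repeatedly launch one task for the user with the lowest dominant share until that user's next task no longer fits. This is not YARN's scheduling code; the class DrfSketch is hypothetical.

{code:java}
/**
 * Minimal DRF progressive-filling sketch (hypothetical, not YARN code).
 * Assumes every user demands a positive amount of at least one resource.
 */
public class DrfSketch {
    /** Returns the number of tasks launched per user. */
    static int[] allocate(double[] capacity, double[][] demands) {
        int users = demands.length;
        int resources = capacity.length;
        double[] consumed = new double[resources];
        double[][] used = new double[users][resources];
        int[] tasks = new int[users];
        while (true) {
            // Pick the user with the smallest dominant share (ties: lowest index).
            int next = 0;
            double best = Double.MAX_VALUE;
            for (int u = 0; u < users; u++) {
                double dominant = 0;
                for (int r = 0; r < resources; r++) {
                    dominant = Math.max(dominant, used[u][r] / capacity[r]);
                }
                if (dominant < best) {
                    best = dominant;
                    next = u;
                }
            }
            // Stop when the chosen user's next task does not fit into the cluster.
            for (int r = 0; r < resources; r++) {
                if (consumed[r] + demands[next][r] > capacity[r]) {
                    return tasks;
                }
            }
            for (int r = 0; r < resources; r++) {
                consumed[r] += demands[next][r];
                used[next][r] += demands[next][r];
            }
            tasks[next]++;
        }
    }

    public static void main(String[] args) {
        double[] capacity = {9, 18};           // CPUs, GB
        double[][] demands = {{1, 4}, {3, 1}}; // user A, user B
        int[] tasks = allocate(capacity, demands);
        System.out.println("A: " + tasks[0] + " tasks, B: " + tasks[1] + " tasks");
    }
}
{code}

It ends with A running 3 tasks (<3 CPUs, 12 GB>) and B running 2 tasks (<6 CPUs, 2 GB>), so both users reach a dominant share of 2/3. The equalisation is driven by the users' task demands, not by statically configured queue capacities.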

I added a test case that reproduces the issue to my [fork|https://github.com/tomicooler/hadoop/commit/1d673350c18e97abea2703fa45dc6ff9f91aca8f].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
