[ https://issues.apache.org/jira/browse/YARN-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806611#comment-15806611 ]

Miklos Szegedi commented on YARN-5936:
--------------------------------------

In the latest test I used 100 threads per program; I just did not share the 
code. They run in parallel, so the sum of the time command results measures 
whether the whole set spent CPU cycles on anything other than the activity 
loop. The reason I checked is to ask whether you would like a solution that 
uses {{cpu.cfs_quota_us}}.
I could imagine a dynamic CFS algorithm like the following.
A timer callback with a certain period could do:
{code}
if CPU is saturated
  for each container
    if previous usage > fair share
      limit to fair share
else
  release all limits
{code}
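A minimal sketch of that periodic decision in Java, under stated assumptions: {{QuotaPolicy}}, {{decide}}, and the maps of per-container usage are hypothetical names, the usage and fair-share values are fractions of total node CPU measured since the last tick, and real code would read the cgroup cpuacct counters and write the result into {{cpu.cfs_quota_us}}:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the timer callback's decision step. "usage" and
// "fairShare" map container id -> fraction of total node CPU; both maps are
// assumed to have the same keys. The return value is the cfs quota (in
// microseconds per period) to write for each container, where -1 means
// "release the limit", matching cpu.cfs_quota_us semantics.
public class QuotaPolicy {
    static final long PERIOD_US = 100_000; // cpu.cfs_period_us default

    public static Map<String, Long> decide(boolean cpuSaturated,
                                           Map<String, Double> usage,
                                           Map<String, Double> fairShare,
                                           int cores) {
        Map<String, Long> quotas = new HashMap<>();
        for (String container : usage.keySet()) {
            if (cpuSaturated && usage.get(container) > fairShare.get(container)) {
                // Throttle to the fair share, scaled across all cores.
                quotas.put(container,
                        (long) (fairShare.get(container) * cores * PERIOD_US));
            } else {
                // CPU not saturated, or container under its share: no limit.
                quotas.put(container, -1L);
            }
        }
        return quotas;
    }
}
```

On an 8-core node with a 50% fair share, a container over its share would be capped at 0.5 * 8 * 100000 = 400000 us per 100000 us period.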
It has drawbacks. It only works when the CPU is saturated and not much time is 
spent waiting on I/O. It has a delay, since it works on historic data, which 
also means it adds some utilization loss that can be larger with multiple 
cores. On the other hand, it provides the requested fairness when the CPU is 
saturated.
Does your node have multiple cores? The algorithm may not help much in that 
case. For example, take 8 cores: one container runs 8 threads, the other runs 
2 threads, and the requested fair share is 50%-50%. Without throttling the two 
containers will share 80%-20%. Even if we set the fair share by throttling 
when the cores are saturated, the usage will be 50%/25% while the quota is 
applied, so there is a utilization loss for a period. Beyond that, the 
algorithm only gets more complicated...
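To make the 8-core arithmetic concrete, here is a small sketch (class and method names are made up for illustration; the unthrottled model assumes plain {{cpu.shares}} effectively gives each runnable thread an equal slice of the cores, as observed in the test below):

```java
public class FairShareExample {
    // Fraction of the node a container gets when scheduling is effectively
    // per-thread: each of totalThreads runnable threads receives an equal
    // slice of the cores.
    static double unthrottledShare(int ownThreads, int totalThreads, int cores) {
        double perThread = (double) cores / totalThreads;
        return ownThreads * perThread / cores;
    }

    // Fraction of the node once a cfs quota caps the container at
    // quotaFraction of the node; a container can never occupy more cores
    // than it has runnable threads.
    static double throttledShare(int ownThreads, int cores, double quotaFraction) {
        double cappedCores = Math.min(ownThreads, quotaFraction * cores);
        return cappedCores / cores;
    }

    public static void main(String[] args) {
        int cores = 8;
        // 8-thread vs 2-thread container, no throttling: 80% vs 20%.
        System.out.println(unthrottledShare(8, 10, cores));
        System.out.println(unthrottledShare(2, 10, cores));
        // Both capped at a 50% quota: 50% vs 25%, leaving 2 cores idle.
        System.out.println(throttledShare(8, cores, 0.5));
        System.out.println(throttledShare(2, cores, 0.5));
    }
}
```

The last two numbers show the utilization loss: the 2-thread container cannot spend the 25% of the node the quota reserves for it, so 2 of the 8 cores sit idle while the cap is in force.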

> when cpu strict mode is closed, yarn couldn't assure scheduling fairness 
> between containers
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-5936
>                 URL: https://issues.apache.org/jira/browse/YARN-5936
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>         Environment: CentOS7.1
>            Reporter: zhengchenyu
>            Priority: Critical
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> When using LinuxContainer, setting 
> "yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage" to 
> true can assure scheduling fairness via the cpu bandwidth control of cgroups. 
> But the cgroup cpu bandwidth control led to bad performance in our experience. 
>     Without cgroup cpu bandwidth control, cgroup's cpu.shares is our only way 
> to assure scheduling fairness, but it is not completely effective. For 
> example, take two containers with the same vcores (meaning the same 
> cpu.shares), where one container is single-threaded and the other is 
> multi-threaded: the multi-threaded one will get more CPU time, which is 
> unreasonable!
>     Here is my test case: I submit two distributedshell applications. The two 
> commands are below:
> {code}
> hadoop jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> -shell_script ./run.sh  -shell_args 10 -num_containers 1 -container_memory 
> 1024 -container_vcores 1 -master_memory 1024 -master_vcores 1 -priority 10
> hadoop jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> -shell_script ./run.sh  -shell_args 1  -num_containers 1 -container_memory 
> 1024 -container_vcores 1 -master_memory 1024 -master_vcores 1 -priority 10
> {code}
>     Here is the CPU time of the two containers:
> {code}
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 15448 yarn      20   0 9059592  28336   9180 S 998.7  0.1  24:09.30 java
> 15026 yarn      20   0 9050340  27480   9188 S 100.0  0.1   3:33.97 java
> 13767 yarn      20   0 1799816 381208  18528 S   4.6  1.2   0:30.55 java
>    77 root      rt   0       0      0      0 S   0.3  0.0   0:00.74 
> migration/1   
> {code}
>     We find the CPU time of the multi-threaded container is ten times that of 
> the single-threaded one, though the two containers have the same cpu.shares.
> notes:
> run.sh
> {code} 
>       java -cp /home/yarn/loop.jar:$CLASSPATH loop.loop $1    
> {code} 
> loop.java
> {code}
> package loop;
> 
> // Starts N busy-loop threads; N is the first command-line argument.
> public class loop {
>       public static void main(String[] args) {
>               int loop = 1;
>               if (args.length >= 1) {
>                       System.out.println(args[0]);
>                       loop = Integer.parseInt(args[0]);
>               }
>               for (int i = 0; i < loop; i++) {
>                       System.out.println("start thread " + i);
>                       new Thread(new Runnable() {
>                               @Override
>                               public void run() {
>                                       // Spin forever to keep one core busy.
>                                       int j = 0;
>                                       while (true) { j++; }
>                               }
>                       }).start();
>               }
>       }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
