Inigo Goiri commented on YARN-3481:

I started implementing this and I hit a philosophical issue: how to report the 
utilization.

Initially, I thought about using the Resource class (with the ResourceProto for 
transfer) but it has an issue with accuracy. If we have one container using 10% 
of a CPU, Resource only lets us specify 0 or 1 for the VCores.
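
To make the accuracy problem concrete, here is a rough sketch. The class and 
helper names are hypothetical and are not YARN code; utilization is expressed 
in thousandths of a core only so the arithmetic stays exact. It aggregates ten 
containers that each use 10% of a CPU and compares the two integer roundings 
with the exact total.

```java
public class VCoreRounding {
    // Hypothetical helper: round thousandths of a core down to whole VCores.
    public static int floorVCores(int milliVCores) {
        return milliVCores / 1000;
    }

    // Hypothetical helper: round thousandths of a core up to whole VCores.
    public static int ceilVCores(int milliVCores) {
        return (milliVCores + 999) / 1000;
    }

    public static void main(String[] args) {
        int perContainerMilli = 100; // one container using 10% of a CPU
        int containers = 10;

        int floorSum = 0;
        int ceilSum = 0;
        int exactMilli = 0;
        for (int i = 0; i < containers; i++) {
            floorSum += floorVCores(perContainerMilli);
            ceilSum += ceilVCores(perContainerMilli);
            exactMilli += perContainerMilli;
        }

        // The exact total is one full core, but neither rounding reports it.
        System.out.println("rounded down: " + floorSum + " VCores");
        System.out.println("rounded up:   " + ceilSum + " VCores");
        System.out.println("exact:        " + exactMilli + " milliVCores");
    }
}
```

With whole VCores the aggregate is either 0 or 10, while the real usage is one 
core, so the error grows with the number of containers on the node.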

There are multiple possible approaches to solve this issue:
* Modify Resource to define VCores as a double. The problem with this is that 
we'd need to change many interfaces.
* Modify Resource to store milliVCores internally and keep all the public 
interfaces with VCores.
* Create a new type called ResourceUtilization that would have a float instead 
of an int. We would use this new type to send utilization data. This new class 
would also be suitable to send other utilizations like disk queue length, etc.
* Keep using Resource as is but when working with utilization, put milliVCores 
there. In this case, we would have weird semantics for Resource where 
sometimes we send milliVCores and other times we send VCores.
* Define 1 VCore as 0.001 CPUs in the cluster. The problem with this is that 
applications would have to change how many VCores they ask for.
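
For discussion, the second and third options could look roughly like the 
sketch below. Everything here is an assumption: MilliResource and 
UtilizationSketch are made-up names, not existing YARN classes, the fields are 
reduced to memory and CPU, and rounding up in getVirtualCores() is just one 
possible choice.

```java
// Hypothetical sketch only; none of these are real YARN types.

/** Second option: keep an int VCores accessor, store milliVCores internally. */
final class MilliResource {
    private final int memoryMB;
    private final int milliVCores;

    MilliResource(int memoryMB, int milliVCores) {
        this.memoryMB = memoryMB;
        this.milliVCores = milliVCores;
    }

    int getMemory() { return memoryMB; }

    /** Existing-style accessor: whole VCores, rounded up (an assumption). */
    int getVirtualCores() { return (milliVCores + 999) / 1000; }

    /** Finer-grained accessor for callers that care about utilization. */
    int getMilliVCores() { return milliVCores; }
}

/** Third option: a separate utilization type with a fractional CPU field. */
final class UtilizationSketch {
    private final int memoryMB;
    private final float cpu; // CPUs in use, e.g. 0.1f for 10% of one CPU

    UtilizationSketch(int memoryMB, float cpu) {
        this.memoryMB = memoryMB;
        this.cpu = cpu;
    }

    int getMemory() { return memoryMB; }

    float getCpu() { return cpu; }

    /** Combine two reports, as an NM might when aggregating containers. */
    UtilizationSketch plus(UtilizationSketch other) {
        return new UtilizationSketch(memoryMB + other.memoryMB, cpu + other.cpu);
    }
}

public class UtilizationOptions {
    public static void main(String[] args) {
        // Second option: 10% of a CPU survives internally as 100 milliVCores.
        MilliResource r = new MilliResource(512, 100);
        System.out.println(r.getVirtualCores() + " VCores, "
            + r.getMilliVCores() + " milliVCores");

        // Third option: fractional CPU values add up without rounding to ints.
        UtilizationSketch total =
            new UtilizationSketch(512, 0.1f).plus(new UtilizationSketch(256, 0.25f));
        System.out.println(total.getMemory() + " MB, " + total.getCpu() + " CPUs");
    }
}
```

The second option leaves existing callers untouched but hides the extra 
precision behind a new accessor, while the third keeps Resource unchanged and 
leaves room for fields such as disk queue length.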

Note that YARN-3122 is storing a metric called milliVCores for this.

I would like to see what people think is the best option. Ideas?

> Report NM aggregated container resource utilization in heartbeat
> ----------------------------------------------------------------
>                 Key: YARN-3481
>                 URL: https://issues.apache.org/jira/browse/YARN-3481
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Inigo Goiri
>            Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
> To allow the RM to make better scheduling decisions, it should be aware of the 
> actual utilization of the containers. The NM would aggregate the 
> ContainerMetrics and report them in every heartbeat.
> Related to YARN-1012 but aggregated to reduce the heartbeat overhead.
