On Tue, 4 May 2021 04:07:52 GMT, Argha C 
<github.com+971473+argh...@openjdk.org> wrote:

>> OperatingSystemImpl.getCpuLoad() may return 1.0 in a container, even though 
>> the CPU load is obviously below 100%.
>> 
>> We created a 5-core container and ran 4 "while (true)" loops in the 
>> container. OperatingSystemImpl.getCpuLoad() returned 1.0, which is incorrect 
>> (0.8 is correct).
>> "systemLoad" in getCpuLoad() is exactly 4.0 before "systemLoad = 
>> Math.min(1.0, systemLoad);". The problem is caused by using the elapsed time 
>> (specified by "cpu.cfs_period_us") instead of the total CPU time (specified 
>> by "cpu.cfs_quota_us"). Therefore, it is more reasonable to divide cpu usage 
>> time by "quotaNanos" instead of "elapsedNanos".
>
> src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java
>  line 142:
> 
>> 140:                 long usageNanos = containerMetrics.getCpuUsage();
>> 141:                 if (numPeriods > 0 && usageNanos > 0) {
>> 142:                     long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quota * numPeriods);
> 
> We happened to hit a very similar problem when running in a container 
> with OpenJDK 15.
> 
> Given that we effectively agree the problem is that `elapsedNanos` does not 
> accurately reflect the CPU time allocated across all shares, as opposed to 
> a single share, my proposal was to use `getCpuShares` as a multiplier for 
> `periodLength` above. 
> Is there a good reason why `getCpuQuota` is the better alternative?

Hi Argha, thanks a lot for your suggestion. I think both "quota" and "share" 
are worth considering. Let us look into the implementation of 
`CgroupSubsystem::active_processor_count()` in OpenJDK HotSpot 
(https://github.com/openjdk/jdk/blob/master/src/hotspot/os/linux/cgroupSubsystem_linux.cpp).
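
To make the quota-based calculation concrete, here is a minimal sketch that 
replays the numbers from the report above (5-core quota, 4 busy loops). It is 
not the JDK code: the class `CpuLoadSketch`, both helper methods, the 100 ms 
period, and the count of 10 periods are assumptions chosen for illustration.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; names and constants are hypothetical, not JDK code.
public class CpuLoadSketch {

    // Load relative to elapsed wall-clock time (period * numPeriods), capped at 1.0.
    static double loadByElapsed(long usageNanos, long periodMicros, long numPeriods) {
        long elapsedNanos = TimeUnit.MICROSECONDS.toNanos(periodMicros * numPeriods);
        return Math.min(1.0, (double) usageNanos / elapsedNanos);
    }

    // Load relative to the CPU time the quota allows (quota * numPeriods), capped at 1.0.
    static double loadByQuota(long usageNanos, long quotaMicros, long numPeriods) {
        long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quotaMicros * numPeriods);
        return Math.min(1.0, (double) usageNanos / quotaNanos);
    }

    public static void main(String[] args) {
        long periodMicros = 100_000L;          // assumed cpu.cfs_period_us (100 ms)
        long quotaMicros = 5 * periodMicros;   // 5-core container: cpu.cfs_quota_us
        long numPeriods = 10L;                 // assumed number of elapsed periods

        // Four "while (true)" loops each consume one full core of CPU time.
        long usageNanos = 4 * TimeUnit.MICROSECONDS.toNanos(periodMicros * numPeriods);

        // Raw ratio is 4.0 here, so the cap hides the real load.
        System.out.println(loadByElapsed(usageNanos, periodMicros, numPeriods)); // 1.0
        // 4 cores used out of 5 allowed.
        System.out.println(loadByQuota(usageNanos, quotaMicros, numPeriods));    // 0.8
    }
}
```

With the elapsed-time denominator the raw ratio is 4.0 and is clamped to 1.0, 
while the quota-based denominator yields the expected 0.8.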

-------------

PR: https://git.openjdk.java.net/jdk/pull/3656
