On Fri, 23 Apr 2021 13:33:42 GMT, Hao Tang 
<github.com+7947546+tanghaot...@openjdk.org> wrote:

> OperatingSystemImpl.getCpuLoad() may return 1.0 in a container, even though 
> the CPU load is obviously below 100%.
> 
> We created a 5-core container and ran 4 "while (true)" loops in the 
> container. OperatingSystemImpl.getCpuLoad() returned 1.0, which is incorrect 
> (0.8 is correct).
> "systemLoad" in getCpuLoad() is exactly 4.0 before "systemLoad = 
> Math.min(1.0, systemLoad);". The problem is caused by using the elapsed time 
> (specified by "cpu.cfs_period_us") instead of the total CPU time (specified 
> by "cpu.cfs_quota_us"). Therefore, it is more reasonable to divide cpu usage 
> time by "quotaNanos" instead of "elapsedNanos".
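
For reference, the arithmetic in the description above can be sketched as
follows. This is only an illustration, not the JDK code: the class and method
names are hypothetical, and the 100 ms period and 10-period sampling window
are assumed values chosen to reproduce the 5-core, 4-busy-loop case.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch; none of these names exist in the JDK.
class CpuLoadSketch {

    // Divides CPU usage by wall-clock time (period * numPeriods), then clamps to 1.0.
    static double elapsedBasedLoad(long usageNanos, long periodMicros, long numPeriods) {
        long elapsedNanos = TimeUnit.MICROSECONDS.toNanos(periodMicros * numPeriods);
        return Math.min(1.0, (double) usageNanos / elapsedNanos);
    }

    // Divides CPU usage by the CPU time the quota allows (quota * numPeriods).
    static double quotaBasedLoad(long usageNanos, long quotaMicros, long numPeriods) {
        long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quotaMicros * numPeriods);
        return Math.min(1.0, (double) usageNanos / quotaNanos);
    }

    public static void main(String[] args) {
        long periodMicros = 100_000;          // assumed cpu.cfs_period_us (100 ms)
        long quotaMicros = 5 * periodMicros;  // 5-core container: quota = 5 periods
        long numPeriods = 10;                 // assumed sampling window
        // Four busy loops each consume one full core for the whole window.
        long usageNanos = 4 * TimeUnit.MICROSECONDS.toNanos(periodMicros * numPeriods);

        System.out.println(elapsedBasedLoad(usageNanos, periodMicros, numPeriods)); // 1.0
        System.out.println(quotaBasedLoad(usageNanos, quotaMicros, numPeriods));    // 0.8
    }
}
```

With these numbers the elapsed-time ratio is 4.0 before clamping, which matches
the "systemLoad" value reported above, while the quota-based ratio is 0.8.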

src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java line 142:

> 140:                 long usageNanos = containerMetrics.getCpuUsage();
> 141:                 if (numPeriods > 0 && usageNanos > 0) {
> 142:                     long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quota * numPeriods);

We hit the same problem when running OpenJDK 15 in a container.

Given that we effectively agree the problem is that `elapsedNanos` does not 
accurately reflect the CPU time allocated across all shares, as opposed to a 
single share, my proposal was to use `getCpuShares` as a multiplier for 
`periodLength` above. Is there a good reason why `getCpuQuota` is the better 
alternative?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3656
