On Tue, 11 May 2021 15:48:43 GMT, Hao Tang <github.com+7947546+tanghaot...@openjdk.org> wrote:
>>> Thanks for linking that. It sounds reasonable to me to prefer `quota` in >>> that case. >> >> Yes, flag `PreferContainerQuotaForCPUCount` is [true by >> default](https://github.com/openjdk/jdk/blob/739769c8/src/hotspot/os/linux/globals_linux.hpp#L62). >> Therefore, [my current >> implementation](https://github.com/openjdk/jdk/pull/3656/commits/b052b624c84) >> might be a minimal implementation. >> >> We can also cover the case where `PreferContainerQuotaForCPUCount` is false. >> There are two different ways. >> 1. To access the value of `PreferContainerQuotaForCPUCount`, so that we can >> decide if we should use `quota` or (`quota` & `share`); >> 2. Reuse `CgroupSubsystem::active_processor_count`. However, the function >> returns an integer. It is more reasonable to use a floating number. >> >> Looking forward to your suggestion. > >> @tanghaoth90 My local testing suggests that your fix addresses the issue of >> CPU quotas set via `--cpu-quota/--cpu-period`. When using `--cpu-shares` the >> CPU load calculation is wrong since it will (wrongly) report host values. >> Lets look at them individually, fix the quota and shares case individually >> (i.e. when not both are set). Once that's done, quota will be preferred in >> the OperatingSystemMXBean impl, which is reasonable. I don't think we need >> to account for the shares-preferred-over-quota at this point since that >> changed in HotSpot in JDK 11 time-frame (JDK-8197589) and >> OperatingSystemMXBean has only been made container aware in JDK 14 (yes, it >> got backported, but still). > > @jerboaa I have updated my fix by inserting a new branch for `--cpu-shares`. > By setting up containers with different restrictions based on your > suggestion, I noticed two problems for the CPU load calculation. Take > [TestLoad.java](https://bugs.openjdk.java.net/secure/attachment/94530/TestLoad.java) > as the example. > > 1. `--cpu-quota=700000 --cpu-period=100000` as the restriction: the result is > getting close to `6/7` slowly as time goes by, which indicates that the > result of > > long usageNanos = containerMetrics.getCpuUsage(); > > is an accumulated CPU usage rather than a real-time CPU usage. > According to the javadoc, `getCpuLoad()` _returns the "recent cpu usage"_. > The current fix obviously does not address this issue. > > 2. `--cpu-shares=2048` as the restriction: the "cpu-share" branch only > returns `-1` since `containerMetrics.getCpuNumPeriods()` returns `0` > (`nr_periods` of `cpu.stat` stays `0` when only `--cpu-shares` is set). By > now, I have no idea to compute the elapsed time or the total available CPU > time with the help of the methods of `jdk.internal.platform.CgroupMetrics`. > > Any suggestion is appreciated. @jerboaa @argha-c @tanghaoth90 As to your observation of 1). Yes, that's true and I'm not sure it's worth changing. We could change the javadoc. As for 2) I've created a PR for your branch with a suggested implementation for cpu shares based accounting. It uses a similar heuristics than the getCpuLoad0() uses here: https://github.com/openjdk/jdk/blob/master/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c#L278 That's the best I could think of. It's certainly more accurate than the current implementation. ------------- PR: https://git.openjdk.java.net/jdk/pull/3656