On Thu, 6 May 2021 02:34:01 GMT, Hao Tang <github.com+7947546+tanghaot...@openjdk.org> wrote:
>> Thanks for linking that. It sounds reasonable to me to prefer `quota` in >> that case. > >> Thanks for linking that. It sounds reasonable to me to prefer `quota` in >> that case. > > Yes, flag `PreferContainerQuotaForCPUCount` is [true by > default](https://github.com/openjdk/jdk/blob/739769c8/src/hotspot/os/linux/globals_linux.hpp#L62). > Therefore, [my current > implementation](https://github.com/openjdk/jdk/pull/3656/commits/b052b624c84) > might be a minimal implementation. > > We can also cover the case where `PreferContainerQuotaForCPUCount` is false. > There are two different ways. > 1. To access the value of `PreferContainerQuotaForCPUCount`, so that we can > decide if we should use `quota` or (`quota` & `share`); > 2. Reuse `CgroupSubsystem::active_processor_count`. However, the function > returns an integer. It is more reasonable to use a floating number. > > Looking forward to your suggestion. > @tanghaoth90 My local testing suggests that your fix addresses the issue of > CPU quotas set via `--cpu-quota/--cpu-period`. When using `--cpu-shares` the > CPU load calculation is wrong since it will (wrongly) report host values. > Lets look at them individually, fix the quota and shares case individually > (i.e. when not both are set). Once that's done, quota will be preferred in > the OperatingSystemMXBean impl, which is reasonable. I don't think we need to > account for the shares-preferred-over-quota at this point since that changed > in HotSpot in JDK 11 time-frame (JDK-8197589) and OperatingSystemMXBean has > only been made container aware in JDK 14 (yes, it got backported, but still). @jerboaa I have updated my fix by inserting a new branch for `--cpu-shares`. By setting up containers with different restrictions based on your suggestion, I noticed two problems for the CPU load calculation. Take [TestLoad.java](https://bugs.openjdk.java.net/secure/attachment/94530/TestLoad.java) as the example. 1. `--cpu-quota=700000 --cpu-period=100000` as the restriction: the result is getting close to `6/7` slowly as time goes by, which indicates that the result of long usageNanos = containerMetrics.getCpuUsage(); is an accumulated CPU usage rather than a real-time CPU usage. According to the javadoc, `getCpuLoad()` _returns the "recent cpu usage"_. The current fix obviously does not address this issue. 2. `--cpu-shares=2048` as the restriction: the "cpu-share" branch only returns `-1` since `containerMetrics.getCpuNumPeriods()` returns `0` (`nr_periods` of `cpu.stat` stays `0` when only `--cpu-shares` is set). By now, I have no idea to compute the elapsed time or the total available CPU time with the help of the methods of `jdk.internal.platform.CgroupMetrics`. Any suggestion is appreciated. @jerboaa @argha-c ------------- PR: https://git.openjdk.java.net/jdk/pull/3656