On Tue, 11 May 2021 15:48:43 GMT, Hao Tang 
<github.com+7947546+tanghaot...@openjdk.org> wrote:

>>> Thanks for linking that. It sounds reasonable to me to prefer `quota` in 
>>> that case.
>> 
>> Yes, flag `PreferContainerQuotaForCPUCount` is [true by 
>> default](https://github.com/openjdk/jdk/blob/739769c8/src/hotspot/os/linux/globals_linux.hpp#L62).
>>  Therefore, [my current 
>> implementation](https://github.com/openjdk/jdk/pull/3656/commits/b052b624c84)
>>  might be a minimal implementation.
>> 
>> We can also cover the case where `PreferContainerQuotaForCPUCount` is false. 
>> There are two different ways.
>> 1. To access the value of `PreferContainerQuotaForCPUCount`, so that we can 
>> decide if we should use `quota` or (`quota` &  `share`);
>> 2. Reuse `CgroupSubsystem::active_processor_count`. However, the function 
>> returns an integer. It is more reasonable to use a floating number.
>> 
>> Looking forward to your suggestion.
>
>> @tanghaoth90 My local testing suggests that your fix addresses the issue of 
>> CPU quotas set via `--cpu-quota/--cpu-period`. When using `--cpu-shares` the 
>> CPU load calculation is wrong since it will (wrongly) report host values. 
>> Lets look at them individually, fix the quota and shares case individually 
>> (i.e. when not both are set). Once that's done, quota will be preferred in 
>> the OperatingSystemMXBean impl, which is reasonable. I don't think we need 
>> to account for the shares-preferred-over-quota at this point since that 
>> changed in HotSpot in JDK 11 time-frame (JDK-8197589) and 
>> OperatingSystemMXBean has only been made container aware in JDK 14 (yes, it 
>> got backported, but still).
> 
> @jerboaa I have updated my fix by inserting a new branch for `--cpu-shares`. 
> By setting up containers with different restrictions based on your 
> suggestion, I noticed two problems for the CPU load calculation. Take 
> [TestLoad.java](https://bugs.openjdk.java.net/secure/attachment/94530/TestLoad.java)
>  as the example.
> 
> 1. `--cpu-quota=700000 --cpu-period=100000` as the restriction: the result is 
> getting close to `6/7` slowly as time goes by, which indicates that the 
> result of 
> 
> long usageNanos = containerMetrics.getCpuUsage();
> 
> is an accumulated CPU usage rather than a real-time CPU usage.
> According to the javadoc, `getCpuLoad()` _returns the "recent cpu usage"_. 
> The current fix obviously does not address this issue.
> 
> 2. `--cpu-shares=2048` as the restriction: the "cpu-share" branch only 
> returns `-1` since `containerMetrics.getCpuNumPeriods()` returns `0` 
> (`nr_periods` of `cpu.stat` stays `0` when only `--cpu-shares` is set). By 
> now, I have no idea to compute the elapsed time or the total available CPU 
> time with the help of the methods of `jdk.internal.platform.CgroupMetrics`.
> 
> Any suggestion is appreciated. @jerboaa @argha-c

@tanghaoth90

As to your observation of 1). Yes, that's true and I'm not sure it's worth 
changing. We could change the javadoc.

As for 2) I've created a PR for your branch with a suggested implementation for 
cpu shares based accounting. It uses a similar heuristics than the 
getCpuLoad0() uses here:
https://github.com/openjdk/jdk/blob/master/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c#L278

That's the best I could think of. It's certainly more accurate than the current 
implementation.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3656

Reply via email to