On Wed, 27 Nov 2024 09:11:22 GMT, Sergey Chernyshev <schernys...@openjdk.org> wrote:
>> Cgroup V1 subsustem fails to initialize mounted controllers properly in >> certain cases, that may lead to controllers left undetected/inactive. We >> observed the behavior in CloudFoundry deployments, it affects also host >> systems. >> >> The relevant /proc/self/mountinfo line is >> >> >> 2207 2196 0:43 >> /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c >> /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - >> cgroup cgroup rw,cpu,cpuacct >> >> >> /proc/self/cgroup: >> >> >> 11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c >> >> >> Here, Java runs inside containerized process that is being moved cgroups due >> to load balancing. >> >> Let's examine the condition at line 64 here >> https://github.com/openjdk/jdk/blob/55a7cf14453b6cd1de91362927b2fa63cba400a1/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L59-L72 >> It is always FALSE and the branch is never taken. The issue was spotted >> earlier by @jerboaa in >> [JDK-8288019](https://bugs.openjdk.org/browse/JDK-8288019). >> >> The original logic was intended to find the common prefix of `_root`and >> `cgroup_path` and concatenate the remaining suffix to the `_mount_point` >> (lines 67-68). That could lead to the following results: >> >> Example input >> >> _root = "/a" >> cgroup_path = "/a/b" >> _mount_point = "/sys/fs/cgroup/cpu,cpuacct" >> >> >> result _path >> >> "/sys/fs/cgroup/cpu,cpuacct/b" >> >> >> Here, cgroup_path comes from /proc/self/cgroup 3rd column. The man page >> (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control >> groups states: >> >> >> ... >> /proc/pid/cgroup (since Linux 2.6.24) >> This file describes control groups to which the process >> with the corresponding PID belongs. The displayed >> information differs for cgroups version 1 and version 2 >> hierarchies. >> For each cgroup hierarchy of which the process is a >> member, there is one entry containing three colon- >> separated fields: >> >> hierarchy-ID:controller-list:cgroup-path >> >> For example: >> >> 5:cpuacct,cpu,cpuset:/daemons >> ... >> [3] This field contains the pathname of the control group >> in the hierarchy to which the process belongs. This >> pathname is relative to the mount point of the >> hierarchy. >> >> >> This explicitly states the "pathname is relative t... > > Sergey Chernyshev has updated the pull request incrementally with one > additional commit since the last revision: > > adjust path suffix in cgroup (v1) version specific code, when root != cgroup src/hotspot/os/linux/cgroupUtil_linux.cpp line 70: > 68: os::free(limit_cg_path); // handles nullptr > 69: limit_cg_path = os::strdup(cg_path); > 70: } We can avoid the duplicate copy of the original cgroup path, which is already captured in `orig` by using: jlong lowest_limit = limit < 0 ? phys_mem : limit; julong orig_limit = ((julong)lowest_limit) != phys_mem ? lowest_limit : phys_mem; And on line 91 we change the condition from: if ((julong)lowest_limit != phys_mem) { to: if ((julong)lowest_limit != orig_limit) { src/hotspot/os/linux/cgroupUtil_linux.cpp line 129: > 127: lowest_limit = cpus; > 128: os::free(limit_cg_path); // handles nullptr > 129: limit_cg_path = os::strdup(cg_path); Same here with the extra allocation of `cg_path`; src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 320: > 318: ss.print_raw(cgroup_path); > 319: if (strstr((char*)cgroup_path, "../") != nullptr) { > 320: log_warning(os, container)("Cgroup v2 path at [%s] is [%s], cgroup > limits can be wrong.", Suggestion: log_warning(os, container)("Cgroup cpu/memory controller path includes '../', detected limits won't be accurate", src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 322: > 320: log_warning(os, container)("Cgroup v2 path at [%s] is [%s], cgroup > limits can be wrong.", > 321: mount_path, cgroup_path); > 322: } Why the cast to `char*`? We should probably move this warning to `CgroupUtil::adjust_controller`, right before we've determined that we actually need to adjust. I wonder, though, if we should just print the warning and set the cgroup_path to `/` and return early. Otherwise, path adjustment will run with no different result. src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java line 66: > 64: // Rely on path adjustment that determines the > actual suffix. > 65: path += cgroupPath; > 66: } This seems a simpler solution than the hotspot one. While I prefer this one, please make them consistent at the least. test/hotspot/jtreg/containers/docker/TestMemoryWithSubgroups.java line 39: > 37: * @modules java.base/jdk.internal.platform > 38: * @library /test/lib > 39: * @build jdk.test.whitebox.WhiteBox CheckOperatingSystemMXBean `CheckOperatingSystemMXBean` seems unused. test/jdk/jdk/internal/platform/cgroup/CgroupV1SubsystemControllerTest.java line 93: > 91: > 92: @Test > 93: public void testCgPathFallbackToMountPoint() { Suggestion: public void testCgPathToMovedPath() { test/jdk/jdk/internal/platform/cgroup/TestCgroupSubsystemFactory.java line 479: > 477: cgroupv1MemoryController.setPath(cpuInfo.getCgroupPath()); > 478: String actualPath = cgroupv1MemoryController.path(); > 479: assertNotNull("Controller path should not have been null", > actualPath); Suggestion: assertNotNull(actualPath); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860392728 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860401209 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860438020 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860430948 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860449131 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860453472 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860455992 PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860458236