Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-23 Thread Severin Gehwolf
On Mon, 23 May 2022 09:24:19 GMT, Severin Gehwolf  wrote:

>> Also, I think the current PR could produce the wrong answer, if systemd is 
>> indeed running inside the container, and we have:
>> 
>> 
>> "/user.slice/user-1000.slice/session-50.scope",// root_path
>> "/user.slice/user-1000.slice/session-3.scope", // cgroup_path
>> 
>> 
>> The PR gives /sys/fs/cgroup/memory/user.slice/user-1000.slice/, which 
>> specifies the overall memory limit for user-1000. However, the correct 
>> answer may be 
>> /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-3.scope, which may 
>> have a smaller memory limit, and the JVM may end up allocating a larger heap 
>> than allowed.
>
> Yes, if we can decide which one the right file is. This is largely 
> undocumented territory. The correct fix is a) find the correct path to the 
> namespace hierarchy the process is a part of. b) starting at the leaf node, 
> walk up the hierarchy and find the **lowest** limits. Doing this would be 
> very expensive!
> 
> Aside: Current container detection in the JVM/JDK is notoriously imprecise. 
> It's largely based on common setups (containers like docker). The heuristics 
> assume that memory limits are reported inside the container at the leaf node. 
> If, however, that's not the case, the detected limits will be wrong (it will 
> detect it as unlimited, even though it's - for example - memory constrained 
> at the parent). This can for example be reproduced on a cgroups v2 system 
> with a systemd slice using memory limits. We've worked-around this in OpenJDK 
> for cgroups v1 by https://bugs.openjdk.java.net/browse/JDK-8217338

> Maybe we should do this instead?
> 
> * Read /proc/self/cgroup
> 
> * Find the `10:memory:` line
> 
> * If `/sys/fs/cgroup/memory//tasks` contains my PID, this is the 
> path
> 
> * Otherwise, scan all `tasks` files under  `/sys/fs/cgroup/memory/`. 
> Exactly one of them contains my PID.

Something like that seems most promising, but it would have to be 
`cgroup.procs` not `tasks` as `tasks` is the task id (i.e. Linux's thread), not 
the process. We could keep the two common cases as short circuiting. I.e. host 
and docker cases in the test.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-23 Thread Severin Gehwolf
On Thu, 19 May 2022 20:18:50 GMT, Ioi Lam  wrote:

>> I am wondering if the problem is this:
>> 
>> We have systemd running on the host, and a different copy of systemd that 
>> runs inside the container.
>> 
>> - They both set up `/user.slice/user-1000.slice/session-??.scope` within 
>> their own file systems
>> - For some reason, when you're looking inside the container, 
>> `/proc/self/cgroup` might use a path in the containerized file system 
>> whereas `/proc/self/mountinfo` uses a path in the host file system. These 
>> two paths may look alike but they have absolutely no relation to each other.
>> 
>> I have asked the reporter for more information:
>> 
>> https://gist.github.com/gaol/4d96eace8290e6549635fdc0ea41d0b4?permalink_comment_id=4172593#gistcomment-4172593
>> 
>> Meanwhile, I think the current method of finding "which directory under 
>> /sys/fs/cgroup/memory controls my memory usage" is broken. As mentioned 
>> about, the path you get from  `/proc/self/cgroup`  and 
>> `/proc/self/mountinfo` have no relation to each other, but we use them 
>> anyway to get our answer, with many ad-hoc methods that are not documented 
>> in the code.
>> 
>> Maybe we should do this instead?
>> 
>> - Read /proc/self/cgroup
>> - Find the `10:memory:` line
>> - If `/sys/fs/cgroup/memory//tasks` contains my PID, this is the path
>> - Otherwise, scan all `tasks` files under  `/sys/fs/cgroup/memory/`. Exactly 
>> one of them contains my PID.
>> 
>> For example, here's a test with docker:
>> 
>> 
>> INSIDE CONTAINER
>> # cat /proc/self/cgroup | grep memory
>> 10:memory:/docker/40ea0ab8eaa0469d8d852b7f1d264b6a451a1c2fe20924cd2de874da5f2e3050
>> # cat /proc/self/mountinfo | grep memory
>> 801 791 0:42 
>> /docker/40ea0ab8eaa0469d8d852b7f1d264b6a451a1c2fe20924cd2de874da5f2e3050 
>> /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:23 - cgroup 
>> cgroup rw,memory
>> # cat 
>> /sys/fs/cgroup/memory/docker/40ea0ab8eaa0469d8d852b7f1d264b6a451a1c2fe20924cd2de874da5f2e3050/tasks
>> cat: 
>> /sys/fs/cgroup/memory/docker/40ea0ab8eaa0469d8d852b7f1d264b6a451a1c2fe20924cd2de874da5f2e3050/tasks:
>>  No such file or directory
>> # cat /sys/fs/cgroup/memory/tasks | grep $$
>> 1
>> 
>> ON HOST
>> # cat 
>> /sys/fs/cgroup/memory/docker/40ea0ab8eaa0469d8d852b7f1d264b6a451a1c2fe20924cd2de874da5f2e3050/tasks
>> 37494
>> # cat /proc/37494/status | grep NSpid
>> NSpid:   37494   1
>
> Also, I think the current PR could produce the wrong answer, if systemd is 
> indeed running inside the container, and we have:
> 
> 
> "/user.slice/user-1000.slice/session-50.scope",// root_path
> "/user.slice/user-1000.slice/session-3.scope", // cgroup_path
> 
> 
> The PR gives /sys/fs/cgroup/memory/user.slice/user-1000.slice/, which 
> specifies the overall memory limit for user-1000. However, the correct answer 
> may be /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-3.scope, 
> which may have a smaller memory limit, and the JVM may end up allocating a 
> larger heap than allowed.

Yes, if we can decide which one the right file is. This is largely undocumented 
territory. The correct fix is a) find the correct path to the namespace 
hierarchy the process is a part of. b) starting at the leaf node, walk up the 
hierarchy and find the **lowest** limits. Doing this would be very expensive!

Aside: Current container detection in the JVM/JDK is notoriously imprecise. 
It's largely based on common setups (containers like docker). The heuristics 
assume that memory limits are reported inside the container at the leaf node. 
If, however, that's not the case, the detected limits will be wrong (it will 
detect it as unlimited, even though it's - for example - memory constrained at 
the parent). This can for example be reproduced on a cgroups v2 system with a 
systemd slice using memory limits. We've worked-around this in OpenJDK for 
cgroups v1 by https://bugs.openjdk.java.net/browse/JDK-8217338

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-19 Thread Ioi Lam
On Thu, 19 May 2022 19:59:18 GMT, Ioi Lam  wrote:

>>> What happens if cgroup_path is "/foo/bar" and _root is "/fo"?
>> 
>> With a mount path of `/bar` this ends up being `/bar/o/bar`. Pretty strange, 
>> but then again it's a bit of a contrived example as those paths come from 
>> `/proc` parsing. Anyway, this is code that got added with 
>> [JDK-8146115](https://bugs.openjdk.java.net/browse/JDK-8146115). It's not 
>> something I've written and to be honest, I'm not sure this branch is needed, 
>> but I didn't want to change the existing behaviour with this patch. I have 
>> no more insight than you in terms of why that approach has been taken.
>> 
>>> Maybe this block should be combined with the new `else` block you're adding?
>> 
>> Maybe, but I'm not sure if it would break something.
>> 
>>> However, the behavior seems opposite between these two blocks of code:
>>> 
>>> The upper block: _root is a prefix of cgroup_path, we take the **tail** of 
>>> cgroup_path
>>> 
>>> ```
>>>   TestCase substring_match = {
>>> "/sys/fs/cgroup/memory",   // mount_path
>>> "/user.slice/user-1000.slice", // root_path
>>> "/user.slice/user-1000.slice/user@1001.service",   // cgroup_path
>>> "/sys/fs/cgroup/memory/user@1001.service"  // expected_path
>>>   };
>>> ```
>> 
>> Yes. Though, I cannot comment on why that has been chosen. It's been there 
>> since day one :-/
>> 
>>> The lower block: The beginning of _root is a prefix of cgroup_path, we take 
>>> the **head** of cgroup_path
>>> 
>>> ```
>>>  TestCase prefix_matched_cg = {
>>> "/sys/fs/cgroup/memory",   // mount_path
>>> "/user.slice/user-1000.slice/session-50.scope",// root_path
>>> "/user.slice/user-1000.slice/session-3.scope", // cgroup_path
>>> "/sys/fs/cgroup/memory/user.slice/user-1000.slice" // expected_path
>>>   };
>>> ```
>>> 
>>> I find the behavior hard to understand. I think the rules (and reasons) 
>>> should be added to the comment block above the function.
>> 
>> The reason why I've gone down the path of adding the head of cgroup_path is 
>> because of this document (in conjunction to what the user was observing on 
>> an affected system):
>> https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/
>> 
>> The user was observing paths as listed in the test:
>> 
>> "/user.slice/user-1000.slice/session-50.scope",// root_path
>> "/user.slice/user-1000.slice/session-3.scope", // cgroup_path
>> 
>> This very much looks like systemd managed. Given that and knowing that 
>> systemd adds processes into `scopes` or `services` and groups them via 
>> `slices` and also knowing that cgroups are hierarchical (i.e. limits of 
>> `/foo/bar` also apply to `/foo/bar/baz`), it seems likely that if there are 
>> any limits, then it'll be on `/user.slice/user-1000.slice` within the 
>> mounted controller. Unfortunately, I'm not able to reproduce this myself.
>
> I am wondering if the problem is this:
> 
> We have systemd running on the host, and a different copy of systemd that 
> runs inside the container.
> 
> - They both set up `/user.slice/user-1000.slice/session-??.scope` within 
> their own file systems
> - For some reason, when you're looking inside the container, 
> `/proc/self/cgroup` might use a path in the containerized file system whereas 
> `/proc/self/mountinfo` uses a path in the host file system. These two paths 
> may look alike but they have absolutely no relation to each other.
> 
> I have asked the reporter for more information:
> 
> https://gist.github.com/gaol/4d96eace8290e6549635fdc0ea41d0b4?permalink_comment_id=4172593#gistcomment-4172593
> 
> Meanwhile, I think the current method of finding "which directory under 
> /sys/fs/cgroup/memory controls my memory usage" is broken. As mentioned 
> about, the path you get from  `/proc/self/cgroup`  and `/proc/self/mountinfo` 
> have no relation to each other, but we use them anyway to get our answer, 
> with many ad-hoc methods that are not documented in the code.
> 
> Maybe we should do this instead?
> 
> - Read /proc/self/cgroup
> - Find the `10:memory:` line
> - If `/sys/fs/cgroup/memory//tasks` contains my PID, this is the path
> - Otherwise, scan all `tasks` files under  `/sys/fs/cgroup/memory/`. Exactly 
> one of them contains my PID.
> 
> For example, here's a test with docker:
> 
> 
> INSIDE CONTAINER
> # cat /proc/self/cgroup | grep memory
> 10:memory:/docker/40ea0ab8eaa0469d8d852b7f1d264b6a451a1c2fe20924cd2de874da5f2e3050
> # cat /proc/self/mountinfo | grep memory
> 801 791 0:42 
> /docker/40ea0ab8eaa0469d8d852b7f1d264b6a451a1c2fe20924cd2de874da5f2e3050 
> /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:23 - cgroup 
> cgroup rw,memory
> # cat 
> /sys/fs/cgroup/memory/docker/40ea0ab8eaa0469d8d852b7f1d264b6a451a1c2fe20924cd2de874da5f2e3050/tasks
> cat: 
> 

Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-19 Thread Ioi Lam
On Thu, 19 May 2022 09:06:18 GMT, Severin Gehwolf  wrote:

>> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 63:
>> 
>>> 61:   } else {
>>> 62: char *p = strstr(cgroup_path, _root);
>>> 63: if (p != NULL && p == cgroup_path) {
>> 
>> What happens if cgroup_path is "/foo/bar" and _root is "/fo"?
>> 
>> Maybe this block should be combined with the new `else` block you're adding? 
>> However, the behavior seems opposite between these two blocks of code:
>> 
>> The upper block: _root is a prefix of cgroup_path, we take the **tail** of 
>> cgroup_path
>> 
>> 
>>   TestCase substring_match = {
>> "/sys/fs/cgroup/memory",   // mount_path
>> "/user.slice/user-1000.slice", // root_path
>> "/user.slice/user-1000.slice/user@1001.service",   // cgroup_path
>> "/sys/fs/cgroup/memory/user@1001.service"  // expected_path
>>   };
>> 
>> 
>> The lower block: The beginning of _root is a prefix of cgroup_path, we take 
>> the **head** of cgroup_path
>> 
>> 
>>  TestCase prefix_matched_cg = {
>> "/sys/fs/cgroup/memory",   // mount_path
>> "/user.slice/user-1000.slice/session-50.scope",// root_path
>> "/user.slice/user-1000.slice/session-3.scope", // cgroup_path
>> "/sys/fs/cgroup/memory/user.slice/user-1000.slice" // expected_path
>>   };
>> 
>> 
>> I find the behavior hard to understand. I think the rules (and reasons) 
>> should be added to the comment block above the function.
>
>> What happens if cgroup_path is "/foo/bar" and _root is "/fo"?
> 
> With a mount path of `/bar` this ends up being `/bar/o/bar`. Pretty strange, 
> but then again it's a bit of a contrived example as those paths come from 
> `/proc` parsing. Anyway, this is code that got added with 
> [JDK-8146115](https://bugs.openjdk.java.net/browse/JDK-8146115). It's not 
> something I've written and to be honest, I'm not sure this branch is needed, 
> but I didn't want to change the existing behaviour with this patch. I have no 
> more insight than you in terms of why that approach has been taken.
> 
>> Maybe this block should be combined with the new `else` block you're adding?
> 
> Maybe, but I'm not sure if it would break something.
> 
>> However, the behavior seems opposite between these two blocks of code:
>> 
>> The upper block: _root is a prefix of cgroup_path, we take the **tail** of 
>> cgroup_path
>> 
>> ```
>>   TestCase substring_match = {
>> "/sys/fs/cgroup/memory",   // mount_path
>> "/user.slice/user-1000.slice", // root_path
>> "/user.slice/user-1000.slice/user@1001.service",   // cgroup_path
>> "/sys/fs/cgroup/memory/user@1001.service"  // expected_path
>>   };
>> ```
> 
> Yes. Though, I cannot comment on why that has been chosen. It's been there 
> since day one :-/
> 
>> The lower block: The beginning of _root is a prefix of cgroup_path, we take 
>> the **head** of cgroup_path
>> 
>> ```
>>  TestCase prefix_matched_cg = {
>> "/sys/fs/cgroup/memory",   // mount_path
>> "/user.slice/user-1000.slice/session-50.scope",// root_path
>> "/user.slice/user-1000.slice/session-3.scope", // cgroup_path
>> "/sys/fs/cgroup/memory/user.slice/user-1000.slice" // expected_path
>>   };
>> ```
>> 
>> I find the behavior hard to understand. I think the rules (and reasons) 
>> should be added to the comment block above the function.
> 
> The reason why I've gone down the path of adding the head of cgroup_path is 
> because of this document (in conjunction to what the user was observing on an 
> affected system):
> https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/
> 
> The user was observing paths as listed in the test:
> 
> "/user.slice/user-1000.slice/session-50.scope",// root_path
> "/user.slice/user-1000.slice/session-3.scope", // cgroup_path
> 
> This very much looks like systemd managed. Given that and knowing that 
> systemd adds processes into `scopes` or `services` and groups them via 
> `slices` and also knowing that cgroups are hierarchical (i.e. limits of 
> `/foo/bar` also apply to `/foo/bar/baz`), it seems likely that if there are 
> any limits, then it'll be on `/user.slice/user-1000.slice` within the mounted 
> controller. Unfortunately, I'm not able to reproduce this myself.

I am wondering if the problem is this:

We have systemd running on the host, and a different copy of systemd that runs 
inside the container.

- They both set up `/user.slice/user-1000.slice/session-??.scope` within their 
own file systems
- For some reason, when you're looking inside the container, 
`/proc/self/cgroup` might use a path in the containerized file system whereas 
`/proc/self/mountinfo` uses a path in the host file system. These two paths may 
look alike but they have absolutely no relation to each other.

I have asked the reporter for more information:


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-19 Thread Severin Gehwolf
On Thu, 19 May 2022 05:58:31 GMT, Ioi Lam  wrote:

>> Severin Gehwolf has updated the pull request incrementally with two 
>> additional commits since the last revision:
>> 
>>  - Refactor hotspot gtest
>>  - Separate into function. Fix comment.
>
> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 102:
> 
>> 100:   for (int i = 0; *s1 == *s2 && *s1 != 0; i++) {
>> 101: if (*s1 == '/') {
>> 102:   last_matching_slash_pos = i;
> 
> I found the behavior a little hard to understand. Is it intentional?
> 
> 
> "/cat/cow", "/cat/dog"-> "/cat/"
> "/cat/","/cat/dog"-> "/cat/"
> "/cat", "/cat/dog"-> "/"
> 
> 
> The name `find_last_slash_pos` doesn't properly describe the behavior. I 
> thought about `find_common_path_prefix`, but that doesn't match the current 
> behavior (for the 3rd case, the common path prefix is `"/cat"`).

It's supposed to find the common path prefix as you say. I'll fix it. Open to 
suggestions to make it easier to understand.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-19 Thread Severin Gehwolf
On Thu, 19 May 2022 06:00:06 GMT, Ioi Lam  wrote:

>> Severin Gehwolf has updated the pull request incrementally with two 
>> additional commits since the last revision:
>> 
>>  - Refactor hotspot gtest
>>  - Separate into function. Fix comment.
>
> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 86:
> 
>> 84:   const char* cgroup_p = cgroup_path;
>> 85:   int last_slash = find_last_slash_pos(root_p, cgroup_p);
>> 86:   assert(last_slash >= 0, "not an absolute path?");
> 
> Are root_p and cgroup_p directly read from the /proc/xxx files. If so, do we 
> validate the input to make sure they are absolute paths?
> 
> It seems like our code cannot handle trailing '/' in the input. If so, we 
> should clear all trailing '/' from the input string. Then, in functions that 
> process them, we should assert that they don't end with slash. See my comment 
> in find_last_slash_pos().

Yes, those values come from `/proc/self/mountinfo` and `/proc/self/cgroup`. 
There is no validation being done. Then again, we only end up in this branch if 
the root path is not a substring of the cgroup path. In that case trailing 
slashes don't matter, since there would not be a character by character match 
earlier.

I'll add handling of trailing slashes and appropriate asserts where it makes 
sense.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-19 Thread Severin Gehwolf
On Thu, 19 May 2022 05:53:19 GMT, Ioi Lam  wrote:

>> Severin Gehwolf has updated the pull request incrementally with two 
>> additional commits since the last revision:
>> 
>>  - Refactor hotspot gtest
>>  - Separate into function. Fix comment.
>
> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 60:
> 
>> 58: strncpy(buf, _mount_point, MAXPATHLEN);
>> 59: buf[MAXPATHLEN-1] = '\0';
>> 60: _path = os::strdup(buf);
> 
> Comment on the old code: why this cannot be simply
> 
> 
> _path = os::strdup(_mount_point);
> 
> 
> Also, all the strncat calls in this function seem problematic, and should be 
> converted to stringStream for consistency.

Agreed. I've filed https://bugs.openjdk.java.net/browse/JDK-8287007 to track 
this. I'd like to limit the changes of this patch to a minimum. It seems 
orthogonal.

> What happens if cgroup_path is "/foo/bar" and _root is "/fo"?

With a mount path of `/bar` this ends up being `/bar/o/bar`. Pretty strange, 
but then again it's a bit of a contrived example as those paths come from 
`/proc` parsing. Anyway, this is code that got added with 
[JDK-8146115](https://bugs.openjdk.java.net/browse/JDK-8146115). It's not 
something I've written and to be honest, I'm not sure this branch is needed, 
but I didn't want to change the existing behaviour with this patch. I have no 
more insight than you in terms of why that approach has been taken.

> Maybe this block should be combined with the new `else` block you're adding?

Maybe, but I'm not sure if it would break something.

> However, the behavior seems opposite between these two blocks of code:
> 
> The upper block: _root is a prefix of cgroup_path, we take the **tail** of 
> cgroup_path
> 
> ```
>   TestCase substring_match = {
> "/sys/fs/cgroup/memory",   // mount_path
> "/user.slice/user-1000.slice", // root_path
> "/user.slice/user-1000.slice/user@1001.service",   // cgroup_path
> "/sys/fs/cgroup/memory/user@1001.service"  // expected_path
>   };
> ```

Yes. Though, I cannot comment on why that has been chosen. It's been there 
since day one :-/

> The lower block: The beginning of _root is a prefix of cgroup_path, we take 
> the **head** of cgroup_path
> 
> ```
>  TestCase prefix_matched_cg = {
> "/sys/fs/cgroup/memory",   // mount_path
> "/user.slice/user-1000.slice/session-50.scope",// root_path
> "/user.slice/user-1000.slice/session-3.scope", // cgroup_path
> "/sys/fs/cgroup/memory/user.slice/user-1000.slice" // expected_path
>   };
> ```
> 
> I find the behavior hard to understand. I think the rules (and reasons) 
> should be added to the comment block above the function.

The reason why I've gone down the path of adding the head of cgroup_path is 
because of this document (in conjunction to what the user was observing on an 
affected system):
https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

The user was observing paths as listed in the test:

"/user.slice/user-1000.slice/session-50.scope",// root_path
"/user.slice/user-1000.slice/session-3.scope", // cgroup_path

This very much looks like systemd managed. Given that and knowing that systemd 
adds processes into `scopes` or `services` and groups them via `slices` and 
also knowing that cgroups are hierarchical (i.e. limits of `/foo/bar` also 
apply to `/foo/bar/baz`), it seems likely that if there are any limits, then 
it'll be on `/user.slice/user-1000.slice` within the mounted controller. 
Unfortunately, I'm not able to reproduce this myself.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-19 Thread Severin Gehwolf
On Thu, 19 May 2022 07:02:13 GMT, Thomas Stuefe  wrote:

>> I'm not convinced the extra function makes the code more readable, but here 
>> it is. I can revert back if this is too much.
>
> @jerboaa Feel free to go back to your original variant. This was only a 
> proposal.

understood.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-19 Thread Thomas Stuefe
On Wed, 18 May 2022 18:10:45 GMT, Severin Gehwolf  wrote:

>> @iklam yes I meant `Find the longest common prefix`. Fixed the comment.
>
> I'm not convinced the extra function makes the code more readable, but here 
> it is. I can revert back if this is too much.

@jerboaa Feel free to go back to your original variant. This was only a 
proposal.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-19 Thread Ioi Lam
On Wed, 18 May 2022 18:14:52 GMT, Severin Gehwolf  wrote:

>> Please review this change to the cgroup v1 subsystem which makes it more 
>> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
>> re-create a similar system as the reporter. The idea of using the longest 
>> substring match for the cgroupv1 file paths was based on the fact that on 
>> systemd systems processes run in separate scopes and the maven forked test 
>> runner might exhibit this property. For that it makes sense to use the 
>> common ancestor path. Nothing changes in the common cases where the 
>> `cgroup_path` matches `_root` and when the `_root` is `/` (container the 
>> former, host system the latter).
>> 
>> In addition, the code paths which are susceptible to throw NPE have been 
>> hardened to catch those situations. Should it happen in the future it makes 
>> more sense (to me) to not have accurate container detection in favor of 
>> continuing to keep running.
>> 
>> Finally, with the added unit-tests a bug was uncovered on the "substring" 
>> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
>> point to `_root` as it's used as the "needle" to find in "haystack" 
>> `cgroup_path` (not the other way round).
>> 
>> Testing:
>> - [x] Added unit tests
>> - [x] GHA
>> - [x] Container tests on cgroups v1 Linux. Continue to pass
>
> Severin Gehwolf has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - Refactor hotspot gtest
>  - Separate into function. Fix comment.

Changes requested by iklam (Reviewer).

src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 60:

> 58: strncpy(buf, _mount_point, MAXPATHLEN);
> 59: buf[MAXPATHLEN-1] = '\0';
> 60: _path = os::strdup(buf);

Comment on the old code: why this cannot be simply


_path = os::strdup(_mount_point);


Also, all the strncat calls in this function seem problematic, and should be 
converted to stringStream for consistency.

src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 63:

> 61:   } else {
> 62: char *p = strstr(cgroup_path, _root);
> 63: if (p != NULL && p == cgroup_path) {

What happens if cgroup_path is "/foo/bar" and _root is "/fo"?

Maybe this block should be combined with the new `else` block you're adding? 
However, the behavior seems opposite between these two blocks of code:

The upper block: _root is a prefix of cgroup_path, we take the **tail** of 
cgroup_path


  TestCase substring_match = {
"/sys/fs/cgroup/memory",   // mount_path
"/user.slice/user-1000.slice", // root_path
"/user.slice/user-1000.slice/user@1001.service",   // cgroup_path
"/sys/fs/cgroup/memory/user@1001.service"  // expected_path
  };


The lower block: The beginning of _root is a prefix of cgroup_path, we take the 
**head** of cgroup_path


 TestCase prefix_matched_cg = {
"/sys/fs/cgroup/memory",   // mount_path
"/user.slice/user-1000.slice/session-50.scope",// root_path
"/user.slice/user-1000.slice/session-3.scope", // cgroup_path
"/sys/fs/cgroup/memory/user.slice/user-1000.slice" // expected_path
  };


I find the behavior hard to understand. I think the rules (and reasons) should 
be added to the comment block above the function.

src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 86:

> 84:   const char* cgroup_p = cgroup_path;
> 85:   int last_slash = find_last_slash_pos(root_p, cgroup_p);
> 86:   assert(last_slash >= 0, "not an absolute path?");

Are root_p and cgroup_p directly read from the /proc/xxx files. If so, do we 
validate the input to make sure they are absolute paths?

It seems like our code cannot handle trailing '/' in the input. If so, we 
should clear all trailing '/' from the input string. Then, in functions that 
process them, we should assert that they don't end with slash. See my comment 
in find_last_slash_pos().

src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 102:

> 100:   for (int i = 0; *s1 == *s2 && *s1 != 0; i++) {
> 101: if (*s1 == '/') {
> 102:   last_matching_slash_pos = i;

I found the behavior a little hard to understand. Is it intentional?


"/cat/cow", "/cat/dog"-> "/cat/"
"/cat/","/cat/dog"-> "/cat/"
"/cat", "/cat/dog"-> "/"


The name `find_last_slash_pos` doesn't properly describe the behavior. I 
thought about `find_common_path_prefix`, but that doesn't match the current 
behavior (for the 3rd case, the common path prefix is `"/cat"`).

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-18 Thread Severin Gehwolf
On Tue, 17 May 2022 06:18:25 GMT, Ioi Lam  wrote:

>> Severin Gehwolf has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Use stringStream to simplify controller path assembly
>
> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 92:
> 
>> 90:   }
>> 91:   ss.print_raw(_root, last_matching_slash_pos);
>> 92:   _path = os::strdup(ss.base());
> 
> Do you mean `Find the longest common prefix`? Maybe give an example in the 
> comments? Text parsing in C code is really difficult to understand.

@iklam yes I meant `Find the longest common prefix`. Fixed the comment.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-18 Thread Severin Gehwolf
On Wed, 18 May 2022 18:09:54 GMT, Severin Gehwolf  wrote:

>> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 92:
>> 
>>> 90:   }
>>> 91:   ss.print_raw(_root, last_matching_slash_pos);
>>> 92:   _path = os::strdup(ss.base());
>> 
>> Do you mean `Find the longest common prefix`? Maybe give an example in the 
>> comments? Text parsing in C code is really difficult to understand.
>
> @iklam yes I meant `Find the longest common prefix`. Fixed the comment.

I'm not convinced the extra function makes the code more readable, but here it 
is. I can revert back if this is too much.

>> test/hotspot/gtest/runtime/test_os_linux_cgroups.cpp line 63:
>> 
>>> 61: ASSERT_STREQ(expected_cg_paths[i], ctrl->subsystem_path());
>>> 62:   }
>>> 63: }
>> 
>> I found it hard to relate the different paths. Could you create a new struct 
>> like this?
>> 
>> 
>> struct TestCase {
>> char* mount_path;
>> char* root_paths;
>> char* cgroup_path;
>> char* expected_cg_paths;
>> } = {
>>   {  "/sys/fs/cgroup/memory", // mount
>>"/",   // root,
>>
>
> Yes, makes sense. Will do.

Done now.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]

2022-05-18 Thread Severin Gehwolf
> Please review this change to the cgroup v1 subsystem which makes it more 
> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
> re-create a similar system as the reporter. The idea of using the longest 
> substring match for the cgroupv1 file paths was based on the fact that on 
> systemd systems processes run in separate scopes and the maven forked test 
> runner might exhibit this property. For that it makes sense to use the common 
> ancestor path. Nothing changes in the common cases where the `cgroup_path` 
> matches `_root` and when the `_root` is `/` (container the former, host 
> system the latter).
> 
> In addition, the code paths which are susceptible to throw NPE have been 
> hardened to catch those situations. Should it happen in the future it makes 
> more sense (to me) to not have accurate container detection in favor of 
> continuing to keep running.
> 
> Finally, with the added unit-tests a bug was uncovered on the "substring" 
> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
> point to `_root` as it's used as the "needle" to find in "haystack" 
> `cgroup_path` (not the other way round).
> 
> Testing:
> - [x] Added unit tests
> - [x] GHA
> - [x] Container tests on cgroups v1 Linux. Continue to pass

Severin Gehwolf has updated the pull request incrementally with two additional 
commits since the last revision:

 - Refactor hotspot gtest
 - Separate into function. Fix comment.

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8629/files
  - new: https://git.openjdk.java.net/jdk/pull/8629/files/66241aa5..4b8e92fa

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8629=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8629=01-02

  Stats: 93 lines in 3 files changed: 56 ins; 25 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8629.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8629/head:pull/8629

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-17 Thread Severin Gehwolf
On Tue, 17 May 2022 07:14:34 GMT, Thomas Stuefe  wrote:

>> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 92:
>> 
>>> 90:   }
>>> 91:   ss.print_raw(_root, last_matching_slash_pos);
>>> 92:   _path = os::strdup(ss.base());
>> 
>> Do you mean `Find the longest common prefix`? Maybe give an example in the 
>> comments? Text parsing in C code is really difficult to understand.
>
> Maybe factor out the search like this
> 
> // Return length of common starting substring. E.g. "cat" for ("cattle", 
> "catnip");
> static int find_common_starting_substring(const char* s1, const char* s2) {
> ...
> }
> 
> That way its clearer and we can find and move this to utilities if we ever 
> need this in other places.

OK.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-17 Thread Severin Gehwolf
On Tue, 17 May 2022 05:55:47 GMT, Ioi Lam  wrote:

>> Severin Gehwolf has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Use stringStream to simplify controller path assembly
>
> test/hotspot/gtest/runtime/test_os_linux_cgroups.cpp line 63:
> 
>> 61: ASSERT_STREQ(expected_cg_paths[i], ctrl->subsystem_path());
>> 62:   }
>> 63: }
> 
> I found it hard to relate the different paths. Could you create a new struct 
> like this?
> 
> 
> struct TestCase {
> char* mount_path;
> char* root_paths;
> char* cgroup_path;
> char* expected_cg_paths;
> } = {
>   {  "/sys/fs/cgroup/memory", // mount
>"/",   // root,
>

Yes, makes sense. Will do.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-17 Thread Thomas Stuefe
On Tue, 17 May 2022 06:18:25 GMT, Ioi Lam  wrote:

>> Severin Gehwolf has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Use stringStream to simplify controller path assembly
>
> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 92:
> 
>> 90:   }
>> 91:   ss.print_raw(_root, last_matching_slash_pos);
>> 92:   _path = os::strdup(ss.base());
> 
> Do you mean `Find the longest common prefix`? Maybe give an example in the 
> comments? Text parsing in C code is really difficult to understand.

Maybe factor out the search like this

// Return length of common starting substring. E.g. "cat" for ("cattle", 
"catnip");
static int find_common_starting_substring(const char* s1, const char* s2) {
...
}

That way its clearer and we can find and move this to utilities if we ever need 
this in other places.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-17 Thread Ioi Lam
On Thu, 12 May 2022 18:09:40 GMT, Severin Gehwolf  wrote:

>> Please review this change to the cgroup v1 subsystem which makes it more 
>> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
>> re-create a similar system as the reporter. The idea of using the longest 
>> substring match for the cgroupv1 file paths was based on the fact that on 
>> systemd systems processes run in separate scopes and the maven forked test 
>> runner might exhibit this property. For that it makes sense to use the 
>> common ancestor path. Nothing changes in the common cases where the 
>> `cgroup_path` matches `_root` and when the `_root` is `/` (container the 
>> former, host system the latter).
>> 
>> In addition, the code paths which are susceptible to throw NPE have been 
>> hardened to catch those situations. Should it happen in the future it makes 
>> more sense (to me) to not have accurate container detection in favor of 
>> continuing to keep running.
>> 
>> Finally, with the added unit-tests a bug was uncovered on the "substring" 
>> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
>> point to `_root` as it's used as the "needle" to find in "haystack" 
>> `cgroup_path` (not the other way round).
>> 
>> Testing:
>> - [x] Added unit tests
>> - [x] GHA
>> - [x] Container tests on cgroups v1 Linux. Continue to pass
>
> Severin Gehwolf has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Use stringStream to simplify controller path assembly

src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 92:

> 90:   }
> 91:   ss.print_raw(_root, last_matching_slash_pos);
> 92:   _path = os::strdup(ss.base());

Do you mean `Find the longest common prefix`? Maybe give an example in the 
comments? Text parsing in C code is really difficult to understand.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-16 Thread Ioi Lam
On Thu, 12 May 2022 18:09:40 GMT, Severin Gehwolf  wrote:

>> Please review this change to the cgroup v1 subsystem which makes it more 
>> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
>> re-create a similar system as the reporter. The idea of using the longest 
>> substring match for the cgroupv1 file paths was based on the fact that on 
>> systemd systems processes run in separate scopes and the maven forked test 
>> runner might exhibit this property. For that it makes sense to use the 
>> common ancestor path. Nothing changes in the common cases where the 
>> `cgroup_path` matches `_root` and when the `_root` is `/` (container the 
>> former, host system the latter).
>> 
>> In addition, the code paths which are susceptible to throw NPE have been 
>> hardened to catch those situations. Should it happen in the future it makes 
>> more sense (to me) to not have accurate container detection in favor of 
>> continuing to keep running.
>> 
>> Finally, with the added unit-tests a bug was uncovered on the "substring" 
>> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
>> point to `_root` as it's used as the "needle" to find in "haystack" 
>> `cgroup_path` (not the other way round).
>> 
>> Testing:
>> - [x] Added unit tests
>> - [x] GHA
>> - [x] Container tests on cgroups v1 Linux. Continue to pass
>
> Severin Gehwolf has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Use stringStream to simplify controller path assembly

test/hotspot/gtest/runtime/test_os_linux_cgroups.cpp line 63:

> 61: ASSERT_STREQ(expected_cg_paths[i], ctrl->subsystem_path());
> 62:   }
> 63: }

I found it hard to relate the different paths. Could you create a new struct 
like this?


struct TestCase {
char* mount_path;
char* root_paths;
char* cgroup_path;
char* expected_cg_paths;
} = {
  {  "/sys/fs/cgroup/memory", // mount
   "/",   // root,
   

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-13 Thread Thomas Stuefe
On Thu, 12 May 2022 18:09:40 GMT, Severin Gehwolf  wrote:

>> Please review this change to the cgroup v1 subsystem which makes it more 
>> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
>> re-create a similar system as the reporter. The idea of using the longest 
>> substring match for the cgroupv1 file paths was based on the fact that on 
>> systemd systems processes run in separate scopes and the maven forked test 
>> runner might exhibit this property. For that it makes sense to use the 
>> common ancestor path. Nothing changes in the common cases where the 
>> `cgroup_path` matches `_root` and when the `_root` is `/` (container the 
>> former, host system the latter).
>> 
>> In addition, the code paths which are susceptible to throw NPE have been 
>> hardened to catch those situations. Should it happen in the future it makes 
>> more sense (to me) to not have accurate container detection in favor of 
>> continuing to keep running.
>> 
>> Finally, with the added unit-tests a bug was uncovered on the "substring" 
>> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
>> point to `_root` as it's used as the "needle" to find in "haystack" 
>> `cgroup_path` (not the other way round).
>> 
>> Testing:
>> - [x] Added unit tests
>> - [x] GHA
>> - [x] Container tests on cgroups v1 Linux. Continue to pass
>
> Severin Gehwolf has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Use stringStream to simplify controller path assembly

Thanks for taking my suggestion! Much simpler now :-)

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-12 Thread Severin Gehwolf
On Thu, 12 May 2022 15:58:57 GMT, Severin Gehwolf  wrote:

>> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 113:
>> 
>>> 111:   }
>>> 112:   buf[MAXPATHLEN-1] = '\0';
>>> 113:   _path = os::strdup(buf);
>> 
>> I think this code can be simplified a lot with stringStream and without 
>> strtok, so no need for fixed buffers (which may fail with longer path names) 
>> and no need for writable string copies on the stack.
>> 
>> Something like this:
>> 
>> stringStream ss;
>> ss.print_raw(_mount_point);
>> const char* p1 = _root;
>> const char* p2 = cgroup_path;
>> int last_matching_dash_pos = -1;
>> for (int i = 0; *p1 == *p2 && *p1 != 0; i ++) {
>>  if (*p1 == '/') {
>>  last_matching_dash_pos = i;
>>  }
>> p1++; p2++;
>> }
>> ss.print_raw(_root, last_matching_dash_pos);
>> // Now use ss.base() to access the assembled string
>
> Nice, thanks! I'll update it.

Done.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-12 Thread Severin Gehwolf
> Please review this change to the cgroup v1 subsystem which makes it more 
> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
> re-create a similar system as the reporter. The idea of using the longest 
> substring match for the cgroupv1 file paths was based on the fact that on 
> systemd systems processes run in separate scopes and the maven forked test 
> runner might exhibit this property. For that it makes sense to use the common 
> ancestor path. Nothing changes in the common cases where the `cgroup_path` 
> matches `_root` and when the `_root` is `/` (container the former, host 
> system the latter).
> 
> In addition, the code paths which are susceptible to throw NPE have been 
> hardened to catch those situations. Should it happen in the future it makes 
> more sense (to me) to not have accurate container detection in favor of 
> continuing to keep running.
> 
> Finally, with the added unit-tests a bug was uncovered on the "substring" 
> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
> point to `_root` as it's used as the "needle" to find in "haystack" 
> `cgroup_path` (not the other way round).
> 
> Testing:
> - [x] Added unit tests
> - [x] GHA
> - [x] Container tests on cgroups v1 Linux. Continue to pass

Severin Gehwolf has updated the pull request incrementally with one additional 
commit since the last revision:

  Use stringStream to simplify controller path assembly

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8629/files
  - new: https://git.openjdk.java.net/jdk/pull/8629/files/bc873b3f..66241aa5

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8629=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8629=00-01

  Stats: 32 lines in 1 file changed: 0 ins; 21 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8629.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8629/head:pull/8629

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems

2022-05-12 Thread Severin Gehwolf
On Thu, 12 May 2022 14:00:44 GMT, Thomas Stuefe  wrote:

>> Please review this change to the cgroup v1 subsystem which makes it more 
>> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
>> re-create a similar system as the reporter. The idea of using the longest 
>> substring match for the cgroupv1 file paths was based on the fact that on 
>> systemd systems processes run in separate scopes and the maven forked test 
>> runner might exhibit this property. For that it makes sense to use the 
>> common ancestor path. Nothing changes in the common cases where the 
>> `cgroup_path` matches `_root` and when the `_root` is `/` (container the 
>> former, host system the latter).
>> 
>> In addition, the code paths which are susceptible to throw NPE have been 
>> hardened to catch those situations. Should it happen in the future it makes 
>> more sense (to me) to not have accurate container detection in favor of 
>> continuing to keep running.
>> 
>> Finally, with the added unit-tests a bug was uncovered on the "substring" 
>> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
>> point to `_root` as it's used as the "needle" to find in "haystack" 
>> `cgroup_path` (not the other way round).
>> 
>> Testing:
>> - [x] Added unit tests
>> - [x] GHA
>> - [x] Container tests on cgroups v1 Linux. Continue to pass
>
> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 113:
> 
>> 111:   }
>> 112:   buf[MAXPATHLEN-1] = '\0';
>> 113:   _path = os::strdup(buf);
> 
> I think this code can be simplified a lot with stringStream and without 
> strtok, so no need for fixed buffers (which may fail with longer path names) 
> and no need for writable string copies on the stack.
> 
> Something like this:
> 
> stringStream ss;
> ss.print_raw(_mount_point);
> const char* p1 = _root;
> const char* p2 = cgroup_path;
> int last_matching_dash_pos = -1;
> for (int i = 0; *p1 == *p2 && *p1 != 0; i ++) {
>   if (*p1 == '/') {
>   last_matching_dash_pos = i;
>   }
> p1++; p2++;
> }
> ss.print_raw(_root, last_matching_dash_pos);
> // Now use ss.base() to access the assembled string

Nice, thanks! I'll update it.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems

2022-05-12 Thread Thomas Stuefe
On Tue, 10 May 2022 12:29:10 GMT, Severin Gehwolf  wrote:

> Please review this change to the cgroup v1 subsystem which makes it more 
> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
> re-create a similar system as the reporter. The idea of using the longest 
> substring match for the cgroupv1 file paths was based on the fact that on 
> systemd systems processes run in separate scopes and the maven forked test 
> runner might exhibit this property. For that it makes sense to use the common 
> ancestor path. Nothing changes in the common cases where the `cgroup_path` 
> matches `_root` and when the `_root` is `/` (container the former, host 
> system the latter).
> 
> In addition, the code paths which are susceptible to throw NPE have been 
> hardened to catch those situations. Should it happen in the future it makes 
> more sense (to me) to not have accurate container detection in favor of 
> continuing to keep running.
> 
> Finally, with the added unit-tests a bug was uncovered on the "substring" 
> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
> point to `_root` as it's used as the "needle" to find in "haystack" 
> `cgroup_path` (not the other way round).
> 
> Testing:
> - [x] Added unit tests
> - [x] GHA
> - [x] Container tests on cgroups v1 Linux. Continue to pass

src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 113:

> 111:   }
> 112:   buf[MAXPATHLEN-1] = '\0';
> 113:   _path = os::strdup(buf);

I think this code can be simplified a lot with stringStream and without strtok, 
so no need for fixed buffers (which may fail with longer path names) and no 
need for writable string copies on the stack.

Something like this:

stringStream ss;
ss.print_raw(_mount_point);
const char* p1 = _root;
const char* p2 = cgroup_path;
int last_matching_dash_pos = -1;
for (int i = 0; *p1 == *p2 && *p1 != 0; i ++) {
if (*p1 == '/') {
last_matching_dash_pos = i;
}
p1++; p2++;
}
ss.print_raw(_root, last_matching_dash_pos);
// Now use ss.base() to access the assembled string

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems

2022-05-12 Thread Severin Gehwolf
On Thu, 12 May 2022 06:13:47 GMT, Ioi Lam  wrote:

> I just started to look at the code so just one comment for now.

Sure, thanks for the review!

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems

2022-05-12 Thread Severin Gehwolf
On Thu, 12 May 2022 06:11:03 GMT, Ioi Lam  wrote:

>> Please review this change to the cgroup v1 subsystem which makes it more 
>> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
>> re-create a similar system as the reporter. The idea of using the longest 
>> substring match for the cgroupv1 file paths was based on the fact that on 
>> systemd systems processes run in separate scopes and the maven forked test 
>> runner might exhibit this property. For that it makes sense to use the 
>> common ancestor path. Nothing changes in the common cases where the 
>> `cgroup_path` matches `_root` and when the `_root` is `/` (container the 
>> former, host system the latter).
>> 
>> In addition, the code paths which are susceptible to throw NPE have been 
>> hardened to catch those situations. Should it happen in the future it makes 
>> more sense (to me) to not have accurate container detection in favor of 
>> continuing to keep running.
>> 
>> Finally, with the added unit-tests a bug was uncovered on the "substring" 
>> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
>> point to `_root` as it's used as the "needle" to find in "haystack" 
>> `cgroup_path` (not the other way round).
>> 
>> Testing:
>> - [x] Added unit tests
>> - [x] GHA
>> - [x] Container tests on cgroups v1 Linux. Continue to pass
>
> src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java
>  line 65:
> 
>> 63: path = mountPoint + cgroupSubstr;
>> 64: }
>> 65: } else {
> 
> Looks like `path` is still not set if the condition at line 61 `if 
> (cgroupPath.length() > root.length()) {` is false.

Yes. There are more cases where the cgroup path might be null. If you know of 
cases that should be handled and aren't, please do let me know. Incidentally, 
that's why I've added the catch for NPE in `CgroupV1Subsystem.initSubSystem()`, 
because it's not clear what should be used as `path` in those corner cases. It 
certainly shouldn't throw a NPE :-). It then also more closely matches what 
hotspot does. Having said that, if `cgroupPath.length < root.length()` it's 
implied that `cgroupPath.startsWith(root)` is false. Then the only case where 
it would still be null is when the paths match in lenght, but aren't the same.

I'm not convinced the logic when then cgroup patch starts with the root path, 
then it should do what it does today. I.e. given 
`mountpath=/sys/fs/cgroup/memory`, `root=/foo/bar` and 
`cgroupPath=/foo/bar/baz` then `path=/sys/fs/cgroup/memory/baz`. I've 
personally not seen such a setup and the code predates me. Considering that the 
equivalent hotspot code had a bug in this logic since day 1, it doesn't seem 
very widely used...

The most common cases I know to date are:

1. `root=/`, `cgroupPath=/some`, `mount=/sys/fs/cgroup/controller` => 
`path=/sys/fs/cgroup/controller/some` (host systems)
2. `root=/foo/bar/baz`, `cgroupPath=/foo/bar/baz`, 
`mount=/sys/fs/cgroup/controller` => `path=/sys/fs/cgroup/controller` (most 
container engines)

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems

2022-05-12 Thread Ioi Lam
On Tue, 10 May 2022 12:29:10 GMT, Severin Gehwolf  wrote:

> Please review this change to the cgroup v1 subsystem which makes it more 
> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
> re-create a similar system as the reporter. The idea of using the longest 
> substring match for the cgroupv1 file paths was based on the fact that on 
> systemd systems processes run in separate scopes and the maven forked test 
> runner might exhibit this property. For that it makes sense to use the common 
> ancestor path. Nothing changes in the common cases where the `cgroup_path` 
> matches `_root` and when the `_root` is `/` (container the former, host 
> system the latter).
> 
> In addition, the code paths which are susceptible to throw NPE have been 
> hardened to catch those situations. Should it happen in the future it makes 
> more sense (to me) to not have accurate container detection in favor of 
> continuing to keep running.
> 
> Finally, with the added unit-tests a bug was uncovered on the "substring" 
> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
> point to `_root` as it's used as the "needle" to find in "haystack" 
> `cgroup_path` (not the other way round).
> 
> Testing:
> - [x] Added unit tests
> - [x] GHA
> - [x] Container tests on cgroups v1 Linux. Continue to pass

I just started to look at the code so just one comment for now.

src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java
 line 65:

> 63: path = mountPoint + cgroupSubstr;
> 64: }
> 65: } else {

Looks like `path` is still not set if the condition at line 61 `if 
(cgroupPath.length() > root.length()) {` is false.

-

PR: https://git.openjdk.java.net/jdk/pull/8629