Re: RFR: 8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

2021-01-28 Thread Severin Gehwolf
On Thu, 28 Jan 2021 17:19:53 GMT, Poonam Bajaj  wrote:

> > > I'm curious: What config is this to actually trigger the NPE? How does 
> > > `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?
> > 
> > 
> > > > I'm curious: What config is this to actually trigger the NPE? How does 
> > > > `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look 
> > > > like?
> > > 
> > > 
> > > I don't have access to the config. The issue was reported by a customer.
> > 
> > 
> > This isn't very satisfying, though. How can we be sure this issue isn't 
> > also present in the cgroup v2 code? Has this been tested? Surely, there was 
> > some stack trace reported by the customer or some sort of reproducer got 
> > provided. What was the reasoning that established this issue is present in 
> > JDK head and **only** in cgroups v1 code? My guess is that the issue got 
> > triggered via the OperatingSystemMXBean, but nothing to that effect has 
> > been noted here or in the bug.
> > If I were to propose such a point fix, clearly, I'd have to provide some 
> > details what the actual problem is and explain why the fix is sufficient 
> > and covers all branches. All that got provided is: "But memory could be 
> > Null on some machines that have cgroup entries for CPU but not for memory."
> 
> I can check with the customer if they could share their config.

That would be helpful. A stack trace of the NPE would be good too.

> The cgroups v2 code was thoroughly examined and this problem does not exist 
> in that code. cgroups v2 does not have a separate MemorySubSystemController 
> as we have for c1 cgroups.
> 
> An instance of the CgroupV1MemorySubSystemController is stored as a member in 
> CgroupV1Subsystem.
> 
> ```
> private CgroupV1MemorySubSystemController memory;
> 
> private void setMemorySubSystem(CgroupV1MemorySubSystemController memory) {
>   this.memory = memory;
> }
> ```
> 
> This memory instance variable could stay null if "memory" entry is not found 
> while creating sub-system objects in createSubSystemController().
> 
> ```
>case "memory":
>   subsystem.setMemorySubSystem(new 
> CgroupV1MemorySubSystemController(mountentry[3], mountentry[4]));
>   break;
> ```
> 
> This fix ensure that the memory instance is not null before invoking any 
> method on it.
> 
> Such problem does not exist in cgroups v2 code.

Right, but is this reasoning sound? Without knowing the code path triggering 
the problem **and** knowing something about the config can we really say? 
Depending on the config it might fail in some very different way on cgroups v2!

On the other hand, knowing the config, might allow us to conclude it's an 
impossible config on cgroups v2 and we are done.

-

PR: https://git.openjdk.java.net/jdk/pull/2269


Re: RFR: 8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

2021-01-28 Thread Poonam Bajaj
On Thu, 28 Jan 2021 16:00:36 GMT, Severin Gehwolf  wrote:

> > I'm curious: What config is this to actually trigger the NPE? How does 
> > `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?
> 
> > > I'm curious: What config is this to actually trigger the NPE? How does 
> > > `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?
> > 
> > 
> > I don't have access to the config. The issue was reported by a customer.
> 
> This isn't very satisfying, though. How can we be sure this issue isn't also 
> present in the cgroup v2 code? Has this been tested? Surely, there was some 
> stack trace reported by the customer or some sort of reproducer got provided. 
> What was the reasoning that established this issue is present in JDK head and 
> **only** in cgroups v1 code? My guess is that the issue got triggered via the 
> OperatingSystemMXBean, but nothing to that effect has been noted here or in 
> the bug.
> 
> If I were to propose such a point fix, clearly, I'd have to provide some 
> details what the actual problem is and explain why the fix is sufficient and 
> covers all branches. All that got provided is: "But memory could be Null on 
> some machines that have cgroup entries for CPU but not for memory."

I can check with the customer if they could share their config. 

The cgroups v2 code was thoroughly examined and this problem does not exist in 
that code. cgroups v2 does not have a separate MemorySubSystemController as we 
have for c1 cgroups. 

An instance of the CgroupV1MemorySubSystemController is stored as a member in 
CgroupV1Subsystem.

private CgroupV1MemorySubSystemController memory;

private void setMemorySubSystem(CgroupV1MemorySubSystemController memory) {
  this.memory = memory;
}

This memory instance variable could stay null if "memory" entry is not found 
while creating sub-system objects in createSubSystemController(). 

   case "memory":
  subsystem.setMemorySubSystem(new 
CgroupV1MemorySubSystemController(mountentry[3], mountentry[4]));
  break;

This fix ensure that the memory instance is not null before invoking any method 
on it.

Such problem does not exist in cgroups v2 code.

-

PR: https://git.openjdk.java.net/jdk/pull/2269


Re: RFR: 8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

2021-01-28 Thread Severin Gehwolf
On Thu, 28 Jan 2021 15:01:11 GMT, Poonam Bajaj  wrote:

>> I'm curious: What config is this to actually trigger the NPE? How does 
>> `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?
>
>> I'm curious: What config is this to actually trigger the NPE? How does 
>> `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?
> 
> I don't have access to the config. The issue was reported by a customer.

> I'm curious: What config is this to actually trigger the NPE? How does 
> `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?



> > I'm curious: What config is this to actually trigger the NPE? How does 
> > `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?
> 
> I don't have access to the config. The issue was reported by a customer.

This isn't very satisfying, though. How can we be sure this issue isn't also 
present in the cgroup v2 code? Has this been tested? Surely, there was some 
stack trace reported by the customer or some sort of reproducer got provided. 
What was the reasoning that established this issue is present in JDK head and 
**only** in cgroups v1 code? My guess is that the issue got triggered via the 
OperatingSystemMXBean, but nothing to that effect has been noted here or in the 
bug.

If I were to propose such a point fix, clearly, I'd have to provide some 
details what the actual problem is and explain why the fix is sufficient and 
covers all branches. All that got provided is: "But memory could be Null on 
some machines that have cgroup entries for CPU but not for memory."

-

PR: https://git.openjdk.java.net/jdk/pull/2269


Re: RFR: 8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

2021-01-28 Thread Poonam Bajaj
On Thu, 28 Jan 2021 14:36:36 GMT, Severin Gehwolf  wrote:

> I'm curious: What config is this to actually trigger the NPE? How does 
> `/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?

I don't have access to the config. The issue was reported by a customer.

-

PR: https://git.openjdk.java.net/jdk/pull/2269


Re: RFR: 8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

2021-01-28 Thread Harold Seigel
On Wed, 27 Jan 2021 21:10:16 GMT, Poonam Bajaj  wrote:

> Please review this simple change that adds null checks for memory in 
> CgroupV1Subsystem.java.
> 
> Problem: After the backport of JDK-8250984,  there are places where 
> memory.isSwapEnabled() is called. For example:
> 
> public long getMemoryAndSwapFailCount() {
> if (!memory.isSwapEnabled()) {
> return getMemoryFailCount();
> }
> return SubSystem.getLongValue(memory, "memory.memsw.failcnt");
> }
>  
> But memory could be Null on some machines that have cgroup entries for CPU 
> but not for memory. This would cause a NullPointerException when memory is 
> accessed.
> 
> Fix: Add null checks for memory.

Changes look good!  Thanks for doing this.

Harold

-

Marked as reviewed by hseigel (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2269


Re: RFR: 8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

2021-01-28 Thread Severin Gehwolf
On Wed, 27 Jan 2021 21:10:16 GMT, Poonam Bajaj  wrote:

> Please review this simple change that adds null checks for memory in 
> CgroupV1Subsystem.java.
> 
> Problem: After the backport of JDK-8250984,  there are places where 
> memory.isSwapEnabled() is called. For example:
> 
> public long getMemoryAndSwapFailCount() {
> if (!memory.isSwapEnabled()) {
> return getMemoryFailCount();
> }
> return SubSystem.getLongValue(memory, "memory.memsw.failcnt");
> }
>  
> But memory could be Null on some machines that have cgroup entries for CPU 
> but not for memory. This would cause a NullPointerException when memory is 
> accessed.
> 
> Fix: Add null checks for memory.

I'm curious: What config is this to actually trigger the NPE? How does 
`/proc/self/mountinfo`, `/proc/self/cgroup` and `/proc/cgroups` look like?

-

PR: https://git.openjdk.java.net/jdk/pull/2269


RFR: 8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

2021-01-27 Thread Poonam Bajaj
Please review this simple change that adds null checks for memory in 
CgroupV1Subsystem.java.

Problem: After the backport of JDK-8250984,  there are places where 
memory.isSwapEnabled() is called. For example:

public long getMemoryAndSwapFailCount() {
if (!memory.isSwapEnabled()) {
return getMemoryFailCount();
}
return SubSystem.getLongValue(memory, "memory.memsw.failcnt");
}
 
But memory could be Null on some machines that have cgroup entries for CPU but 
not for memory. This would cause a NullPointerException when memory is accessed.

Fix: Add null checks for memory.

-

Commit messages:
 - 8257746: Regression introduced with JDK-8250984 - memory might be null in 
some machines

Changes: https://git.openjdk.java.net/jdk/pull/2269/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=2269=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8257746
  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2269.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2269/head:pull/2269

PR: https://git.openjdk.java.net/jdk/pull/2269