Hi Serguei,

Thank you for reviewing this change.

Just wanted to add that the only "volatile" metrics are "usage" ones ( 
memoryUsage and  
memoryAndSwapLimit). The  "limit" metrics (memoryLimit and memoryAndSwapLimit) 
are set 
when the container starts and  are not subjects to change. The only method that 
reads  more than one
 "volatile" metric is getFreeSwapSpaceSize() and it has a code that retries  if 
the calculated swapUsage
is negative as a result of non-atomic reads.


Thank you,
Daniil

On 12/11/19, 3:13 PM, "serguei.spit...@oracle.com" 
<serguei.spit...@oracle.com> wrote:

    Hi Daniil,
    
    One my concerns was a non-atomic read of multiple metrics before comparison.
    It creates a potential to get a mismatch in result.
    However, the probability to get a negative value is pretty low, I think.
    The other concern (if incorrect metrics are returned) is covered by 
    JDK-8235522.
    Revising all concerns in JDK-8235522 sounds good to me.
    
    Thanks,
    Serguei
    
    On 12/10/19 10:29, Daniil Titov wrote:
    > Hi Serguei,
    >
    >>        Do we need to check if the (limit - memLimit) value is negative?
    >>        The same question is for getFreeSwapSpaceSize():
    >>          memSwapLimit - memLimit - (memSwapUsage - memUsage)
    >>    
    >>        and getFreeMemorySize():
    >>         101 return limit - usage;
    > I don't think we need such check here. If it happens in fact it means the 
serious system malfunction and a negative value this method
    > returns would indicate this (currently the native methods already returns 
-1 if something went wrong).  But we could revise it in the follow
    >   up issue I created for that [1].
    >
    > [1] https://bugs.openjdk.java.net/browse/JDK-8235522
    >
    > Thank you,
    > Daniil
    >
    > On 12/9/19, 6:02 PM, "serguei.spit...@oracle.com" 
<serguei.spit...@oracle.com> wrote:
    >
    >      Hi Daniil,
    >      
    >      It is not a full review, just some minor comments.
    >      In fact, I do not see real problems yet.
    >      
    >      
http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html
    >      
    >         55     public long getTotalSwapSpaceSize() {
    >         56         if (containerMetrics != null) {
    >         57             long limit = 
containerMetrics.getMemoryAndSwapLimit();
    >         58             // The memory limit metrics is not available if JVM
    >      runs on Linux host ( not in a docker container)
    >         59             // or if a docker container was started without
    >      specifying a memory limit ( without '--memory='
    >         60             // Docker option). In latter case there is no 
limit on
    >      how much memory the container can use and
    >         61             // it can use as much memory as the host's OS 
allows.
    >         62             long memLimit = containerMetrics.getMemoryLimit();
    >         63             if (limit >= 0 && memLimit >= 0) {
    >         64                 return limit - memLimit;
    >         65             }
    >         66         }
    >         67         return getTotalSwapSpaceSize0();
    >         68     }
    >      
    >         Unneeded space after brackets '('.
    >         Do we need to check if the (limit - memLimit) value is negative?
    >         The same question is for getFreeSwapSpaceSize():
    >           memSwapLimit - memLimit - (memSwapUsage - memUsage)
    >      
    >         and getFreeMemorySize():
    >           101 return limit - usage;
    >      
    >         81                         // If this happens just retry the loop 
for
    >      a few iterations
    >      
    >         Dot is missed at the end of comment.
    >      
    >      
    >      
http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html
    >      
    >         34 System.out.println(String.format("Runtime.availableProcessors:
    >      %d", Runtime.getRuntime().availableProcessors()));
    >         35
    >      
System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors:
    >      %d", osBean.getAvailableProcessors()));
    >         36
    >      
System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize:
    >      %d", osBean.getTotalMemorySize()));
    >         37
    >      
System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize:
    >      %d", osBean.getTotalPhysicalMemorySize()));
    >         38
    >      
System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize:
    >      %d", osBean.getFreeMemorySize()));
    >         39
    >      
System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize:
    >      %d", osBean.getFreePhysicalMemorySize()));
    >         40
    >      
System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize:
    >      %d", osBean.getTotalSwapSpaceSize()));
    >         41
    >      
System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize:
    >      %d", osBean.getFreeSwapSpaceSize()));
    >         42
    >      System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: 
%f",
    >      osBean.getCpuLoad()));
    >         43
    >      
System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad:
    >      %f", osBean.getSystemCpuLoad()));
    >      
    >      
    >         To make the above lines a little bit shorter I'd suggest to 
define a
    >      log() method like this:
    >            private static void log(String msg) ( System.out.println(msg(; 
}
    >      
    >         34         log(String.format("Runtime.availableProcessors: %d",
    >      Runtime.getRuntime().availableProcessors()));
    >         35 
log(String.format("OperatingSystemMXBean.getAvailableProcessors:
    >      %d", osBean.getAvailableProcessors()));
    >         36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: 
%d",
    >      osBean.getTotalMemorySize()));
    >         37
    >      log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize:
    >      %d", osBean.getTotalPhysicalMemorySize()));
    >         38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: 
%d",
    >      osBean.getFreeMemorySize()));
    >         39
    >      log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: 
%d",
    >      osBean.getFreePhysicalMemorySize()));
    >         40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize:
    >      %d", osBean.getTotalSwapSpaceSize()));
    >         41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize:
    >      %d", osBean.getFreeSwapSpaceSize()));
    >         42         log(String.format("OperatingSystemMXBean.getCpuLoad: 
%f",
    >      osBean.getCpuLoad()));
    >         43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f",
    >      osBean.getSystemCpuLoad()));
    >      
    >      
    >      Thanks,
    >      Serguei
    >      
    >      
    >      
    >      On 12/6/19 17:41, Daniil Titov wrote:
    >      > Hi David, Mandy, and Bob,
    >      >
    >      > Thank you for reviewing this fix.
    >      >
    >      > Please review a new version of the fix [1] that includes the 
following changes comparing to the previous version of the webrev ( webrev.04)
    >      > 1. The changes in Javadoc made in the webrev.04 comparing to 
webrev.03 and to CSR [3] were discarded.
    >      > 2.  The implementation of methods getFreeMemorySize, 
getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize
    >      >       was also reverted to webrev.03 version that return host's 
values if the metrics are unavailable or cannot be properly read.
    >      >       I would like to mention that  currently the native 
implementation of these methods de-facto may return -1 at some circumstances,
    >      >       but I agree that the changes proposed in the previous 
version of the webrev increase such probability.
    >      >       I filed the follow-up issue [4] as Mandy suggested.
    >      > 3.  The legacy methods were renamed as David suggested.
    >      >
    >      >
    >      >> 
src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c
    >      >> !     static int initialized=1;
    >      >>
    >      >>   Am I reading this right that the code currently fails to 
actually do the
    >      >> initialization because of this ???
    >      > Yes, currently the code fails to do the initialization but it was 
unnoticed since method
    >      > get_cpuload_internal(...) was never called for a specific CPU, the 
first parameter "which"
    >      > was always -1.
    >      >
    >      >>   
test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java
    >      >>
    >      >> System.out.println(String.format(...)
    >      >>
    >      >> Why not simply
    >      >>
    >      >> System.out.printf(..)
    >      > As I tried explain it earlier it would make the tests unstable.
    >      > System.out.printf(...) just delegates the call to 
System.out.format(...) that doesn't emit the string atomically.
    >      > Instead it parses the format string into a list of FormatString 
objects and then iterates over the list.
    >      > As a result, the other traces occasionally got printed between 
these iterations  and break the pattern the test is expected to find
    >      > in the output.
    >      >
    >      > For example, here is the sample of a such output that has the 
trace message printed between " 
OperatingSystemMXBean.getFreePhysicalMemorySize:"
    >      > and "1030762496".
    >      >
    >      > <skipped>
    >      > [0.304s][trace][os,container] Memory Usage is: 42983424
    >      > OperatingSystemMXBean.getFreeMemorySize: 1030758400
    >      > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is 
/sys/fs/cgroup/memory/memory.usage_in_bytes
    >      > [0.305s][trace][os,container] Memory Usage is: 42979328
    >      > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is 
/sys/fs/cgroup/memory/memory.usage_in_bytes
    >      > OperatingSystemMXBean.getFreePhysicalMemorySize: 
[0.306s][trace][os,container] Memory Usage is: 42975232
    >      > 1030762496
    >      > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176
    >      >
    >      > <skipped>
    >      > java.lang.RuntimeException: 
'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from 
stdout/stderr
    >      >
    >      >    at 
jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306)
    >      >    at 
TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151)
    >      >    at TestMemoryAwareness.main(TestMemoryAwareness.java:73)
    >      >    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >      >    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >      >    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >      >    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    >      >    at 
com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
    >      >    at java.base/java.lang.Thread.run(Thread.java:832)
    >      >
    >      > Testing: Mach5 tier1-tier3 and 
open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are 
still running.
    >      >
    >      > [1] Webrev:  http://cr.openjdk.java.net/~dtitov/8226575/webrev.05
    >      > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575
    >      > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428
    >      > [4] https://bugs.openjdk.java.net/browse/JDK-8235522
    >      >
    >      > Thank you,
    >      > Daniil
    >      >
    >      > On 12/6/19, 1:38 PM, "Mandy Chung" <mandy.ch...@oracle.com> wrote:
    >      >
    >      >
    >      >
    >      >      On 12/6/19 5:59 AM, Bob Vandette wrote:
    >      >      >> On Dec 6, 2019, at 2:49 AM, David 
Holmes<david.hol...@oracle.com>  wrote:
    >      >      >>
    >      >      >>
    >      >      >> 
src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java
    >      >      >>
    >      >      >> The changes to allow for a return of -1 are somewhat more 
extensive than we have previously discussed. These methods previously were (per 
the spec) guaranteed to return some (assumably) meaningful value but now they 
are effectively allowed to fail by returning -1. No existing code is expecting 
to have to handle a return of -1 so I see this as a significant compatibility 
issue.
    >      >
    >      >      I thought that the error case we are referring to is limit == 
0 which
    >      >      indicates something unexpected goes wrong.  So the 
compatibility concern
    >      >      should be low.  This is very specific to Metrics 
implementation for
    >      >      cgroup v1 and let me know if I'm wrong.
    >      >
    >      >      >> Surely there must always be some information available 
from the operating environment? I see from the impl file:
    >      >      >>
    >      >      >>     // the host data, value 0 indicates that something 
went wrong while the metric was read and
    >      >      >>    // in this case we return "information unavailable" 
code -1.
    >      >      >>
    >      >      >> I don't agree with this. If the container metrics are 
messed up somehow we should either fallback to the host value or else abort 
with some kind of exception. Returning -1 is not an option here IMO.
    >      >      > I agree with David on the compatibility concern.  I 
originally thought that -1 was already a specified return for all of these 
methods.
    >      >      > Since the 0 return failure from the Metrics API should only 
occur if one of the cgroup subsystems is not enabled while others
    >      >      > are, I’d suggest we keep Daniil’s original logic to fall 
back to the host value since a disabled subsystem is equivalent to no
    >      >      > limits.
    >      >      >
    >      >
    >      >      It's important to consider carefully if the monitoring API 
indicates an
    >      >      error vs unavailable and an application should continue to 
run when the
    >      >      monitoring system fails to get the metrics.
    >      >
    >      >      There are several choices to report "something goes wrong" 
scenarios
    >      >      (should unlikely happen???):
    >      >      1. fall back to a random positive value  (e.g. host value)
    >      >      2. return a negative value
    >      >      3. throw an exception
    >      >
    >      >      #3 is not an option as the application is not expecting this. 
 For #2,
    >      >      the application can filter bad values if desirable.
    >      >
    >      >      I'm okay if you want to file a JBS issue to follow up and 
thoroughly
    >      >      look at the cases that the metrics are unavailable and the 
cases when
    >      >      fails to obtain.
    >      >
    >      >      >> ---
    >      >      >>
    >      >      >> 
test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java
    >      >      >>
    >      >      >> System.out.println(String.format(...)
    >      >      >>
    >      >      >> Why not simply
    >      >      >>
    >      >      >> System.out.printf(..)
    >      >      >>
    >      >      >> ?
    >      >
    >      >      or simply (as I commented [1])
    >      >           System.out.format
    >      >
    >      >      Mandy
    >      >      [1]
    >      >      
https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html
    >      >
    >      >
    >      >
    >      >
    >      
    >      
    >
    >
    
    


Reply via email to