Re: Low-Overhead Heap Profiling

Robbin Ehn Tue, 30 May 2017 05:34:28 -0700

Hi Jc,

On 05/22/2017 08:47 PM, JC Beyler wrote:

Dear all,


I have a new webrev up:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/


I liked this!

Two small things:

heapMonitoring.hpp
class HeapMonitoring should extend AllStatic

heapMonitoring.cpp
class MuxLocker should extend StackObj
But I think you should skip MuxLocker or push it separate generic enhancement.

Great with the jtreg test, thanks alot!


This webrev has, I hope, fixed a lot of the comments from Robbin:
   - The casts normally are all C++ style
   - Moved this to jdk10-hs
      - I have not tested slowdebug yet, hopefully it does not break there
   - Added the garbage collection system:
      - Now live sampled allocations are tracked throughout their lifetime
         - When GC happens, it moves the sampled allocation information to two 
lists: recent and frequent GC lists
             - Those lists use the array system that the live objects were 
using before but have different re-use strategies
      - Added the JVMTI API for them via a GetFrequentGarbageTraces and 
GetGarbageTraces
- Both use the same JVMTI structures
      - Added the calls to them for the test, though I've kept that test simple 
for now:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.03/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c

- As I write this, I notice my webrev is missing a final change I made to the test that calls the same ReleaseTraces to each live/garbage/frequent structure. Thisis updated in my local repo and will get in the next webrev.


Next steps for this work are:
    - Putting the TLAB implementation (yes not forgotten ;-))
    - Adding more testing and separate the current test system to check things 
a bit more thoroughly
    - Have not tried to circumvent AsyncGetCallTrace yet
    - Still have to double check the stack walker a bit more


Looking forward to this.

Could someone from compiler take a look please?

Thanks!

/Robbin


Happy webrev perusal!
Jc


On Tue, May 16, 2017 at 5:20 AM, Robbin Ehn <[email protected] 
<mailto:[email protected]>> wrote:

    Just a few answers,

    On 05/15/2017 06:48 PM, JC Beyler wrote:

        Dear all,

        I've updated the webrev to:
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ 
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/ 
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/>>


    I'll look at this later, thanks!


        Robbin,
        I believe I have addressed most of your items with webrev 02:
            - I added a JTreg test to show how it works:
        
http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
        
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>
        
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
        
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.02/raw_files/new/test/serviceability/jvmti/HeapMonitor/libHeapMonitor.c>>
           - I've modified the code to use its own data structures both 
internally and externally, this will make it easier to move out of 
AsyncGetCallTrace as we move
        forward, that is still on my TODOs
           - I cleaned up the JVMTI API by passing a structure that handles the 
num_traces and put in a ReleaseTraces as well
           - I cleaned up other issues as well.

        However, I have three questions, which are probably because I'm new in 
this community:
           1) My previous webrevs were based off of JDK9 by mistake. When I 
took JDK10 via : hg clone http://hg.openjdk.java.net/jdk10/jdk10
        <http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10 
<http://hg.openjdk.java.net/jdk10/jdk10>> jdk10
               - I don't see code compatible with what you were showing (ie 
your patches don't make sense for that code base; ex: klass is still accessed 
via klass() for
        example in collectedHeap.inline.hpp)
               - Would you know what is the right hg clone command so we are 
working on the same code base?


    We use jdk10-hs, e.g.
    hg tclone http://hg.openjdk.java.net/jdk10/hs 
<http://hg.openjdk.java.net/jdk10/hs> 10-hs

    There is sporadic big merges going from jdk9->jdk10->jdk10-hs and 
jdk10-hs->jdk10, so 10 is moving...


           2) You mentioned I was using os::malloc, new, NEW_C_HEAP_ARRAY; I 
cleaned out the os::malloc but which of the new vs NEW_C_HEAP_ARRAY should I 
use. It might be
        that I don't understand when one uses one or the other but I see both 
used around the code base?
             - Is it that new is to be used for anything internal and 
NEW_C_HEAP_ARRAY anything provided to the JVMTI users outside of the JVM?


    We overload new operator when you extend correct base class, e.g. 
CHeapObj<mtInternal> so use 'new'
    But for arrays you will need the macro NEW_C_HEAP_ARRAY.


           3) Casts: same kind question: which should I use. The code was using 
a bit of everything, I'll refactor it entirely but I was not clear if I should 
go to C casts
        or C++ casts as I see both in the codebase. What is the convention I 
should use?


    Just be consist, use what suites you, C++ casts might be preferable, if we 
are moving towards C++11.
    And use 'right' cast, e.g. going from Thread* to JavaThread* you should use 
C cast or static_cast, not reinterpret_cast I would say.


        Final notes on this webrev:
            - I am still missing:
              - Putting a TLAB implementation so that we can compare both 
webrevs
              - Have not tried to circumvent AsyncGetCallTrace
              - Putting in the handling of GC'd objects
              - Fix a stack walker issue I have seen, I think I know the 
problem and will test that theory out for the next webrev

        I will work on integrating those items for the next webrev!


    Thanks!


        Thanks for your help,
        Jc

        Ps:  I tested this on a new repo:

        hg clone http://hg.openjdk.java.net/jdk10/jdk10 
<http://hg.openjdk.java.net/jdk10/jdk10> <http://hg.openjdk.java.net/jdk10/jdk10
        <http://hg.openjdk.java.net/jdk10/jdk10>> jdk10
        ... building it
        cd test
        jtreg 
-nativepath:<path-to-jdk10>/build/linux-x86_64-normal-server-release/support/test/hotspot/jtreg/native/lib/
 -jdk
        <path-to-jdk10>/linux-x86_64-normal-server-release/images/jdk 
../hotspot/test/serviceability/jvmti/HeapMonitor/


    I'll test it out!

    /Robbin



        On Thu, May 4, 2017 at 11:21 PM, [email protected] 
<mailto:[email protected]> <mailto:[email protected]
        <mailto:[email protected]>> <[email protected] 
<mailto:[email protected]> <mailto:[email protected]
        <mailto:[email protected]>>> wrote:

             Robbin,

             Thank you for forwarding!
             I will review it.

             Thanks,
             Serguei



             On 5/4/17 02:13, Robbin Ehn wrote:

                 Hi,

                 To me the compiler changes looks what is expected.
                 It would be good if someone from compiler could take a look at 
that.
                 Added compiler to mail thread.

                 Also adding Serguei, It would be good with his view also.

                 My initial take on it, read through most of the code and took 
it for a ride.

                 ##############################
                 - Regarding the compiler changes: I think we need the 'TLAB 
end' trickery (mentioned by Tony P)
                 instead of a separate check for sampling in fast path for the 
final version.

                 ##############################
                 - This patch I had to apply to get it compile on JDK 10:

                 diff -r ac3ded340b35 
src/share/vm/gc/shared/collectedHeap.inline.hpp
                 --- a/src/share/vm/gc/shared/collectedHeap.inline.hpp    Fri 
Apr 28 14:31:38 2017 +0200
                 +++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp    Thu 
May 04 10:22:56 2017 +0200
                 @@ -87,3 +87,3 @@
                       // support for object alloc event (no-op most of the 
time)
                 -    if (klass() != NULL && klass()->name() != NULL) {
                 +    if (klass != NULL && klass->name() != NULL) {
                         Thread *base_thread = Thread::current();
                 diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
                 --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 
14:31:38 2017 +0200
                 +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 
10:22:56 2017 +0200
                 @@ -316,3 +316,3 @@
                     JavaThread *thread = reinterpret_cast<JavaThread 
*>(Thread::current());
                 -  assert(o->size() << LogHeapWordSize == byte_size,
                 +  assert(o->size() << LogHeapWordSize == (long)byte_size,
                            "Object size is incorrect.");

                 ##############################
                 - This patch I had to apply to get it not asserting during 
slowdebug:

                 --- a/src/share/vm/runtime/heapMonitoring.cpp    Fri Apr 28 
15:15:16 2017 +0200
                 +++ b/src/share/vm/runtime/heapMonitoring.cpp    Thu May 04 
10:24:25 2017 +0200
                 @@ -32,3 +32,3 @@
                   // TODO(jcbeyler): should we make this into a JVMTI 
structure?
                 -struct StackTraceData {
                 +struct StackTraceData : CHeapObj<mtInternal> {
                     ASGCT_CallTrace *trace;
                 @@ -143,3 +143,2 @@
                   StackTraceStorage::StackTraceStorage() :
                 -    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
                       _allocated_traces_size(MaxHeapTraces),
                 @@ -147,2 +146,3 @@
                       _allocated_count(0) {
                 +  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, 
MaxHeapTraces, mtInternal);
                     memset(_allocated_traces, 0, sizeof(*_allocated_traces) * 
MaxHeapTraces);
                 @@ -152,3 +152,3 @@
                   StackTraceStorage::~StackTraceStorage() {
                 -  delete[] _allocated_traces;
                 +  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
                   }

                 - Classes should extend correct base class for which type of memory 
is used for it e.g.: CHeapObj<mt????> or StackObj or AllStatic
                 - The style in heapMonitoring.cpp is a bit different from 
normal vm-style, e.g. using C++ casts instead of C. You mix NEW_C_HEAP_ARRAY, 
os::malloc and new.
                 - In jvmtiHeapTransition.hpp you use C cast instead.

                 ##############################
                 - This patch I had apply to get traces without setting an 
‘unrelated’ capability
                 - Should this not be a new capability?

                 diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
                 --- a/src/share/vm/prims/forte.cpp    Fri Apr 28 15:15:16 2017 
+0200
                 +++ b/src/share/vm/prims/forte.cpp    Thu May 04 10:24:25 2017 
+0200
                 @@ -530,6 +530,6 @@

                 -  if (!JvmtiExport::should_post_class_load()) {
                 +/*  if (!JvmtiExport::should_post_class_load()) {
                       trace->num_frames = ticks_no_class_load; // -1
                       return;
                 -  }
                 +  }*/

                 ##############################
                 - forte.cpp: (I know this is not part of your changes but)
                 find_jmethod_id_or_null give me NULL for my test.
                 It looks like we actually want the regular jmethod_id() ?

                 Since we are the thread we are talking about (and in same 
ucontext) and thread is in vm and have a last java frame,
                 I think most of the checks done in AsyncGetCallTrace is 
irrelevant, so you should be-able to call forte_fill_call_trace_given_top 
directly.
                 But since we might need jmethod_id() if possible to avoid 
getting method id NULL,
                 we need some fixes in forte code, or just do the vframStream 
loop inside heapMonitoring.cpp and not use forte.cpp.

                 Something like:

                    if (jthread->has_last_Java_frame()) { // just to be safe
                      vframeStream vfst(jthread);
                      while (!vfst.at_end()) {
                        Method* m = vfst.method();
                        m->jmethod_id();
                        m->line_number_from_bci(vfst.bci());
                        vfst.next();
                      }

                 - This is a bit confusing in forte.cpp, 
trace->frames[count].lineno = bci.
                 Line number should be m->line_number_from_bci(bci);
                 Do the heapMonitoring suppose to trace with bci or line number?
                 I would say bci, meaning we should either rename 
ASGCT_CallFrame→lineno or use another data structure which says bci.

                 ##############################
                 - // TODO(jcbeyler): remove this extra code handling the extra 
trace for
                 Please fix all these TODO's :)

                 ##############################
                 - heapMonitoring.hpp:
                 // TODO(jcbeyler): is this algorithm acceptable in open source?

                 Why is this comment here? What is the implication?
                 Have you tested any simpler algorithm?

                 ##############################
                 - Create a sanity jtreg test. 
(./hotspot/make/test/JtregNative.gmk for building the agent)

                 ##############################
                 - monitoring_period vs HeapMonitorRate, pick rate or period.

                 ##############################
                 - globals.hpp
                 Why is MaxHeapTraces not settable/overridable from jvmti 
interface? That would be handy.

                 ##############################
                 - jvmtiStackTraceData + ASGCT_CallFrame memory
                 Are the agent suppose to loop through and free all 
ASGCT_CallFrame?
                 Wouldn't it be better with some kinda protocol, like:
                 (*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
                 (*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);

                 Also using another data structure that have num_traces inside 
it simplifies things.
                 So I'm not convinced using the async structure is the best way 
forward.


                 I have more questions, but I think it's better if you respond 
and update the code first.

                 Thanks!

                 /Robbin


                 On 04/21/2017 11:34 PM, JC Beyler wrote:

                     Hi all,

                     I've added size information to the allocation sampling 
system. This allows the callback to remember the size of each sampled 
allocation.
        http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ 
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>
        <http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/ 
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/>>

                     The new webrev.01 also adds the actual heap monitoring 
sampling system in files:
        
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
        
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>
                     
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
        
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch>>
                     and
        
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
        
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>
                     
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch
        
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch>>

                     My next step is to add the GC part to the webrev, which 
will allow users to determine what objects are live and what are garbage.

                     Thanks for your attention and let me know if there are any 
questions!

                     Have a wonderful Friday!
                     Jc

                     On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <[email protected] 
<mailto:[email protected]> <mailto:[email protected] 
<mailto:[email protected]>>
        <mailto:[email protected] <mailto:[email protected]> <mailto:[email protected] 
<mailto:[email protected]>>>> wrote:

                          Hi all,

                          I worked on getting a few numbers for overhead and 
accuracy for my feature. I'm unsure if here is the right place to provide the 
full data, so I
        am just
                     summarizing
                          here for now.

                          - Overhead of the feature

                          Using the Dacapo benchmark (http://dacapobench.org/). 
My initial results are that sampling provides 2.4% with a 512k sampling, 512k 
being our
        default setting.

                          - Note: this was without the tradesoap, tradebeans 
and tomcat benchmarks since they did not work with my JDK9 (issue between 
Dacapo and JDK9 it seems)
                          - I want to rerun next week to ensure number stability

                          - Accuracy of the feature

                          I wrote a small microbenchmark that allocates from 
two different stacktraces at a given ratio. For example, 10% of stacktrace S1 
and 90% from
        stacktrace
                     S2. The
                          microbenchmark was run 20 times, I averaged the 
results and looked for accuracy. It seems that statistically it is sound since 
if I allocated10%
        S1 and 90%
                     S2, with a
                          sampling rate of 512k, I obtained 9.61% S1 and 90.49% 
S2.

                          Let me know if there are any questions on the numbers 
and if you'd like to see some more data.

                          Note: this was done using our internal JDK8 
implementation since the webrev provided by
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>> does not yet 
contain the whole
                     implementation and therefore would have been misleading.

                          Thanks,
                          Jc


                          On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <[email protected] 
<mailto:[email protected]> <mailto:[email protected]
        <mailto:[email protected]>> <mailto:[email protected] <mailto:[email protected]> 
<mailto:[email protected] <mailto:[email protected]>>>> wrote:

                              Hi all,

                              To move the discussion forward, with Chuck 
Rasbold's help to make a webrev, we pushed this:
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>
                     <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>
        <http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html 
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html>>>
                              415 lines changed: 399 ins; 13 del; 3 mod; 51122 
unchg

                              This is not a final change that does the whole 
proposition from the JBS entry: https://bugs.openjdk.java.net/browse/JDK-8177374
        <https://bugs.openjdk.java.net/browse/JDK-8177374>
                     <https://bugs.openjdk.java.net/browse/JDK-8177374 
<https://bugs.openjdk.java.net/browse/JDK-8177374>>
                     <https://bugs.openjdk.java.net/browse/JDK-8177374 
<https://bugs.openjdk.java.net/browse/JDK-8177374> 
<https://bugs.openjdk.java.net/browse/JDK-8177374
        <https://bugs.openjdk.java.net/browse/JDK-8177374>>>; what it does show 
is parts of the implementation that is
                     proposed and hopefully can start the conversation going
                              as I work through the details.

                              For example, the changes to C2 are done here for 
the allocations:
        
http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
                     
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>
                     
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>
                     
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch
        
<http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch>>>

                              Hopefully this all makes sense and thank you for 
all your future comments!
                              Jc


                              On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler 
<[email protected] <mailto:[email protected]> <mailto:[email protected]
        <mailto:[email protected]>> <mailto:[email protected] <mailto:[email protected]> 
<mailto:[email protected] <mailto:[email protected]>>>>
                     wrote:

                                  Hello all,

                                  This is a follow-up from Jeremy's initial 
email from last year:
        
http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html 
<http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>
                     
<http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        
<http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>
                     
<http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        
<http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html> 
<http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html
        
<http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html>>>

                                  I've gone ahead and started working on 
preparing this and Jeremy and I went down the route of actually writing it up 
in JEP form:
        https://bugs.openjdk.java.net/browse/JDK-8171119 
<https://bugs.openjdk.java.net/browse/JDK-8171119> 
<https://bugs.openjdk.java.net/browse/JDK-8171119
        <https://bugs.openjdk.java.net/browse/JDK-8171119>>

                                  I think original conversation that happened 
last year in that thread still holds true:

                                    - We have a patch at Google that we think 
others might be interested in
                                       - It provides a means to understand 
where the allocation hotspots are at a very low overhead
                                       - Since it is at a low overhead, we can 
leave it on by default

                                  So I come to the mailing list with Jeremy's 
initial question:
                                  "I thought I would ask if there is any interest / 
if I should write a JEP / if I should just forget it."

                                  A year ago, it seemed some thought it was a 
good idea, is this still true?

                                  Thanks,
                                  Jc

Re: Low-Overhead Heap Profiling

Reply via email to