Re: Low-Overhead Heap Profiling

Robbin Ehn Wed, 25 Oct 2017 00:32:09 -0700

Hi,

325     HeapWord *tlab_old_end = thread->tlab().return end();


Should be something like:

325     HeapWord *tlab_old_end = thread->tlab().end();

Thanks, Robbin

On 2017-10-23 17:27, JC Beyler wrote:

Dear all,

Small update this week with this new webrev:
   - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/
   - Incremental is here: 
http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/

I patched the code changes showed by Robbin last week and I refactoredcollectedHeap.cpp:

http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src/hotspot/share/gc/shared/collectedHeap.cpp.patch

The original code became a bit too complex in my opinion with thehandle_heap_sampling handling too many things. So I subdivided the logic intotwo smaller methods and moved out a bit of the logic to make it more clear.Hopefully it is :)


Let me know if you have any questions/comments :)
Jc

On Mon, Oct 16, 2017 at 9:34 AM, JC Beyler <[email protected]<mailto:[email protected]>> wrote:


    Hi Robbin,

    That is because version 11 to 12 was only a test change. I was going to
    write about it and say here are the webrev links:
    Incremental:
    http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/
    <http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/>

    Full webrev:
    http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/
    <http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/>

    This change focused only on refactoring the tests to be more manageable,
    readable, maintainable. As all tests are looking at allocations, I moved
    common code to a java class:
    
http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java.patch
    
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java.patch>

    And then most tests call into that class to turn on/off the sampling,
    allocate, etc. This has removed almost 500 lines of test code so I'm happy
    about that.

    Thanks for your changes, a bit of relics of previous versions :). I've
    already integrated them into my code and will make a new webrev end of this
    week with a bit of refactor of the code handling the tlab slow path. I find
    it could use a bit of refactoring to make it easier to follow so I'm going
    to take a stab at it this week.

    Any other issues/comments?

    Thanks!
    Jc


    On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn <[email protected]
    <mailto:[email protected]>> wrote:

        Hi JC,

        I saw a webrev.12 in the directory, with only test changes(11->12), so I
        took that version.
        I had a look and tested the tests, worked fine!

        First glance at the code (looking at full v12) some minor things below,
        mostly unused stuff.

        Thanks, Robbin

        diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp
        --- a/src/hotspot/share/runtime/heapMonitoring.cpp      Mon Oct 16
        16:54:06 2017 +0200
        +++ b/src/hotspot/share/runtime/heapMonitoring.cpp      Mon Oct 16
        17:42:42 2017 +0200
        @@ -211,2 +211,3 @@
            void initialize(int max_storage) {
        +    // validate max_storage to sane value ? What would 0 mean ?
              MutexLocker mu(HeapMonitor_lock);
        @@ -227,8 +228,4 @@
            bool initialized() { return _initialized; }
        -  volatile bool *initialized_address() { return &_initialized; }

           private:
        -  // Protects the traces currently sampled (below).
        -  volatile intptr_t _stack_storage_lock[1];
        -
            // The traces currently sampled.
        @@ -313,3 +310,2 @@
            _initialized(false) {
        -    _stack_storage_lock[0] = 0;
          }
        @@ -532,13 +528,2 @@

        -// Delegate the initialization question to the underlying storage 
system.
        -bool HeapMonitoring::initialized() {
        -  return StackTraceStorage::storage()->initialized();
        -}
        -
        -// Delegate the initialization question to the underlying storage 
system.
        -bool *HeapMonitoring::initialized_address() {
        -  return

-const_cast<bool*>(StackTraceStorage::storage()->initialized_address());

        -}
        -
          void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) {
        diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp
        --- a/src/hotspot/share/runtime/heapMonitoring.hpp      Mon Oct 16
        16:54:06 2017 +0200
        +++ b/src/hotspot/share/runtime/heapMonitoring.hpp      Mon Oct 16
        17:42:42 2017 +0200
        @@ -35,3 +35,2 @@
            static uint64_t _rnd;
        -  static bool _initialized;
            static jint _monitoring_rate;
        @@ -92,7 +91,2 @@

        -  // Is the profiler initialized and where is the address to the
        initialized
        -  // boolean.
        -  static bool initialized();
        -  static bool *initialized_address();
        -
            // Called when o is to be sampled from a given thread and a given 
size.



        On 10/10/2017 12:57 AM, JC Beyler wrote:

            Dear all,

            Thread-safety is back!! Here is the update webrev:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/>

            Full webrev is here:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/>

            In order to really test this, I needed to add this so thought now
            was a good time. It required a few changes here for the creation to
            ensure correctness and safety. Now we keep the static pointer but
            clear the data internally so on re-initialize, it will be a bit more
            costly than before. I don't think this is a huge use-case so I did
            not think it was a problem. I used the internal MutexLocker, I think
            I used it well, let me know.

            I also added three tests:

            1) Stack depth test:
            
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch
            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch>

            This test shows that the maximum stack depth system is working.

            2) Thread safety:
            
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch
            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch>

            The test creates 24 threads and they all allocate at the same time.
            The test then checks it does find samples from all the threads.

            3) Thread on/off safety
            
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch
            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch>

            The test creates 24 threads that all allocate a bunch of memory.
            Then another thread turns the sampling on/off.

            Btw, both tests 2 & 3 failed without the locks.

            As I worked on this, I saw a lot of places where the tests are doing
            very similar things, I'm going to clean up the code a bit and make a
            HeapAllocator class that all tests can call directly. This will
            greatly simplify the code.

            Thanks for any comments/criticisms!
            Jc


            On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>> wrote:

                 Dear all,

                 Small update to the webrev:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/>>

                 Full webrev is here:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/>>

                 I updated a bit of the naming, removed a TODO comment, and I
            added a test for testing the sampling rate. I also updated the
            maximum stack depth to 1024, there is no
                 reason to keep it so small. I did a micro benchmark that tests
            the overhead and it seems relatively the same.

                 I compared allocations from a stack depth of 10 and allocations
            from a stack depth of 1024 (allocations are from the same helper
            method in
            
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java
            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java

            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java>>):
                            - For an array of 1 integer allocated in a loop;
            stack depth 1024 vs stack depth 10: 1% slower
                            - For an array of 200k integers allocated in a loop;
            stack depth 1024 vs stack depth 10: 3% slower

                 So basically now moving the maximum stack depth to 1024 but we
            only copy over the stack depths actually used.

                 For the next webrev, I will be adding a stack depth test to
            show that it works and probably put back the mutex locking so that
            we can see how difficult it is to keep
                 thread safe.

                 Let me know what you think!
                 Jc



                 On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>> wrote:

                     Forgot to say that for my numbers:
                       - Not in the test are the actual numbers I got for the
            various array sizes, I ran the program 30 times and parsed the
            output; here are the averages and standard
                     deviation:
                            1000:     1.28% average; 1.13% standard deviation
                            10000:    1.59% average; 1.25% standard deviation
                            100000:   1.26% average; 1.26% standard deviation

                     The 1000/10000/100000 are the sizes of the arrays being
            allocated. These are allocated 100k times and the sampling rate is
            111 times the size of the array.

                     Thanks!
                     Jc


                     On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler
            <[email protected] <mailto:[email protected]>
            <mailto:[email protected] <mailto:[email protected]>>> wrote:

                         Hi all,

                         After a bit of a break, I am back working on this :).
            As before, here are two webrevs:

                         - Full change set:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/>>
                         - Compared to version 8:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/>>
                              (This version is compared to version 8 I last
            showed but ported to the new folder hierarchy)

                         In this version I have:
                            - Handled Thomas' comments from his email of 07/03:
                                 - Merged the logging to be standard
                                 - Fixed up the code a bit where asked
                                 - Added some notes about the code not being
            thread-safe yet
                             - Removed additional dead code from the version
            that modifies interpreter/c1/c2
                             - Fixed compiler issues so that it compiles with
            --disable-precompiled-header
                                  - Tested with ./configure
            --with-boot-jdk=<jdk8> --with-debug-level=slowdebug
            --disable-precompiled-headers

                         Additionally, I added a test to check the sanity of the
            sampler: HeapMonitorStatCorrectnessTest

(http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch

            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch>
            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch
            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch>>)
                             - This allocates a number of arrays and checks that
            we obtain the number of samples we want with an accepted error of
            5%. I tested it 100 times and it
                         passed everytime, I can test more if wanted
                             - Not in the test are the actual numbers I got for
            the various array sizes, I ran the program 30 times and parsed the
            output; here are the averages and
                         standard deviation:
                                1000:     1.28% average; 1.13% standard 
deviation
                                10000:    1.59% average; 1.25% standard 
deviation
                                100000:   1.26% average; 1.26% standard 
deviation

                         What this means is that we were always at about 1~2% of
            the number of samples the test expected.

                         Let me know what you think,
                         Jc

                         On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler
            <[email protected] <mailto:[email protected]>
            <mailto:[email protected] <mailto:[email protected]>>> wrote:

                             Hi all,

                             I apologize, I have not yet handled your remarks
            but thought this new webrev would also be useful to see and comment
            on perhaps.

                             Here is the latest webrev, it is generated slightly
            different than the others since now I'm using webrev.ksh without the
            -N option:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/>>

                             And the webrev.07 to webrev.08 diff is here:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/>>

                             (Let me know if it works well)

                             It's a small change between versions but it:
                                - provides a fix that makes the average sample
            rate correct (more on that below).
                                - fixes the code to actually have it play nicely
            with the fast tlab refill
                                - cleaned up a bit the JVMTI text and now use
            jvmtiFrameInfo
                             - moved the capability to be onload solo

                             With this webrev, I've done a small study of the
            random number generator we use here for the sampling rate. I took a
            small program and it can be simplified to:

                             for (outer loop)
                             for (inner loop)
                             int[] tmp = new int[arraySize];

                             - I've fixed the outer and inner loops to being 800
            for this experiment, meaning we allocate 640000 times an array of a
            given array size.

                             - Each program provides the average sample size
            used for the whole execution

                             - Then, I ran each variation 30 times and then
            calculated the average of the average sample size used for various
            array sizes. I selected the array size to
                             be one of the following: 1, 10, 100, 1000.

                             - When compared to 512kb, the average sample size
            of 30 runs:
                             1: 4.62% of error
                             10: 3.09% of error
                             100: 0.36% of error
                             1000: 0.1% of error
                             10000: 0.03% of error

                             What it shows is that, depending on the number of
            samples, the average does become better. This is because with an
            allocation of 1 element per array, it
                             will take longer to hit one of the thresholds. This
            is seen by looking at the sample count statistic I put in. For the
            same number of iterations (800 *
                             800), the different array sizes provoke:
                             1: 62 samples
                             10: 125 samples
                             100: 788 samples
                             1000: 6166 samples
                             10000: 57721 samples

                             And of course, the more samples you have, the more
            sample rates you pick, which means that your average gets closer
            using that math.

                             Thanks,
                             Jc

                             On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler
            <[email protected] <mailto:[email protected]>
            <mailto:[email protected] <mailto:[email protected]>>> wrote:

                                 Thanks Robbin,

                                 This seems to have worked. When I have the next
            webrev ready, we will find out but I'm fairly confident it will 
work!

                                 Thanks agian!
                                 Jc

                                 On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn
            <[email protected] <mailto:[email protected]>
            <mailto:[email protected] <mailto:[email protected]>>> 
wrote:

                                     Hi JC,

                                     On 06/29/2017 12:15 AM, JC Beyler wrote:

                                         B) Incremental changes


                                     I guess the most common work flow here is
            using mq :
                                     hg qnew fix_v1
                                     edit files
                                     hg qrefresh
                                     hg qnew fix_v2
                                     edit files
                                     hg qrefresh

                                     if you do hg log you will see 2 commits

                                     webrev.ksh -r -2 -o my_inc_v1_v2
                                     webrev.ksh -o my_full_v2


                                     In  your .hgrc you might need:
                                     [extensions]
                                     mq =

                                     /Robbin


                                         Again another newbiew question here...

                                         For showing the incremental changes, is
            there a link that explains how to do that? I apologize for my newbie
            questions all the time :)

                                         Right now, I do:

                                            ksh ../webrev.ksh -m -N

                                         That generates a webrev.zip and send it
            to Chuck Rasbold. He then uploads it to a new webrev.

                                         I tried commiting my change and adding
            a small change. Then if I just do ksh ../webrev.ksh without any
            options, it seems to produce a similar
                                         page but now with only the changes I
            had (so the 06-07 comparison you were talking about) and a changeset
            that has it all. I imagine that is
                                         what you meant.

                                         Which means that my workflow would 
become:

                                         1) Make changes
                                         2) Make a webrev without any options to
            show just the differences with the tip
                                         3) Amend my changes to my local commit
            so that I have it done with
                                         4) Go to 1

                                         Does that seem correct to you?

                                         Note that when I do this, I only see
            the full change of a file in the full change set (Side note here:
            now the page says change set and not
                                         patch, which is maybe why Serguei was
            having issues?).

                                         Thanks!
                                         Jc



                                         On Wed, Jun 28, 2017 at 1:12 AM, Robbin
            Ehn <[email protected] <mailto:[email protected]>
            <mailto:[email protected] <mailto:[email protected]>>
            <mailto:[email protected] <mailto:[email protected]>
                                         <mailto:[email protected]
            <mailto:[email protected]>>>> wrote:

                                              Hi,

                                              On 06/28/2017 12:04 AM, JC Beyler
            wrote:

                                                  Dear Thomas et al,

                                                  Here is the newest webrev:
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/

            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/>>>



                                              You have some more bits to in
            there but generally this looks good and really nice with more tests.
                                              I'll do and deep dive and re-test
            this when I get back from my long vacation with whatever patch
            version you have then.

                                              Also I think it's time you provide
            incremental (v06->07 changes) as well as complete change-sets.

                                              Thanks, Robbin




                                                  Thomas, I "think" I have
            answered all your remarks. The summary is:

                                                  - The statistic system is up
            and provides insight on what the heap sampler is doing
                                                       - I've noticed that,
            though the sampling rate is at the right mean, we are missing some
            samples, I have not yet tracked out why
                                         (details below)

                                                  - I've run a tiny benchmark
            that is the worse case: it is a very tight loop and allocated a
            small array
                                                       - In this case, I see no
            overhead when the system is off so that is a good start :)
                                                       - I see right now a high
            overhead in this case when sampling is on. This is not a really too
            surprising but I'm going to see if
                                         this is consistent with our
                                                  internal implementation. The
            benchmark is really allocation stressful so I'm not too surprised
            but I want to do the due diligence.

                                                     - The statistic system up
            is up and I have a new test
            
http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch
            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch

            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch

            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch

            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch>>>
                                                        - I did a bit of a study
            about the random generator here, more details are below but
            basically it seems to work well

                                                     - I added a capability but
            since this is the first time doing this, I was not sure I did it 
right
                                                       - I did add a test though
            for it and the test seems to do what I expect (all methods are
            failing with the
                                         JVMTI_ERROR_MUST_POSSESS_CAPABILITY 
error).
                                                           -
            
http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch
            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch

            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch

            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch

            
<http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch>>>

                                                     - I still need to figure
            out what to do about the multi-agent vs single-agent issue

                                                     - As far as measurements,
            it seems I still need to look at:
                                                       - Why we do the 20 random
            calls first, are they necessary?
                                                       - Look at the mean of the
            sampling rate that the random generator does and also what is
            actually sampled
                                                       - What is the overhead in
            terms of memory/performance when on?

                                                  I have inlined my answers, I
            think I got them all in the new webrev, let me know your thoughts.

                                                  Thanks again!
                                                  Jc


                                                  On Fri, Jun 23, 2017 at 3:52
            AM, Thomas Schatzl <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>
                                         <mailto:[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>>
            <mailto:[email protected] <mailto:[email protected]>
            <mailto:[email protected] 
<mailto:[email protected]>>

<mailto:[email protected] <mailto:[email protected]>

            <mailto:[email protected]
            <mailto:[email protected]>>>>> wrote:

                                                       Hi,

                                                       On Wed, 2017-06-21 at
            13:45 -0700, JC Beyler wrote:
                                                       > Hi all,
                                                       >
                                                       > First off: Thanks again
            to Robbin and Thomas for their reviews :)
                                                       >
                                                       > Next, I've uploaded a
            new webrev:
                                                       >
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/

            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/

            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/

            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>>>>

                                                       >
                                                       > Here is an update:
                                                       >
                                                       > - @Robbin, I forgot to
            say that yes I need to look at implementing
                                                       > this for the other
            architectures and testing it before it is all
                                                       > ready to go. Is it
            common to have it working on all possible
                                                       > combinations or is
            there a subset that I should be doing first and we
                                                       > can do the others 
later?
                                                       > - I've tested
            slowdebug, built and ran the JTreg tests I wrote with
                                                       > slowdebug and fixed a
            few more issues
                                                       > - I've refactored a bit
            of the code following Thomas' comments
                                                       >    - I think I've
            handled all the comments from Thomas (I put
                                                       > comments inline below
            for the specifics)

                                                       Thanks for handling all
            those.

                                                       > - Following Thomas'
            comments on statistics, I want to add some
                                                       > quality assurance tests
            and find that the easiest way would be to
                                                       > have a few counters of
            what is happening in the sampler and expose
                                                       > that to the user.
                                                       >    - I'll be adding
            that in the next version if no one sees any
                                                       > objections to that.
                                                       >    - This will allow me
            to add a sanity test in JTreg about number of
                                                       > samples and average of
            sampling rate
                                                       >
                                                       > @Thomas: I had a few
            questions that I inlined below but I will
                                                       > summarize the "bigger
            ones" here:
                                                       >    - You mentioned
            constants are not using the right conventions, I
                                                       > looked around and
            didn't see any convention except normal naming then
                                                       > for static constants.
            Is that right?

                                                       I looked through
            https://wiki.openjdk.java.net/display/HotSpot/StyleGui
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui>
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui>>

<https://wiki.openjdk.java.net/display/HotSpot/StyleGui

            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui>
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui>>>

<https://wiki.openjdk.java.net/display/HotSpot/StyleGui

            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui>
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui>>

<https://wiki.openjdk.java.net/display/HotSpot/StyleGui

            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui>
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui
            <https://wiki.openjdk.java.net/display/HotSpot/StyleGui>>>>
                                                       de and the rule is to
            "follow an existing pattern and must have a
                                                       distinct appearance from
            other names". Which does not help a lot I
                                                       guess :/ The GC team
            started using upper camel case, e.g.
                                                       SomeOtherConstant, but
            very likely this is probably not applied
                                                       consistently throughout.
            So I am fine with not adding another style
                                                       (like kMaxStackDepth with
            the "k" in front with some unknown meaning)
                                                       is fine.

                                                       (Chances are you will
            find that style somewhere used anyway too,
                                                       apologies if so :/)


                                                  Thanks for that link, now I
            know where to look. I used the upper camel case in my code as well
            then :) I should have gotten them all.


                                                        > PS: I've also inlined
            my answers to Thomas below:
                                                        >
                                                        > On Tue, Jun 13, 2017
            at 8:03 AM, Thomas Schatzl <thomas.schatzl@oracl
                                                        > e.com <http://e.com>
            <http://e.com> <http://e.com> <http://e.com>> wrote:
                                                        > > Hi all,
                                                        > >
                                                        > > On Mon, 2017-06-12
            at 11:11 -0700, JC Beyler wrote:
                                                        > > > Dear all,
                                                        > > >
                                                        > > > I've continued
            working on this and have done the following
                                                        > > webrev:
                                                        > > >
            http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/

            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/

            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>>

<http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/

            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
            <http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>>>>

                                                        > >
                                                        > > [...]
                                                        > > > Things I still
            need to do:
                                                        > > >    - Have to fix
            that TLAB case for the FastTLABRefill
                                                        > > >    - Have to start
            looking at the data to see that it is
                                                        > > consistent and does
            gather the right samples, right frequency, etc.
                                                        > > >    - Have to check
            the GC elements and what that produces
                                                        > > >    - Run a
            slowdebug run and ensure I fixed all those issues you
                                                        > > saw > Robbin
                                                        > > >
                                                        > > > Thanks for looking
            at the webrev and have a great week!
                                                        > >
                                                        > >   scratching a bit
            on the surface of this change, so apologies for
                                                        > > rather shallow 
comments:
                                                        > >
                                                        > > -
            macroAssembler_x86.cpp:5604: while this is compiler code, and I
                                                        > > am not sure this is
            final, please avoid littering the code with
                                                        > > TODO remarks :) They
            tend to be candidates for later wtf moments
                                                        > > only.
                                                        > >
                                                        > > Just file a CR for 
that.
                                                        > >
                                                        > Newcomer question:
            what is a CR and not sure I have the rights to do
                                                        > that yet ? :)

                                                       Apologies. CR is a change
            request, this suggests to file a bug in the
                                                       bug tracker. And you are
            right, you can't just create a new account in
                                                       the OpenJDK JIRA
            yourselves. :(


                                                  Ok good to know, I'll continue
            with my own todo list but I'll work hard on not letting it slip in
            the webrevs anymore :)


                                                       I was mostly referring to
            the "... but it is a TODO" part of that
                                                       comment in
            macroassembler_x86.cpp. Comments about the why of the code
                                                       are appreciated.

                                                       [Note that I now
            understand that this is to some degree still work in
                                                       progress. As long as the
            final changeset does no contain TODO's I am
                                                       fine (and it's not a hard
            objection, rather their use in "final" code
                                                       is typically limited in
            my experience)]

                                                       5603   // Currently, if
            this happens, just set back the actual end to
                                                       where it was.
                                                       5604   // We miss a
            chance to sample here.

                                                       Would be okay, if
            explaining "this" and the "why" of missing a chance
                                                       to sample here would be 
best.

                                                       Like maybe:

                                                       // If we needed to refill
            TLABs, just set the actual end point to
                                                       // the end of the TLAB
            again. We do not sample here although we could.

                                                  Done with your comment, it
            works well in my mind.

                                                       I am not sure whether
            "miss a chance to sample" meant "we could, but
                                                       consciously don't because
            it's not that useful" or "it would be
                                                       necessary but don't
            because it's too complicated to do.".

                                                       Looking at the original
            comment once more, I am also not sure if that
                                                       comment shouldn't
            referring to the "end" variable (not actual_end)
                                                       because that's the
            variable that is responsible for taking the sampling
                                                       path? (Going from the
            member description of ThreadLocalAllocBuffer).


                                                  I've moved this code and it no
            longer shows up here but the rationale and answer was:

                                                  So.. Yes, end is the variable
            provoking the sampling. Actual end is the actual end of the TLAB.

                                                  What was happening here is
            that the code is resetting _end to point towards the end of the new
            TLAB. Because, we now have the end for
                                         sampling and _actual_end for
                                                  the actual end, we need to
            update the actual_end as well.

                                                  Normally, were we to do the
            real work here, we would calculate the (end - start) offset, then 
do:

                                                  - Set the new end to : start +
            (old_end - old_start)
                                                  - Set the actual end like we
            do here now where it because it is the actual end.

                                                  Why is this not done here now
            anymore?
                                                      - I was still debating
            which path to take:
                                                         - Do it in the fast
            refill code, it has its perks:
                                                             - In a world where
            fast refills are happening all the time or a lot, we can augment
            there the code to do the sampling
                                                         - Remember what we had
            as an end before leaving the slowpath and check on return
                                                             - This is what I'm
            doing now, it removes the need to go fix up all fast refill paths
            but if you remain in fast refill paths,
                                         you won't get sampling. I
                                                  have to think of the
            consequences of that, maybe a future change later on?
                                                                - I have the
            statistics now so I'm going to study that
                                                                   -> By the
            way, though my statistics are showing I'm missing some samples, if I
            turn off FastTlabRefill, it is the same
                                         loss so for now, it seems
                                                  this does not occur in my
            simple test.



                                                       But maybe I am only
            confused and it's best to just leave the comment
                                                       away. :)

                                                       Thinking about it some
            more, doesn't this not-sampling in this case
                                                       mean that sampling does
            not work in any collector that does inline TLAB
                                                       allocation at the moment?
            (Or is inline TLAB alloc automatically
                                                       disabled with sampling
            somehow?)

                                                       That would indeed be a
            bigger TODO then :)


                                                  Agreed, this remark made me
            think that perhaps as a first step the new way of doing it is better
            but I did have to:
                                                     - Remove the const of the
            ThreadLocalBuffer remaining and hard_end methods
                                                     - Move hard_end out of the
            header file to have a bit more logic there

                                                  Please let me know what you
            think of that and if you prefer it this way or changing the fast
            refills. (I prefer this way now because it
                                         is more incremental).


                                                       > > - calling
            HeapMonitoring::do_weak_oops() (which should probably be
                                                       > > called weak_oops_do()
            like other similar methods) only if string
                                                       > > deduplication is
            enabled (in g1CollectedHeap.cpp:4511) seems wrong.
                                                       >
                                                       > The call should be at
            least around 6 lines up outside the if.
                                                       >
                                                       > Preferentially in a
            method like process_weak_jni_handles(), including
                                                       > additional logging. (No
            new (G1) gc phase without minimal logging
                                                       > :)).
                                                       > Done but really not
            sure because:
                                                       >
                                                       > I put for logging:
                                                       >   log_develop_trace(gc,
            freelist)("G1ConcRegionFreeing [other] : heap
                                                       > monitoring");

                                                       I would think that "gc,
            ref" would be more appropriate log tags for
                                                       this similar to jni 
handles.
                                                       (I am als not sure what
            weak reference handling has to do with
                                                       G1ConcRegionFreeing, so I
            am a bit puzzled)


                                                  I was not sure what to put for
            the tags or really as the message. I cleaned it up a bit now to:
                                                       log_develop_trace(gc,
            ref)("HeapSampling [other] : heap monitoring processing");



                                                       > Since weak_jni_handles
            didn't have logging for me to be inspired
                                                       > from, I did that but
            unconvinced this is what should be done.

                                                       The JNI handle processing
            does have logging, but only in

ReferenceProcessor::process_discovered_references(). In process_weak_jni_handles() only overall time is measured (in a G1

                                                       specific way, since only
            G1 supports disabling reference procesing) :/

                                                       The code in
            ReferenceProcessor prints both time taken

referenceProcessor.cpp:254, as well as the count, but strangely

            only in
                                                       debug VMs.

                                                       I have no idea why this
            logging is that unimportant to only print that
                                                       in a debug VM. However
            there are reviews out for changing this area a
                                                       bit, so it might be
            useful to wait for that (JDK-8173335).


                                                  I cleaned it up a bit anyway
            and now it returns the count of objects that are in the system.


                                                       > > - the change doubles
            the size of
                                                       > >
            CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
                                                       > > threshold. Maybe it
            could be refactored a bit.
                                                       > Done I think, it looks
            better to me :).

                                                       In
            ThreadLocalAllocBuffer::handle_sample() I think the

set_back_actual_end()/pick_next_sample() calls could be hoisted out of

                                                       the "if" :)


                                                  Done!


                                                       > > -
            referenceProcessor.cpp:261: the change should add logging about
                                                       > > the number of
            references encountered, maybe after the corresponding
                                                       > > "JNI weak reference
            count" log message.
                                                       > Just to double check,
            are you saying that you'd like to have the heap
                                                       > sampler to keep in
            store how many sampled objects were encountered in
                                                       > the
            HeapMonitoring::weak_oops_do?
                                                       >    - Would a return of
            the method with the number of handled
                                                       > references and logging
            that work?

                                                       Yes, it's fine if
            HeapMonitoring::weak_oops_do() only returned the
                                                       number of processed weak
            oops.


                                                  Done also (but I admit I have
            not tested the output yet) :)


                                                       >    - Additionally,
            would you prefer it in a separate block with its
                                                       > GCTraceTime?

                                                       Yes. Both kinds of
            information is interesting: while the time taken is
                                                       typically more important,
            the next question would be why, and the
                                                       number of references
            typically goes a long way there.

                                                       See above though, it is
            probably best to wait a bit.


                                                  Agreed that I "could" wait
            but, if it's ok, I'll just refactor/remove this when we get closer
            to something final. Either, JDK-8173335
                                                  has gone in and I will notice
            it now or it will soon and I can change it then.


                                                       > > -
            threadLocalAllocBuffer.cpp:331: one more "TODO"
                                                       > Removed it and added it
            to my personal todos to look at.
                                                       >      > >
                                                       > > -
            threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
                                                       > > documentation should
            be updated about the sampling additions. I
                                                       > > would have no clue
            what the difference between "actual_end" and
                                                       > > "end" would be from
            the given information.
                                                       > If you are talking
            about the comments in this file, I made them more
                                                       > clear I hope in the new
            webrev. If it was somewhere else, let me know
                                                       > where to change.

Re: Low-Overhead Heap Profiling

Reply via email to