Re: Low-Overhead Heap Profiling

JC Beyler Mon, 16 Oct 2017 09:35:12 -0700

Hi Robbin,

That is because version 11 to 12 was only a test change. I was going to
write about it and say here are the webrev links:
Incremental:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/


Full webrev:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/

This change focused only on refactoring the tests to be more manageable,
readable, maintainable. As all tests are looking at allocations, I moved
common code to a java class:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java.patch

And then most tests call into that class to turn on/off the sampling,
allocate, etc. This has removed almost 500 lines of test code so I'm happy
about that.

Thanks for your changes, a bit of relics of previous versions :). I've
already integrated them into my code and will make a new webrev end of this
week with a bit of refactor of the code handling the tlab slow path. I find
it could use a bit of refactoring to make it easier to follow so I'm going
to take a stab at it this week.

Any other issues/comments?

Thanks!
Jc


On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn <[email protected]> wrote:

> Hi JC,
>
> I saw a webrev.12 in the directory, with only test changes(11->12), so I
> took that version.
> I had a look and tested the tests, worked fine!
>
> First glance at the code (looking at full v12) some minor things below,
> mostly unused stuff.
>
> Thanks, Robbin
>
> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp
> --- a/src/hotspot/share/runtime/heapMonitoring.cpp      Mon Oct 16
> 16:54:06 2017 +0200
> +++ b/src/hotspot/share/runtime/heapMonitoring.cpp      Mon Oct 16
> 17:42:42 2017 +0200
> @@ -211,2 +211,3 @@
>    void initialize(int max_storage) {
> +    // validate max_storage to sane value ? What would 0 mean ?
>      MutexLocker mu(HeapMonitor_lock);
> @@ -227,8 +228,4 @@
>    bool initialized() { return _initialized; }
> -  volatile bool *initialized_address() { return &_initialized; }
>
>   private:
> -  // Protects the traces currently sampled (below).
> -  volatile intptr_t _stack_storage_lock[1];
> -
>    // The traces currently sampled.
> @@ -313,3 +310,2 @@
>    _initialized(false) {
> -    _stack_storage_lock[0] = 0;
>  }
> @@ -532,13 +528,2 @@
>
> -// Delegate the initialization question to the underlying storage system.
> -bool HeapMonitoring::initialized() {
> -  return StackTraceStorage::storage()->initialized();
> -}
> -
> -// Delegate the initialization question to the underlying storage system.
> -bool *HeapMonitoring::initialized_address() {
> -  return
> -      const_cast<bool*>(StackTraceStorage::storage()->initialized_
> address());
> -}
> -
>  void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) {
> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp
> --- a/src/hotspot/share/runtime/heapMonitoring.hpp      Mon Oct 16
> 16:54:06 2017 +0200
> +++ b/src/hotspot/share/runtime/heapMonitoring.hpp      Mon Oct 16
> 17:42:42 2017 +0200
> @@ -35,3 +35,2 @@
>    static uint64_t _rnd;
> -  static bool _initialized;
>    static jint _monitoring_rate;
> @@ -92,7 +91,2 @@
>
> -  // Is the profiler initialized and where is the address to the
> initialized
> -  // boolean.
> -  static bool initialized();
> -  static bool *initialized_address();
> -
>    // Called when o is to be sampled from a given thread and a given size.
>
>
>
> On 10/10/2017 12:57 AM, JC Beyler wrote:
>
>> Dear all,
>>
>> Thread-safety is back!! Here is the update webrev:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/
>>
>> Full webrev is here:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/
>>
>> In order to really test this, I needed to add this so thought now was a
>> good time. It required a few changes here for the creation to ensure
>> correctness and safety. Now we keep the static pointer but clear the data
>> internally so on re-initialize, it will be a bit more costly than before. I
>> don't think this is a huge use-case so I did not think it was a problem. I
>> used the internal MutexLocker, I think I used it well, let me know.
>>
>> I also added three tests:
>>
>> 1) Stack depth test:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes
>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H
>> eapMonitorStackDepthTest.java.patch
>>
>> This test shows that the maximum stack depth system is working.
>>
>> 2) Thread safety:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes
>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H
>> eapMonitorThreadTest.java.patch
>>
>> The test creates 24 threads and they all allocate at the same time. The
>> test then checks it does find samples from all the threads.
>>
>> 3) Thread on/off safety
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes
>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H
>> eapMonitorThreadOnOffTest.java.patch
>>
>> The test creates 24 threads that all allocate a bunch of memory. Then
>> another thread turns the sampling on/off.
>>
>> Btw, both tests 2 & 3 failed without the locks.
>>
>> As I worked on this, I saw a lot of places where the tests are doing very
>> similar things, I'm going to clean up the code a bit and make a
>> HeapAllocator class that all tests can call directly. This will greatly
>> simplify the code.
>>
>> Thanks for any comments/criticisms!
>> Jc
>>
>>
>> On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler <[email protected] <mailto:
>> [email protected]>> wrote:
>>
>>     Dear all,
>>
>>     Small update to the webrev:
>>     http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/>
>>
>>     Full webrev is here:
>>     http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/>
>>
>>     I updated a bit of the naming, removed a TODO comment, and I added a
>> test for testing the sampling rate. I also updated the maximum stack depth
>> to 1024, there is no
>>     reason to keep it so small. I did a micro benchmark that tests the
>> overhead and it seems relatively the same.
>>
>>     I compared allocations from a stack depth of 10 and allocations from
>> a stack depth of 1024 (allocations are from the same helper method in
>>     http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_fi
>> les/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/
>> MyPackage/HeapMonitorStatRateTest.java
>>     <http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_f
>> iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor
>> /MyPackage/HeapMonitorStatRateTest.java>):
>>                - For an array of 1 integer allocated in a loop; stack
>> depth 1024 vs stack depth 10: 1% slower
>>                - For an array of 200k integers allocated in a loop; stack
>> depth 1024 vs stack depth 10: 3% slower
>>
>>     So basically now moving the maximum stack depth to 1024 but we only
>> copy over the stack depths actually used.
>>
>>     For the next webrev, I will be adding a stack depth test to show that
>> it works and probably put back the mutex locking so that we can see how
>> difficult it is to keep
>>     thread safe.
>>
>>     Let me know what you think!
>>     Jc
>>
>>
>>
>>     On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>         Forgot to say that for my numbers:
>>           - Not in the test are the actual numbers I got for the various
>> array sizes, I ran the program 30 times and parsed the output; here are the
>> averages and standard
>>         deviation:
>>                1000:     1.28% average; 1.13% standard deviation
>>                10000:    1.59% average; 1.25% standard deviation
>>                100000:   1.26% average; 1.26% standard deviation
>>
>>         The 1000/10000/100000 are the sizes of the arrays being
>> allocated. These are allocated 100k times and the sampling rate is 111
>> times the size of the array.
>>
>>         Thanks!
>>         Jc
>>
>>
>>         On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>             Hi all,
>>
>>             After a bit of a break, I am back working on this :). As
>> before, here are two webrevs:
>>
>>             - Full change set: http://cr.openjdk.java.net/~ra
>> sbold/8171119/webrev.09/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.09/>
>>             - Compared to version 8: http://cr.openjdk.java.net/~ra
>> sbold/8171119/webrev.08_09/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.08_09/>
>>                  (This version is compared to version 8 I last showed but
>> ported to the new folder hierarchy)
>>
>>             In this version I have:
>>                - Handled Thomas' comments from his email of 07/03:
>>                     - Merged the logging to be standard
>>                     - Fixed up the code a bit where asked
>>                     - Added some notes about the code not being
>> thread-safe yet
>>                 - Removed additional dead code from the version that
>> modifies interpreter/c1/c2
>>                 - Fixed compiler issues so that it compiles with
>> --disable-precompiled-header
>>                      - Tested with ./configure --with-boot-jdk=<jdk8>
>> --with-debug-level=slowdebug --disable-precompiled-headers
>>
>>             Additionally, I added a test to check the sanity of the
>> sampler: HeapMonitorStatCorrectnessTest
>>             (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/te
>> st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/
>> HeapMonitorStatCorrectnessTest.java.patch <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit
>> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch>)
>>                 - This allocates a number of arrays and checks that we
>> obtain the number of samples we want with an accepted error of 5%. I tested
>> it 100 times and it
>>             passed everytime, I can test more if wanted
>>                 - Not in the test are the actual numbers I got for the
>> various array sizes, I ran the program 30 times and parsed the output; here
>> are the averages and
>>             standard deviation:
>>                    1000:     1.28% average; 1.13% standard deviation
>>                    10000:    1.59% average; 1.25% standard deviation
>>                    100000:   1.26% average; 1.26% standard deviation
>>
>>             What this means is that we were always at about 1~2% of the
>> number of samples the test expected.
>>
>>             Let me know what you think,
>>             Jc
>>
>>             On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler <
>> [email protected] <mailto:[email protected]>> wrote:
>>
>>                 Hi all,
>>
>>                 I apologize, I have not yet handled your remarks but
>> thought this new webrev would also be useful to see and comment on perhaps.
>>
>>                 Here is the latest webrev, it is generated slightly
>> different than the others since now I'm using webrev.ksh without the -N
>> option:
>>                 http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/>
>>
>>                 And the webrev.07 to webrev.08 diff is here:
>>                 http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/
>> <http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/>
>>
>>                 (Let me know if it works well)
>>
>>                 It's a small change between versions but it:
>>                    - provides a fix that makes the average sample rate
>> correct (more on that below).
>>                    - fixes the code to actually have it play nicely with
>> the fast tlab refill
>>                    - cleaned up a bit the JVMTI text and now use
>> jvmtiFrameInfo
>>                 - moved the capability to be onload solo
>>
>>                 With this webrev, I've done a small study of the random
>> number generator we use here for the sampling rate. I took a small program
>> and it can be simplified to:
>>
>>                 for (outer loop)
>>                 for (inner loop)
>>                 int[] tmp = new int[arraySize];
>>
>>                 - I've fixed the outer and inner loops to being 800 for
>> this experiment, meaning we allocate 640000 times an array of a given array
>> size.
>>
>>                 - Each program provides the average sample size used for
>> the whole execution
>>
>>                 - Then, I ran each variation 30 times and then calculated
>> the average of the average sample size used for various array sizes. I
>> selected the array size to
>>                 be one of the following: 1, 10, 100, 1000.
>>
>>                 - When compared to 512kb, the average sample size of 30
>> runs:
>>                 1: 4.62% of error
>>                 10: 3.09% of error
>>                 100: 0.36% of error
>>                 1000: 0.1% of error
>>                 10000: 0.03% of error
>>
>>                 What it shows is that, depending on the number of
>> samples, the average does become better. This is because with an allocation
>> of 1 element per array, it
>>                 will take longer to hit one of the thresholds. This is
>> seen by looking at the sample count statistic I put in. For the same number
>> of iterations (800 *
>>                 800), the different array sizes provoke:
>>                 1: 62 samples
>>                 10: 125 samples
>>                 100: 788 samples
>>                 1000: 6166 samples
>>                 10000: 57721 samples
>>
>>                 And of course, the more samples you have, the more sample
>> rates you pick, which means that your average gets closer using that math.
>>
>>                 Thanks,
>>                 Jc
>>
>>                 On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler <
>> [email protected] <mailto:[email protected]>> wrote:
>>
>>                     Thanks Robbin,
>>
>>                     This seems to have worked. When I have the next
>> webrev ready, we will find out but I'm fairly confident it will work!
>>
>>                     Thanks agian!
>>                     Jc
>>
>>                     On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn <
>> [email protected] <mailto:[email protected]>> wrote:
>>
>>                         Hi JC,
>>
>>                         On 06/29/2017 12:15 AM, JC Beyler wrote:
>>
>>                             B) Incremental changes
>>
>>
>>                         I guess the most common work flow here is using
>> mq :
>>                         hg qnew fix_v1
>>                         edit files
>>                         hg qrefresh
>>                         hg qnew fix_v2
>>                         edit files
>>                         hg qrefresh
>>
>>                         if you do hg log you will see 2 commits
>>
>>                         webrev.ksh -r -2 -o my_inc_v1_v2
>>                         webrev.ksh -o my_full_v2
>>
>>
>>                         In  your .hgrc you might need:
>>                         [extensions]
>>                         mq =
>>
>>                         /Robbin
>>
>>
>>                             Again another newbiew question here...
>>
>>                             For showing the incremental changes, is there
>> a link that explains how to do that? I apologize for my newbie questions
>> all the time :)
>>
>>                             Right now, I do:
>>
>>                                ksh ../webrev.ksh -m -N
>>
>>                             That generates a webrev.zip and send it to
>> Chuck Rasbold. He then uploads it to a new webrev.
>>
>>                             I tried commiting my change and adding a
>> small change. Then if I just do ksh ../webrev.ksh without any options, it
>> seems to produce a similar
>>                             page but now with only the changes I had (so
>> the 06-07 comparison you were talking about) and a changeset that has it
>> all. I imagine that is
>>                             what you meant.
>>
>>                             Which means that my workflow would become:
>>
>>                             1) Make changes
>>                             2) Make a webrev without any options to show
>> just the differences with the tip
>>                             3) Amend my changes to my local commit so
>> that I have it done with
>>                             4) Go to 1
>>
>>                             Does that seem correct to you?
>>
>>                             Note that when I do this, I only see the full
>> change of a file in the full change set (Side note here: now the page says
>> change set and not
>>                             patch, which is maybe why Serguei was having
>> issues?).
>>
>>                             Thanks!
>>                             Jc
>>
>>
>>
>>                             On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn <
>> [email protected] <mailto:[email protected]> <mailto:
>> [email protected]
>>                             <mailto:[email protected]>>> wrote:
>>
>>                                  Hi,
>>
>>                                  On 06/28/2017 12:04 AM, JC Beyler wrote:
>>
>>                                      Dear Thomas et al,
>>
>>                                      Here is the newest webrev:
>>                             http://cr.openjdk.java.net/~ra
>> sbold/8171119/webrev.07/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.07/>
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.07/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.07/>>
>>
>>
>>
>>                                  You have some more bits to in there but
>> generally this looks good and really nice with more tests.
>>                                  I'll do and deep dive and re-test this
>> when I get back from my long vacation with whatever patch version you have
>> then.
>>
>>                                  Also I think it's time you provide
>> incremental (v06->07 changes) as well as complete change-sets.
>>
>>                                  Thanks, Robbin
>>
>>
>>
>>
>>                                      Thomas, I "think" I have answered
>> all your remarks. The summary is:
>>
>>                                      - The statistic system is up and
>> provides insight on what the heap sampler is doing
>>                                           - I've noticed that, though the
>> sampling rate is at the right mean, we are missing some samples, I have not
>> yet tracked out why
>>                             (details below)
>>
>>                                      - I've run a tiny benchmark that is
>> the worse case: it is a very tight loop and allocated a small array
>>                                           - In this case, I see no
>> overhead when the system is off so that is a good start :)
>>                                           - I see right now a high
>> overhead in this case when sampling is on. This is not a really too
>> surprising but I'm going to see if
>>                             this is consistent with our
>>                                      internal implementation. The
>> benchmark is really allocation stressful so I'm not too surprised but I
>> want to do the due diligence.
>>
>>                                         - The statistic system up is up
>> and I have a new test
>>                             http://cr.openjdk.java.net/~ra
>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit
>> or/MyPackage/HeapMonitorStatTest.java.patch
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit
>> or/MyPackage/HeapMonitorStatTest.java.patch>
>>                                      <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit
>> or/MyPackage/HeapMonitorStatTest.java.patch
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit
>> or/MyPackage/HeapMonitorStatTest.java.patch>>
>>                                            - I did a bit of a study about
>> the random generator here, more details are below but basically it seems to
>> work well
>>
>>                                         - I added a capability but since
>> this is the first time doing this, I was not sure I did it right
>>                                           - I did add a test though for
>> it and the test seems to do what I expect (all methods are failing with the
>>                             JVMTI_ERROR_MUST_POSSESS_CAPABILITY error).
>>                                               -
>>                             http://cr.openjdk.java.net/~ra
>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit
>> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit
>> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch>
>>                                                                 <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/
>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa
>> bilityTest.java.patch
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit
>> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch>>
>>
>>                                         - I still need to figure out what
>> to do about the multi-agent vs single-agent issue
>>
>>                                         - As far as measurements, it
>> seems I still need to look at:
>>                                           - Why we do the 20 random calls
>> first, are they necessary?
>>                                           - Look at the mean of the
>> sampling rate that the random generator does and also what is actually
>> sampled
>>                                           - What is the overhead in terms
>> of memory/performance when on?
>>
>>                                      I have inlined my answers, I think I
>> got them all in the new webrev, let me know your thoughts.
>>
>>                                      Thanks again!
>>                                      Jc
>>
>>
>>                                      On Fri, Jun 23, 2017 at 3:52 AM,
>> Thomas Schatzl <[email protected] <mailto:thomas.schatzl@oracle.
>> com>
>>                             <mailto:[email protected] <mailto:
>> [email protected]>> <mailto:[email protected] <mailto:
>> [email protected]>
>>
>>                                      <mailto:[email protected]
>> <mailto:[email protected]>>>> wrote:
>>
>>                                           Hi,
>>
>>                                           On Wed, 2017-06-21 at 13:45
>> -0700, JC Beyler wrote:
>>                                           > Hi all,
>>                                           >
>>                                           > First off: Thanks again to
>> Robbin and Thomas for their reviews :)
>>                                           >
>>                                           > Next, I've uploaded a new
>> webrev:
>>                                           >
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.06/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.06/>>
>>                                      <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.06/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.06/>
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.06/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.06/>>>
>>
>>                                           >
>>                                           > Here is an update:
>>                                           >
>>                                           > - @Robbin, I forgot to say
>> that yes I need to look at implementing
>>                                           > this for the other
>> architectures and testing it before it is all
>>                                           > ready to go. Is it common to
>> have it working on all possible
>>                                           > combinations or is there a
>> subset that I should be doing first and we
>>                                           > can do the others later?
>>                                           > - I've tested slowdebug,
>> built and ran the JTreg tests I wrote with
>>                                           > slowdebug and fixed a few
>> more issues
>>                                           > - I've refactored a bit of
>> the code following Thomas' comments
>>                                           >    - I think I've handled all
>> the comments from Thomas (I put
>>                                           > comments inline below for the
>> specifics)
>>
>>                                           Thanks for handling all those.
>>
>>                                           > - Following Thomas' comments
>> on statistics, I want to add some
>>                                           > quality assurance tests and
>> find that the easiest way would be to
>>                                           > have a few counters of what
>> is happening in the sampler and expose
>>                                           > that to the user.
>>                                           >    - I'll be adding that in
>> the next version if no one sees any
>>                                           > objections to that.
>>                                           >    - This will allow me to
>> add a sanity test in JTreg about number of
>>                                           > samples and average of
>> sampling rate
>>                                           >
>>                                           > @Thomas: I had a few
>> questions that I inlined below but I will
>>                                           > summarize the "bigger ones"
>> here:
>>                                           >    - You mentioned constants
>> are not using the right conventions, I
>>                                           > looked around and didn't see
>> any convention except normal naming then
>>                                           > for static constants. Is that
>> right?
>>
>>                                           I looked through
>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui <
>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui>
>>                             <https://wiki.openjdk.java.net
>> /display/HotSpot/StyleGui <https://wiki.openjdk.java.net
>> /display/HotSpot/StyleGui>>
>>                                      <https://wiki.openjdk.java.net
>> /display/HotSpot/StyleGui <https://wiki.openjdk.java.net
>> /display/HotSpot/StyleGui>
>>                             <https://wiki.openjdk.java.net
>> /display/HotSpot/StyleGui <https://wiki.openjdk.java.net
>> /display/HotSpot/StyleGui>>>
>>                                           de and the rule is to "follow
>> an existing pattern and must have a
>>                                           distinct appearance from other
>> names". Which does not help a lot I
>>                                           guess :/ The GC team started
>> using upper camel case, e.g.
>>                                           SomeOtherConstant, but very
>> likely this is probably not applied
>>                                           consistently throughout. So I
>> am fine with not adding another style
>>                                           (like kMaxStackDepth with the
>> "k" in front with some unknown meaning)
>>                                           is fine.
>>
>>                                           (Chances are you will find that
>> style somewhere used anyway too,
>>                                           apologies if so :/)
>>
>>
>>                                      Thanks for that link, now I know
>> where to look. I used the upper camel case in my code as well then :) I
>> should have gotten them all.
>>
>>
>>                                            > PS: I've also inlined my
>> answers to Thomas below:
>>                                            >
>>                                            > On Tue, Jun 13, 2017 at 8:03
>> AM, Thomas Schatzl <thomas.schatzl@oracl
>>                                            > e.com <http://e.com> <
>> http://e.com> <http://e.com>> wrote:
>>                                            > > Hi all,
>>                                            > >
>>                                            > > On Mon, 2017-06-12 at
>> 11:11 -0700, JC Beyler wrote:
>>                                            > > > Dear all,
>>                                            > > >
>>                                            > > > I've continued working
>> on this and have done the following
>>                                            > > webrev:
>>                                            > > >
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ <
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.05/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.05/>>
>>                                      <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.05/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.05/>
>>                             <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.05/ <http://cr.openjdk.java.net/~r
>> asbold/8171119/webrev.05/>>>
>>
>>                                            > >
>>                                            > > [...]
>>                                            > > > Things I still need to
>> do:
>>                                            > > >    - Have to fix that
>> TLAB case for the FastTLABRefill
>>                                            > > >    - Have to start
>> looking at the data to see that it is
>>                                            > > consistent and does gather
>> the right samples, right frequency, etc.
>>                                            > > >    - Have to check the
>> GC elements and what that produces
>>                                            > > >    - Run a slowdebug run
>> and ensure I fixed all those issues you
>>                                            > > saw > Robbin
>>                                            > > >
>>                                            > > > Thanks for looking at
>> the webrev and have a great week!
>>                                            > >
>>                                            > >   scratching a bit on the
>> surface of this change, so apologies for
>>                                            > > rather shallow comments:
>>                                            > >
>>                                            > > -
>> macroAssembler_x86.cpp:5604: while this is compiler code, and I
>>                                            > > am not sure this is final,
>> please avoid littering the code with
>>                                            > > TODO remarks :) They tend
>> to be candidates for later wtf moments
>>                                            > > only.
>>                                            > >
>>                                            > > Just file a CR for that.
>>                                            > >
>>                                            > Newcomer question: what is a
>> CR and not sure I have the rights to do
>>                                            > that yet ? :)
>>
>>                                           Apologies. CR is a change
>> request, this suggests to file a bug in the
>>                                           bug tracker. And you are right,
>> you can't just create a new account in
>>                                           the OpenJDK JIRA yourselves. :(
>>
>>
>>                                      Ok good to know, I'll continue with
>> my own todo list but I'll work hard on not letting it slip in the webrevs
>> anymore :)
>>
>>
>>                                           I was mostly referring to the
>> "... but it is a TODO" part of that
>>                                           comment in
>> macroassembler_x86.cpp. Comments about the why of the code
>>                                           are appreciated.
>>
>>                                           [Note that I now understand
>> that this is to some degree still work in
>>                                           progress. As long as the final
>> changeset does no contain TODO's I am
>>                                           fine (and it's not a hard
>> objection, rather their use in "final" code
>>                                           is typically limited in my
>> experience)]
>>
>>                                           5603   // Currently, if this
>> happens, just set back the actual end to
>>                                           where it was.
>>                                           5604   // We miss a chance to
>> sample here.
>>
>>                                           Would be okay, if explaining
>> "this" and the "why" of missing a chance
>>                                           to sample here would be best.
>>
>>                                           Like maybe:
>>
>>                                           // If we needed to refill
>> TLABs, just set the actual end point to
>>                                           // the end of the TLAB again.
>> We do not sample here although we could.
>>
>>                                      Done with your comment, it works
>> well in my mind.
>>
>>                                           I am not sure whether "miss a
>> chance to sample" meant "we could, but
>>                                           consciously don't because it's
>> not that useful" or "it would be
>>                                           necessary but don't because
>> it's too complicated to do.".
>>
>>                                           Looking at the original comment
>> once more, I am also not sure if that
>>                                           comment shouldn't referring to
>> the "end" variable (not actual_end)
>>                                           because that's the variable
>> that is responsible for taking the sampling
>>                                           path? (Going from the member
>> description of ThreadLocalAllocBuffer).
>>
>>
>>                                      I've moved this code and it no
>> longer shows up here but the rationale and answer was:
>>
>>                                      So.. Yes, end is the variable
>> provoking the sampling. Actual end is the actual end of the TLAB.
>>
>>                                      What was happening here is that the
>> code is resetting _end to point towards the end of the new TLAB. Because,
>> we now have the end for
>>                             sampling and _actual_end for
>>                                      the actual end, we need to update
>> the actual_end as well.
>>
>>                                      Normally, were we to do the real
>> work here, we would calculate the (end - start) offset, then do:
>>
>>                                      - Set the new end to : start +
>> (old_end - old_start)
>>                                      - Set the actual end like we do here
>> now where it because it is the actual end.
>>
>>                                      Why is this not done here now
>> anymore?
>>                                          - I was still debating which
>> path to take:
>>                                             - Do it in the fast refill
>> code, it has its perks:
>>                                                 - In a world where fast
>> refills are happening all the time or a lot, we can augment there the code
>> to do the sampling
>>                                             - Remember what we had as an
>> end before leaving the slowpath and check on return
>>                                                 - This is what I'm doing
>> now, it removes the need to go fix up all fast refill paths but if you
>> remain in fast refill paths,
>>                             you won't get sampling. I
>>                                      have to think of the consequences of
>> that, maybe a future change later on?
>>                                                    - I have the
>> statistics now so I'm going to study that
>>                                                       -> By the way,
>> though my statistics are showing I'm missing some samples, if I turn off
>> FastTlabRefill, it is the same
>>                             loss so for now, it seems
>>                                      this does not occur in my simple
>> test.
>>
>>
>>
>>                                           But maybe I am only confused
>> and it's best to just leave the comment
>>                                           away. :)
>>
>>                                           Thinking about it some more,
>> doesn't this not-sampling in this case
>>                                           mean that sampling does not
>> work in any collector that does inline TLAB
>>                                           allocation at the moment? (Or
>> is inline TLAB alloc automatically
>>                                           disabled with sampling somehow?)
>>
>>                                           That would indeed be a bigger
>> TODO then :)
>>
>>
>>                                      Agreed, this remark made me think
>> that perhaps as a first step the new way of doing it is better but I did
>> have to:
>>                                         - Remove the const of the
>> ThreadLocalBuffer remaining and hard_end methods
>>                                         - Move hard_end out of the header
>> file to have a bit more logic there
>>
>>                                      Please let me know what you think of
>> that and if you prefer it this way or changing the fast refills. (I prefer
>> this way now because it
>>                             is more incremental).
>>
>>
>>                                           > > - calling
>> HeapMonitoring::do_weak_oops() (which should probably be
>>                                           > > called weak_oops_do() like
>> other similar methods) only if string
>>                                           > > deduplication is enabled
>> (in g1CollectedHeap.cpp:4511) seems wrong.
>>                                           >
>>                                           > The call should be at least
>> around 6 lines up outside the if.
>>                                           >
>>                                           > Preferentially in a method
>> like process_weak_jni_handles(), including
>>                                           > additional logging. (No new
>> (G1) gc phase without minimal logging
>>                                           > :)).
>>                                           > Done but really not sure
>> because:
>>                                           >
>>                                           > I put for logging:
>>                                           >   log_develop_trace(gc,
>> freelist)("G1ConcRegionFreeing [other] : heap
>>                                           > monitoring");
>>
>>                                           I would think that "gc, ref"
>> would be more appropriate log tags for
>>                                           this similar to jni handles.
>>                                           (I am als not sure what weak
>> reference handling has to do with
>>                                           G1ConcRegionFreeing, so I am a
>> bit puzzled)
>>
>>
>>                                      I was not sure what to put for the
>> tags or really as the message. I cleaned it up a bit now to:
>>                                           log_develop_trace(gc,
>> ref)("HeapSampling [other] : heap monitoring processing");
>>
>>
>>
>>                                           > Since weak_jni_handles didn't
>> have logging for me to be inspired
>>                                           > from, I did that but
>> unconvinced this is what should be done.
>>
>>                                           The JNI handle processing does
>> have logging, but only in
>>                                           
>> ReferenceProcessor::process_discovered_references().
>> In
>>                                           process_weak_jni_handles() only
>> overall time is measured (in a G1
>>                                           specific way, since only G1
>> supports disabling reference procesing) :/
>>
>>                                           The code in ReferenceProcessor
>> prints both time taken
>>                                           referenceProcessor.cpp:254, as
>> well as the count, but strangely only in
>>                                           debug VMs.
>>
>>                                           I have no idea why this logging
>> is that unimportant to only print that
>>                                           in a debug VM. However there
>> are reviews out for changing this area a
>>                                           bit, so it might be useful to
>> wait for that (JDK-8173335).
>>
>>
>>                                      I cleaned it up a bit anyway and now
>> it returns the count of objects that are in the system.
>>
>>
>>                                           > > - the change doubles the
>> size of
>>                                           > >
>> CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
>>                                           > > threshold. Maybe it could
>> be refactored a bit.
>>                                           > Done I think, it looks better
>> to me :).
>>
>>                                           In
>> ThreadLocalAllocBuffer::handle_sample() I think the
>>                                           
>> set_back_actual_end()/pick_next_sample()
>> calls could be hoisted out of
>>                                           the "if" :)
>>
>>
>>                                      Done!
>>
>>
>>                                           > > -
>> referenceProcessor.cpp:261: the change should add logging about
>>                                           > > the number of references
>> encountered, maybe after the corresponding
>>                                           > > "JNI weak reference count"
>> log message.
>>                                           > Just to double check, are you
>> saying that you'd like to have the heap
>>                                           > sampler to keep in store how
>> many sampled objects were encountered in
>>                                           > the
>> HeapMonitoring::weak_oops_do?
>>                                           >    - Would a return of the
>> method with the number of handled
>>                                           > references and logging that
>> work?
>>
>>                                           Yes, it's fine if
>> HeapMonitoring::weak_oops_do() only returned the
>>                                           number of processed weak oops.
>>
>>
>>                                      Done also (but I admit I have not
>> tested the output yet) :)
>>
>>
>>                                           >    - Additionally, would you
>> prefer it in a separate block with its
>>                                           > GCTraceTime?
>>
>>                                           Yes. Both kinds of information
>> is interesting: while the time taken is
>>                                           typically more important, the
>> next question would be why, and the
>>                                           number of references typically
>> goes a long way there.
>>
>>                                           See above though, it is
>> probably best to wait a bit.
>>
>>
>>                                      Agreed that I "could" wait but, if
>> it's ok, I'll just refactor/remove this when we get closer to something
>> final. Either, JDK-8173335
>>                                      has gone in and I will notice it now
>> or it will soon and I can change it then.
>>
>>
>>                                           > > -
>> threadLocalAllocBuffer.cpp:331: one more "TODO"
>>                                           > Removed it and added it to my
>> personal todos to look at.
>>                                           >      > >
>>                                           > > -
>> threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
>>                                           > > documentation should be
>> updated about the sampling additions. I
>>                                           > > would have no clue what the
>> difference between "actual_end" and
>>                                           > > "end" would be from the
>> given information.
>>                                           > If you are talking about the
>> comments in this file, I made them more
>>                                           > clear I hope in the new
>> webrev. If it was somewhere else, let me know
>>                                           > where to change.
>
>

Re: Low-Overhead Heap Profiling

Reply via email to