Hi,

To me the compiler changes look like what is expected.
It would be good if someone from the compiler team could take a look at that.
Added compiler to the mail thread.

Also adding Serguei; it would be good to have his view as well.

My initial take on it: I read through most of the code and took it for a ride.

##############################
- Regarding the compiler changes: I think we need the 'TLAB end' trickery 
(mentioned by Tony P) instead of a separate check for sampling in the fast path 
for the final version.
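
Roughly the idea, as I understand it (just a sketch; every name below is made 
up here, none of this is from the webrev):

  // Lower the TLAB's end() to the next sample point so the compiled fast path
  // stays a plain bump-pointer allocation; only a sampled allocation (or a real
  // TLAB overflow) falls into the existing slow path.
  void ThreadLocalAllocBuffer::set_sample_point(size_t bytes_until_sample) {
    _actual_end = end();                                   // remember the real end
    HeapWord* sample_end = top() + bytes_until_sample / HeapWordSize;
    set_end(MIN2(sample_end, _actual_end));                // fast path checks this
  }
  // In the slow path: if top() < _actual_end we only crossed the sample point,
  // so take the sample, move the sample point forward and retry the allocation.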

##############################
- This patch I had to apply to get it to compile on JDK 10:

diff -r ac3ded340b35 src/share/vm/gc/shared/collectedHeap.inline.hpp
--- a/src/share/vm/gc/shared/collectedHeap.inline.hpp   Fri Apr 28 14:31:38 2017 +0200
+++ b/src/share/vm/gc/shared/collectedHeap.inline.hpp   Thu May 04 10:22:56 2017 +0200
@@ -87,3 +87,3 @@
     // support for object alloc event (no-op most of the time)
-    if (klass() != NULL && klass()->name() != NULL) {
+    if (klass != NULL && klass->name() != NULL) {
       Thread *base_thread = Thread::current();
diff -r ac3ded340b35 src/share/vm/runtime/heapMonitoring.cpp
--- a/src/share/vm/runtime/heapMonitoring.cpp   Fri Apr 28 14:31:38 2017 +0200
+++ b/src/share/vm/runtime/heapMonitoring.cpp   Thu May 04 10:22:56 2017 +0200
@@ -316,3 +316,3 @@
   JavaThread *thread = reinterpret_cast<JavaThread *>(Thread::current());
-  assert(o->size() << LogHeapWordSize == byte_size,
+  assert(o->size() << LogHeapWordSize == (long)byte_size,
          "Object size is incorrect.");

##############################
- This patch I had to apply to stop it from asserting during slowdebug:

--- a/src/share/vm/runtime/heapMonitoring.cpp   Fri Apr 28 15:15:16 2017 +0200
+++ b/src/share/vm/runtime/heapMonitoring.cpp   Thu May 04 10:24:25 2017 +0200
@@ -32,3 +32,3 @@
 // TODO(jcbeyler): should we make this into a JVMTI structure?
-struct StackTraceData {
+struct StackTraceData : CHeapObj<mtInternal> {
   ASGCT_CallTrace *trace;
@@ -143,3 +143,2 @@
 StackTraceStorage::StackTraceStorage() :
-    _allocated_traces(new StackTraceData*[MaxHeapTraces]),
     _allocated_traces_size(MaxHeapTraces),
@@ -147,2 +146,3 @@
     _allocated_count(0) {
+  _allocated_traces = NEW_C_HEAP_ARRAY(StackTraceData*, MaxHeapTraces, mtInternal);
   memset(_allocated_traces, 0, sizeof(*_allocated_traces) * MaxHeapTraces);
@@ -152,3 +152,3 @@
 StackTraceStorage::~StackTraceStorage() {
-  delete[] _allocated_traces;
+  FREE_C_HEAP_ARRAY(StackTraceData*, _allocated_traces);
 }

- Classes should extend the correct base class for the type of memory used for 
them, e.g.: CHeapObj<mt????>, StackObj or AllStatic (see the small sketch below).
- The style in heapMonitoring.cpp is a bit different from the normal vm-style, e.g. 
using C++ casts instead of C casts. You also mix NEW_C_HEAP_ARRAY, os::malloc and new.
- In jvmtiHeapTransition.hpp you use a C cast instead.
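
A small sketch of the convention I mean (base classes from allocation.hpp; the 
class names here are only examples):

  class SampledTraceData : public CHeapObj<mtInternal> { /* C-heap allocated   */ };
  class TraceCollector   : public StackObj             { /* stack allocated    */ };
  class SamplerSupport   : public AllStatic            { /* never instantiated */ };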

##############################
- This patch I had to apply to get traces without setting an 'unrelated' capability.
- Should this not be a new capability?

diff -r c02a5d8785bf src/share/vm/prims/forte.cpp
--- a/src/share/vm/prims/forte.cpp      Fri Apr 28 15:15:16 2017 +0200
+++ b/src/share/vm/prims/forte.cpp      Thu May 04 10:24:25 2017 +0200
@@ -530,6 +530,6 @@

-  if (!JvmtiExport::should_post_class_load()) {
+/*  if (!JvmtiExport::should_post_class_load()) {
     trace->num_frames = ticks_no_class_load; // -1
     return;
-  }
+  }*/
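
If it becomes a proper capability, the check could stay and just test that 
instead; something like this (the accessor name is invented here, it does not 
exist today):

  if (!JvmtiExport::can_sample_heap_allocations()) {
    trace->num_frames = ticks_no_class_load; // -1
    return;
  }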

##############################
- forte.cpp: (I know this is not part of your changes, but)
find_jmethod_id_or_null gives me NULL in my test.
It looks like we actually want the regular jmethod_id()?

Since we are the thread we are talking about (and in the same ucontext) and the 
thread is in the VM and has a last Java frame,
I think most of the checks done in AsyncGetCallTrace are irrelevant, so you 
should be able to call forte_fill_call_trace_given_top directly.
But since we might need jmethod_id() if possible, to avoid getting a NULL 
method id,
we need some fixes in the forte code, or just do the vframeStream loop inside 
heapMonitoring.cpp and not use forte.cpp.

Something like:

  if (jthread->has_last_Java_frame()) { // just to be safe
    vframeStream vfst(jthread);
    while (!vfst.at_end()) {
      Method* m = vfst.method();
      m->jmethod_id();                      // forces creation of the jmethodID
      m->line_number_from_bci(vfst.bci());
      vfst.next();
    }
  }

- This is a bit confusing in forte.cpp: trace->frames[count].lineno = bci.
The line number should be m->line_number_from_bci(bci);
Is heapMonitoring supposed to trace with the bci or the line number?
I would say the bci, meaning we should either rename ASGCT_CallFrame->lineno or use 
another data structure with a field that says bci.

##############################
- // TODO(jcbeyler): remove this extra code handling the extra trace for
Please fix all these TODOs :)

##############################
- heapMonitoring.hpp:
// TODO(jcbeyler): is this algorithm acceptable in open source?

Why is this comment here? What is the implication?
Have you tested any simpler algorithm?

##############################
- Create a sanity jtreg test. (./hotspot/make/test/JtregNative.gmk for building 
the agent)

##############################
- monitoring_period vs HeapMonitorRate, pick rate or period.

##############################
- globals.hpp
Why is MaxHeapTraces not settable/overridable from the JVMTI interface? That would 
be handy.

##############################
- jvmtiStackTraceData + ASGCT_CallFrame memory
Is the agent supposed to loop through and free all the ASGCT_CallFrames?
Wouldn't it be better with some kind of protocol, like:
(*jvmti)->GetLiveTraces(jvmti, &stack_traces, &num_traces);
(*jvmti)->ReleaseTraces(jvmti, stack_traces, num_traces);

Also, using another data structure that has num_traces inside it simplifies 
things.
So I'm not convinced using the async structure is the best way forward.
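
Just to illustrate what I mean (jvmtiFrameInfo already exists in JVMTI; every 
other name below is made up):

  typedef struct {
    jint            frame_count;
    jvmtiFrameInfo* frames;       // jmethodID + jlocation (bci) per frame
    jlong           size;         // sampled allocation size in bytes
  } jvmtiSampledTrace;

  typedef struct {
    jint               trace_count;
    jvmtiSampledTrace* traces;    // owned and freed by the JVM
  } jvmtiSampledTraces;

  // Agent side: one Get call, one Release call, no per-frame frees.
  jvmtiSampledTraces live;
  GetLiveTraces(jvmti, &live);
  // ... consume live.traces[0 .. live.trace_count) ...
  ReleaseTraces(jvmti, &live);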


I have more questions, but I think it's better if you respond and update the 
code first.

Thanks!

/Robbin


On 04/21/2017 11:34 PM, JC Beyler wrote:
Hi all,

I've added size information to the allocation sampling system. This allows the 
callback to remember the size of each sampled allocation.
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/

The new webrev.01 also adds the actual heap monitoring sampling system in files:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.cpp.patch
and
http://cr.openjdk.java.net/~rasbold/8171119/webrev.01/src/share/vm/runtime/heapMonitoring.hpp.patch

My next step is to add the GC part to the webrev, which will allow users to 
determine what objects are live and what are garbage.

Thanks for your attention and let me know if there are any questions!

Have a wonderful Friday!
Jc

On Mon, Apr 17, 2017 at 12:37 PM, JC Beyler <jcbey...@google.com> wrote:

    Hi all,

    I worked on getting a few numbers for overhead and accuracy for my feature. 
I'm unsure if here is the right place to provide the full data, so I am just 
summarizing
    here for now.

    - Overhead of the feature

    Using the Dacapo benchmark (http://dacapobench.org/), my initial results 
are that sampling adds about 2.4% overhead with a 512k sampling rate, 512k being 
our default setting.

    - Note: this was without the tradesoap, tradebeans and tomcat benchmarks 
since they did not work with my JDK9 (issue between Dacapo and JDK9 it seems)
    - I want to rerun next week to ensure number stability

    - Accuracy of the feature

    I wrote a small microbenchmark that allocates from two different 
stacktraces at a given ratio. For example, 10% of stacktrace S1 and 90% from 
stacktrace S2. The
    microbenchmark was run 20 times; I averaged the results and looked for 
accuracy. It seems that statistically it is sound since, if I allocated 10% S1 
and 90% S2, with a
    sampling rate of 512k, I obtained 9.61% S1 and 90.49% S2.

    Let me know if there are any questions on the numbers and if you'd like to 
see some more data.

    Note: this was done using our internal JDK8 implementation since the webrev 
provided by http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html does not 
yet contain the whole implementation and therefore would have been misleading.

    Thanks,
    Jc


    On Tue, Apr 4, 2017 at 3:55 PM, JC Beyler <jcbey...@google.com> wrote:

        Hi all,

        To move the discussion forward, with Chuck Rasbold's help to make a 
webrev, we pushed this:
        http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/index.html
        415 lines changed: 399 ins; 13 del; 3 mod; 51122 unchg

        This is not a final change that does the whole proposition from the JBS 
entry: https://bugs.openjdk.java.net/browse/JDK-8177374; what it does show 
is part of the proposed implementation, and hopefully it can get the 
conversation going as I work through the details.

        For example, the changes to C2 are done here for the allocations: 
http://cr.openjdk.java.net/~rasbold/heapz/webrev.00/src/share/vm/opto/macro.cpp.patch

        Hopefully this all makes sense and thank you for all your future 
comments!
        Jc


        On Tue, Dec 13, 2016 at 1:11 PM, JC Beyler <jcbey...@google.com> wrote:

            Hello all,

            This is a follow-up from Jeremy's initial email from last year:
            http://mail.openjdk.java.net/pipermail/serviceability-dev/2015-June/017543.html

            I've gone ahead and started working on preparing this and Jeremy 
and I went down the route of actually writing it up in JEP form:
            https://bugs.openjdk.java.net/browse/JDK-8171119

            I think the original conversation that happened last year in that 
thread still holds true:

              - We have a patch at Google that we think others might be 
interested in
                 - It provides a means to understand where the allocation 
hotspots are at a very low overhead
                 - Since it is at a low overhead, we can leave it on by default

            So I come to the mailing list with Jeremy's initial question:
            "I thought I would ask if there is any interest / if I should write a 
JEP / if I should just forget it."

            A year ago, it seemed some thought it was a good idea; is this 
still true?

            Thanks,
            Jc




