Re: JDK-8171119: Low-Overhead Heap Profiling

2018-04-26 Thread JC Beyler
fully initialized
>>>>>- the object needs to have the header set up - not just the memory
>>>>> allocated - so that applies to all objects
>>>>>(and that is done in a caller of the allocation code - so it can’t
>>>>> be done at the location which does the memory allocation)
>>>>>- the example I pointed out was the multianewarray case - all the
>>>>> subarrays need to be allocated
>>>>>
>>>>> - please add a test case for multi-array, so as you experiment with
>>>>> where to post the event, you ensure that you can
>>>>> access the subarrays (e.g. a 3D array of length 5 has 5 2D arrays as
>>>>> subarrays)
>>>>>
>>>>
>>>> Technically it's more than just being initialized: the issue is that you
>>>> cannot perform a callback about an object if any object of that thread is
>>>> being held by a collector while also sitting in a register/stack slot
>>>> without protection.
>>>>
>>>>
>>>>
>>>>>   - prior to setting up the object header information, GC would not
>>>>> know about the object
>>>>> - was this  by chance the issue you ran into?
>>>>>
>>>>
>>>> No, I believe the issue I was running into was the one above, where an
>>>> object on the stack was pointing to an oop that got moved during a GC
>>>> because that thread was doing an event callback.
>>>>
>>>> Thanks for your help,
>>>> Jc
>>>>
>>>>
>>>>> 3) event posting
>>>>>- when you post the event to JVMTI
>>>>>- in JvmtiObjectAllocEventMark: sets _jobj = (jobject)to_jobject(obj),
>>>>> which creates a local handle via JNIHandles::make_local(_thread, obj)
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> The second reason is that I strongly suspect the scope you want is
>>>>>>> bytecodes only. I think once you have added hooks
>>>>>>> to all the fast paths and slow paths that this will be pushing the
>>>>>>> performance overhead constraints you proposed and
>>>>>>> you won’t want to see e.g. internal allocations.
>>>>>>>
>>>>>>
>>>>>> Yes, agreed: allocations from bytecodes are generally our main
>>>>>> concern :)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>> But I think you need to experiment with the set of allocations (or
>>>>>>> possible alternative sets of allocations) you want recorded.
>>>>>>>
>>>>>>> The hooks I see today include:
>>>>>>> Interpreter: (looking at x86 as a sample)
>>>>>>>   - slowpath in InterpreterRuntime
>>>>>>>   - fastpath tlab allocation - your new threshold check handles that
>>>>>>>
>>>>>>
>>>>>> Agreed
>>>>>>
>>>>>>
>>>>>>>   - allow_shared_alloc (GC specific): for _new isn’t handled
>>>>>>>
>>>>>>
>>>>>> Where is that exactly? I can check why we are not catching it?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> C1
>>>>>>>   I don’t see changes in c1_Runtime1.cpp
>>>>>>>   note: you also want to look for the fast path
>>>>>>>
>>>>>>
>>>>>> I added the calls to c1_Runtime1 in the latest webrev, but was still
>>>>>> going through testing before pushing it out. I had waited on this one a
>>>>>> bit. The fast path would be handled by the threshold check, no?
>>>>>>
>>>>>>
>>>>>>> C2: changes in opto/runtime.cpp for slow path
>>>>>>>did you also catch the fast path?
>>>>>>>
>>>>>>
>>>>>> Fast path gets handled by the same threshold check, no? Perhaps I've
>>>>>> missed something (very likely)?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 3. Performance -
>>>>>>> After you get all the collectors added - you need to rerun the
>>>>>>> performance numbers.
>>>>>>>
>>>>>>
>>>>>> Agreed :)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> thanks,
>>>>>>> Karen
>>>>>>>
>>>>>>> On Apr 5, 2018, at 2:15 PM, JC Beyler  wrote:
>>>>>>>
>>>>>>> Thanks Boris and Derek for testing it.
>>>>>>>
>>>>>>> Yes I was trying to get a new version out that had the tests ported
>>>>>>> as well but got sidetracked while trying to add tests and two new 
>>>>>>> features.
>>>>>>>
>>>>>>> Here is the incremental webrev:
>>>>>>>
>>>>>>> Here is the full webrev:
>>>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/
>>>>>>>
>>>>>>> Basically, the new tests assert this:
>>>>>>>   - Only one agent can currently ask for the sampling; I'm currently
>>>>>>> seeing if I can push the multi-agent support to a next webrev so I can
>>>>>>> start doing a code freeze on this one
>>>>>>>   - The event is not thread-enabled, meaning that, like the
>>>>>>> VMObjectAlloc event, it's an all-or-nothing event; as with the
>>>>>>> multi-agent support, I'm going to see if adding it in a future webrev is
>>>>>>> a better idea, to freeze this webrev a bit
>>>>>>>
>>>>>>> There was another item that I added here and I'm unsure this webrev
>>>>>>> is stable in debug mode: I added an assertion system to ascertain that 
>>>>>>> all
>>>>>>> paths leading to a TLAB slow path (and hence a sampling point) have a
>>>>>>> sampling collector ready to post the event if a user wants it. This 
>>>>>>> might
>>>>>>> break a few things in debug mode as I'm working through the kinks of 
>>>>>>> that as
>>>>>>> well. However, in release mode, this new webrev passes all the tests in
>>>>>>> hotspot/jtreg/serviceability/jvmti/HeapMonitor.
>>>>>>>
>>>>>>> Let me know what you think,
>>>>>>> Jc
>>>>>>>
>>>>>>> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich <
>>>>>>> boris.ulasev...@bell-sw.com> wrote:
>>>>>>>
>>>>>>>> Hi JC,
>>>>>>>>
>>>>>>>>I have just checked on arm32: your patch compiles and runs ok.
>>>>>>>>
>>>>>>>>As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does
>>>>>>>> not
>>>>>>>> correspond to actual library name: libHeapMonitorTest.c ->
>>>>>>>> libHeapMonitorTest.so
>>>>>>>>
>>>>>>>> Boris
>>>>>>>>
>>>>>>>> On 04.04.2018 01:54, White, Derek wrote:
>>>>>>>> > Thanks JC,
>>>>>>>> >
>>>>>>>> > New patch applies cleanly. Compiles and runs (simple test
>>>>>>>> programs) on
>>>>>>>> > aarch64.
>>>>>>>> >
>>>>>>>> >   * Derek
>>>>>>>> >
>>>>>>>> > *From:* JC Beyler [mailto:jcbey...@google.com]
>>>>>>>> > *Sent:* Monday, April 02, 2018 1:17 PM
>>>>>>>> > *To:* White, Derek 
>>>>>>>> > *Cc:* Erik Österlund ;
>>>>>>>> > serviceability-dev@openjdk.java.net; hotspot-compiler-dev
>>>>>>>> > 
>>>>>>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>>>>>>>> >
>>>>>>>> > Hi Derek,
>>>>>>>> >
>>>>>>>> > I know there were a few things that went in that provoked a merge
>>>>>>>> > conflict. I worked on it and got it up to date. Sadly my lack of
>>>>>>>> > knowledge makes it a full rebase instead of keeping all the
>>>>>>>> history.
>>>>>>>> > However, with a newly cloned jdk/hs you should now be able to use:
>>>>>>>> >
>>>>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
>>>>>>>> >
>>>>>>>> > The change you are referring to was done with the others so
>>>>>>>> perhaps you
>>>>>>>> > were unlucky and I forgot it in a webrev and fixed it in another?
>>>>>>>> I
>>>>>>>> > don't know but it's been there and I checked, it is here:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html
>>>>>>>> >
>>>>>>>> > I double checked that tlab_end_offset no longer appears in any
>>>>>>>> > architecture (as far as I can tell :)).
>>>>>>>> >
>>>>>>>> > Thanks for testing and let me know if you run into any other
>>>>>>>> issues!
>>>>>>>> >
>>>>>>>> > Jc
>>>>>>>> >
>>>>>>>> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek <
>>>>>>>> derek.wh...@cavium.com
>>>>>>>> > <mailto:derek.wh...@cavium.com>> wrote:
>>>>>>>> >
>>>>>>>> > Hi Jc,
>>>>>>>> >
>>>>>>>> > I’ve been having trouble getting your patch to apply
>>>>>>>> correctly. I
>>>>>>>> > may have based it on the wrong version.
>>>>>>>> >
>>>>>>>> > In any case, I think there’s a missing update to
>>>>>>>> > macroAssembler_aarch64.cpp, in
>>>>>>>> MacroAssembler::tlab_allocate(),
>>>>>>>> > where “JavaThread::tlab_end_offset()” should become
>>>>>>>> > “JavaThread::tlab_current_end_offset()”.
>>>>>>>> >
>>>>>>>> > This should correspond to the other port’s changes in
>>>>>>>> > templateTable_.cpp files.
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> > - Derek
>>>>>>>> >
>>>>>>>> > *From:* hotspot-compiler-dev
>>>>>>>> > [mailto:hotspot-compiler-dev-boun...@openjdk.java.net
>>>>>>>> > <mailto:hotspot-compiler-dev-boun...@openjdk.java.net>] *On
>>>>>>>> Behalf
>>>>>>>> > Of *JC Beyler
>>>>>>>> > *Sent:* Wednesday, March 28, 2018 11:43 AM
>>>>>>>> > *To:* Erik Österlund <mailto:erik.osterl...@oracle.com>
>>>>>>>> > *Cc:* serviceability-dev@openjdk.java.net
>>>>>>>> > <mailto:serviceability-dev@openjdk.java.net>;
>>>>>>>> hotspot-compiler-dev
>>>>>>>
>>>>>>>


Re: JDK-8171119: Low-Overhead Heap Profiling

2018-04-05 Thread Boris Ulasevich

Hi JC,

  I have just checked on arm32: your patch compiles and runs ok.

  As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not 
correspond to actual library name: libHeapMonitorTest.c -> 
libHeapMonitorTest.so


Boris

On 04.04.2018 01:54, White, Derek wrote:

Thanks JC,

New patch applies cleanly. Compiles and runs (simple test programs) on 
aarch64.


  * Derek

*From:* JC Beyler [mailto:jcbey...@google.com]
*Sent:* Monday, April 02, 2018 1:17 PM
*To:* White, Derek 
*Cc:* Erik Österlund ; 
serviceability-dev@openjdk.java.net; hotspot-compiler-dev 


*Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling

Hi Derek,

I know there were a few things that went in that provoked a merge 
conflict. I worked on it and got it up to date. Sadly my lack of 
knowledge makes it a full rebase instead of keeping all the history. 
However, with a newly cloned jdk/hs you should now be able to use:


http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/

The change you are referring to was done with the others so perhaps you 
were unlucky and I forgot it in a webrev and fixed it in another? I 
don't know but it's been there and I checked, it is here:


http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html

I double checked that tlab_end_offset no longer appears in any 
architecture (as far as I can tell :)).


Thanks for testing and let me know if you run into any other issues!

Jc

On Fri, Mar 30, 2018 at 4:24 PM White, Derek <mailto:derek.wh...@cavium.com> wrote:


Hi Jc,

I’ve been having trouble getting your patch to apply correctly. I
may have based it on the wrong version.

In any case, I think there’s a missing update to
macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(),
where “JavaThread::tlab_end_offset()” should become
“JavaThread::tlab_current_end_offset()”.

This should correspond to the other port’s changes in
templateTable_.cpp files.

Thanks!
- Derek

*From:* hotspot-compiler-dev
[mailto:hotspot-compiler-dev-boun...@openjdk.java.net
<mailto:hotspot-compiler-dev-boun...@openjdk.java.net>] *On Behalf
Of *JC Beyler
*Sent:* Wednesday, March 28, 2018 11:43 AM
*To:* Erik Österlund <mailto:erik.osterl...@oracle.com>
*Cc:* serviceability-dev@openjdk.java.net
<mailto:serviceability-dev@openjdk.java.net>; hotspot-compiler-dev
<mailto:hotspot-compiler-...@openjdk.java.net>
    *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling

Hi all,

I've been working on deflaking the tests mostly and the wording in
the JVMTI spec.

Here is the two incremental webrevs:

http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/

http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/

Here is the total webrev:

http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/

Here are the notes of this change:

   - Currently the tests pass 100 times in a row, I am working on
checking if they pass 1000 times in a row.

   - The default sampling rate is set to 512k; this is what we use
internally, and having a default means that to enable the sampling
with the default, the user only has to do an enable event/disable
event via JVMTI (instead of enable + set sample rate).

   - I deprecated the code that was handling the fast path tlab
refill, since that mechanism is now deprecated

       - Though I saw that Graal is still using it, so I have to see
what needs to be done there exactly
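
For reference, the "enable event with the default rate" flow mentioned above
would look roughly like this from the agent side. This is only a sketch:
SetHeapSampling is the name used in this thread, and the event constant below
is an assumed name, not a final spec name.

#include <jvmti.h>

JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
  jvmtiEnv *jvmti = NULL;
  if (vm->GetEnv(reinterpret_cast<void **>(&jvmti), JVMTI_VERSION_9) != JNI_OK) {
    return JNI_ERR;
  }
  // With the 512k default in place, enabling the event is all that is needed;
  // SetHeapSampling would only be called to override the default sample rate.
  jvmtiError err = jvmti->SetEventNotificationMode(
      JVMTI_ENABLE, JVMTI_EVENT_SAMPLED_OBJECT_ALLOC /* assumed name */, NULL);
  return (err == JVMTI_ERROR_NONE) ? JNI_OK : JNI_ERR;
}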

Finally, using the Dacapo benchmark suite, I noted a 1% overhead for
when the event system is turned on and the callback to the native
agent is just empty. I got a 3% overhead with a 512k sampling rate
with the code I put in the native side of my tests.

Thanks and comments are appreciated,

Jc

On Mon, Mar 19, 2018 at 2:06 PM JC Beyler <mailto:jcbey...@google.com> wrote:

Hi all,

The incremental webrev update is here:

http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/

The full webrev is here:

http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/

Major change here is:

   - I've removed the heapMonitoring.cpp code in favor of just
having the sampling events as per Serguei's request; I still
have to do some overhead measurements but the tests prove the
concept can work

        - Most of the tlab code is unchanged, the only major
part is that now things get sent off to event collectors when
used and enabled.

   - Added the interpreter collectors to handle interpreter
execution

   - Updated the name from SetTlabHeapSampling to
SetHeapSampling to be more generic

   - Added a mutex for the thread sampling so that we can

Re: JDK-8171119: Low-Overhead Heap Profiling

2018-04-02 Thread JC Beyler
Hi Derek,

I know there were a few things that went in that provoked a merge conflict.
I worked on it and got it up to date. Sadly my lack of knowledge makes it a
full rebase instead of keeping all the history. However, with a newly
cloned jdk/hs you should now be able to use:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/

The change you are referring to was done with the others so perhaps you
were unlucky and I forgot it in a webrev and fixed it in another? I don't
know but it's been there and I checked, it is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html

I double checked that tlab_end_offset no longer appears in any architecture
(as far as I can tell :)).

Thanks for testing and let me know if you run into any other issues!
Jc


On Fri, Mar 30, 2018 at 4:24 PM White, Derek  wrote:

> Hi Jc,
>
>
>
> I’ve been having trouble getting your patch to apply correctly. I may have
> based it on the wrong version.
>
>
>
> In any case, I think there’s a missing update to
> macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where
> “JavaThread::tlab_end_offset()” should become
> “JavaThread::tlab_current_end_offset()”.
>
>
>
> This should correspond to the other port’s changes in
> templateTable_.cpp files.
>
>
>
> Thanks!
> - Derek
>
>
>
> *From:* hotspot-compiler-dev [mailto:
> hotspot-compiler-dev-boun...@openjdk.java.net] *On Behalf Of *JC Beyler
> *Sent:* Wednesday, March 28, 2018 11:43 AM
> *To:* Erik Österlund 
> *Cc:* serviceability-dev@openjdk.java.net; hotspot-compiler-dev <
> hotspot-compiler-...@openjdk.java.net>
> *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>
>
>
> Hi all,
>
>
>
> I've been working on deflaking the tests mostly and the wording in the
> JVMTI spec.
>
>
>
> Here is the two incremental webrevs:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/
>
>
>
> Here is the total webrev:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/
>
>
>
> Here are the notes of this change:
>
>   - Currently the tests pass 100 times in a row, I am working on checking
> if they pass 1000 times in a row.
>
>   - The default sampling rate is set to 512k, this is what we use
> internally and having a default means that to enable the sampling with the
> default, the user only has to do a enable event/disable event via JVMTI
> (instead of enable + set sample rate).
>
>   - I deprecated the code that was handling the fast path tlab refill if
> it happened since this is now deprecated
>
>   - Though I saw that Graal is still using it so I have to see what
> needs to be done there exactly
>
>
>
> Finally, using the Dacapo benchmark suite, I noted a 1% overhead for when
> the event system is turned on and the callback to the native agent is just
> empty. I got a 3% overhead with a 512k sampling rate with the code I put in
> the native side of my tests.
>
>
>
> Thanks and comments are appreciated,
>
> Jc
>
>
>
>
>
> On Mon, Mar 19, 2018 at 2:06 PM JC Beyler  wrote:
>
> Hi all,
>
>
>
> The incremental webrev update is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/
>
>
>
> The full webrev is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/
>
>
>
> Major change here is:
>
>   - I've removed the heapMonitoring.cpp code in favor of just having the
> sampling events as per Serguei's request; I still have to do some overhead
> measurements but the tests prove the concept can work
>
>- Most of the tlab code is unchanged, the only major part is that
> now things get sent off to event collectors when used and enabled.
>
>   - Added the interpreter collectors to handle interpreter execution
>
>   - Updated the name from SetTlabHeapSampling to SetHeapSampling to be
> more generic
>
>   - Added a mutex for the thread sampling so that we can initialize an
> internal static array safely
>
>   - Ported the tests from the old system to this new one
>
>
>
> I've also updated the JEP and CSR to reflect these changes:
>
>  https://bugs.openjdk.java.net/browse/JDK-8194905
>
>  https://bugs.openjdk.java.net/browse/JDK-8171119
>
>
>
> In order to make this have some forward progress, I've removed the heap
> sampling code entirely and now rely entirely on the event sampling system.
> The tests reflect this by using a simplified implementation of what an
> agent could do:
>
>
> http://cr.openjdk.java.net/~jcbe

RE: JDK-8171119: Low-Overhead Heap Profiling

2018-03-30 Thread White, Derek
Hi Jc,

I’ve been having trouble getting your patch to apply correctly. I may have 
based it on the wrong version.

In any case, I think there’s a missing update to macroAssembler_aarch64.cpp, in 
MacroAssembler::tlab_allocate(), where “JavaThread::tlab_end_offset()” should 
become “JavaThread::tlab_current_end_offset()”.
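
For what it's worth, the change in question is just the accessor rename in the
TLAB fast-path load, roughly like this (a sketch; the temp register and exact
form are illustrative, not a quote of the webrev):

// before
ldr(t1, Address(rthread, in_bytes(JavaThread::tlab_end_offset())));
// after, matching the renamed accessor used by the other ports
ldr(t1, Address(rthread, in_bytes(JavaThread::tlab_current_end_offset())));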

This should correspond to the other port’s changes in templateTable_.cpp 
files.

Thanks!
- Derek

From: hotspot-compiler-dev 
[mailto:hotspot-compiler-dev-boun...@openjdk.java.net] On Behalf Of JC Beyler
Sent: Wednesday, March 28, 2018 11:43 AM
To: Erik Österlund 
Cc: serviceability-dev@openjdk.java.net; hotspot-compiler-dev 

Subject: Re: JDK-8171119: Low-Overhead Heap Profiling

Hi all,

I've been working on deflaking the tests mostly and the wording in the JVMTI 
spec.

Here is the two incremental webrevs:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/

Here is the total webrev:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/

Here are the notes of this change:
  - Currently the tests pass 100 times in a row, I am working on checking if 
they pass 1000 times in a row.
  - The default sampling rate is set to 512k, this is what we use internally 
and having a default means that to enable the sampling with the default, the 
user only has to do a enable event/disable event via JVMTI (instead of enable + 
set sample rate).
  - I deprecated the code that was handling the fast path tlab refill if it 
happened since this is now deprecated
  - Though I saw that Graal is still using it so I have to see what needs 
to be done there exactly

Finally, using the Dacapo benchmark suite, I noted a 1% overhead for when the 
event system is turned on and the callback to the native agent is just empty. I 
got a 3% overhead with a 512k sampling rate with the code I put in the native 
side of my tests.

Thanks and comments are appreciated,
Jc


On Mon, Mar 19, 2018 at 2:06 PM JC Beyler <mailto:jcbey...@google.com> wrote:
Hi all,

The incremental webrev update is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/

The full webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/

Major change here is:
  - I've removed the heapMonitoring.cpp code in favor of just having the 
sampling events as per Serguei's request; I still have to do some overhead 
measurements but the tests prove the concept can work
   - Most of the tlab code is unchanged, the only major part is that now 
things get sent off to event collectors when used and enabled.
  - Added the interpreter collectors to handle interpreter execution
  - Updated the name from SetTlabHeapSampling to SetHeapSampling to be more 
generic
  - Added a mutex for the thread sampling so that we can initialize an internal 
static array safely
  - Ported the tests from the old system to this new one

I've also updated the JEP and CSR to reflect these changes:
 https://bugs.openjdk.java.net/browse/JDK-8194905
 https://bugs.openjdk.java.net/browse/JDK-8171119

In order to make this have some forward progress, I've removed the heap 
sampling code entirely and now rely entirely on the event sampling system. The 
tests reflect this by using a simplified implementation of what an agent could 
do:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
(Search for anything mentioning event_storage).
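
To give an idea of what a "simplified implementation of what an agent could
do" means here, a sketch of such event storage could look like the following.
This is not the actual libHeapMonitor.c code: the names are illustrative, the
calls are written in C++ style, and the callback signature is assumed to
mirror VMObjectAlloc.

#include <jvmti.h>

// Illustrative event storage: each sampled allocation is recorded under a raw
// monitor (created with CreateRawMonitor at agent startup) so the Java side of
// the test can later assert on what was collected.
typedef struct {
  jlong size;  // a real agent would also keep the stack trace captured here
} SampledEvent;

static jrawMonitorID g_storage_lock;
static SampledEvent g_events[1024];
static int g_event_count = 0;

static void JNICALL SampledAllocCallback(jvmtiEnv *jvmti, JNIEnv *jni,
                                         jthread thread, jobject object,
                                         jclass klass, jlong size) {
  jvmti->RawMonitorEnter(g_storage_lock);
  if (g_event_count < 1024) {
    g_events[g_event_count++].size = size;
  }
  jvmti->RawMonitorExit(g_storage_lock);
}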

I have not taken the time to port the whole code we had originally in
heapMonitoring to this. I hesitate only because that code was in C++; I'd have
to port it to C, and since this is for tests, perhaps what I have now is good enough?

As far as testing goes, I've ported all the relevant tests and then added a few:
   - Turning the system on/off
   - Testing using various GCs
   - Testing using the interpreter
   - Testing the sampling rate
   - Testing with objects and arrays
   - Testing with various threads

Finally, as overhead goes, I have the numbers for the system off vs a clean
build, and I see 0% overhead, which is what we'd want. This was using the
Dacapo benchmarks. I am now preparing to run a version with the events on using
Dacapo and will report back here.

Any comments are welcome :)
Jc



On Thu, Mar 8, 2018 at 4:00 PM JC Beyler <mailto:jcbey...@google.com> wrote:
Hi all,

I apologize for the delay but I wanted to add an event system and that took a 
bit longer than expected and I also reworked the code to take into account the 
deprecation of FastTLABRefill.

This update has four parts:

A) I moved the implementation from Thread to ThreadHeapSampler inside of
Thread. Would you prefer it as a pointer inside of Thread, or does it work
for you like this? The second question would be: would you rather have an
association outside of Thread altogether that tries to remember when threads
are live and t

Re: JDK-8171119: Low-Overhead Heap Profiling

2018-02-14 Thread Erik Österlund

Hi JC,

Comments are inlined below.

On 2018-02-13 06:18, JC Beyler wrote:

Hi Erik,

Thanks for your answers, I've now inlined my own answers/comments.

I've done a new webrev here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ 



The incremental is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ 



Note to all:
  - I've been integrating changes from Erik/Serguei/David's comments, so
this webrev incremental is a bit of an answer to all comments in one. I
apologize for that :)



On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund <mailto:erik.osterl...@oracle.com> wrote:


Hi JC,

Sorry for the delayed reply.

Inlined answers:


On 2018-02-06 00:04, JC Beyler wrote:

Hi Erik,

(Renaming this to be folded into the newly renamed thread :))

First off, thanks a lot for reviewing the webrev! I appreciate it!

I updated the webrev to:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/


And the incremental one is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/


It contains:
- The change for since from 9 to 11 for the jvmti.xml
- The use of the OrderAccess for initialized
- Clearing the oop

I also have inlined my answers to your comments. The biggest question
will come from the multiple *_end variables. A bit of the logic there
is due to handling the slow path refill vs fast path refill and
checking that the rug was not pulled underneath the slowpath. I
believe that a previous comment was that TlabFastRefill was going to
be deprecated.

If this is true, we could revert this code a bit and just do a: if
TlabFastRefill is enabled, disable this. And then deprecate that when
TlabFastRefill is deprecated.

This might simplify this webrev and I can work on a follow-up that
either: removes TlabFastRefill if Robbin does not have the time to do
it or add the support to the assembly side to handle this correctly.
What do you think?


I support removing TlabFastRefill, but I think it is good to not
depend on that happening first.



I'm slowly pushing on the FastTLABRefill
(https://bugs.openjdk.java.net/browse/JDK-8194084). I agree on
keeping both separate for now though, so that we can think of both
differently


Now, below, inlined are my answers:

On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund
mailto:erik.osterl...@oracle.com>>
wrote:

Hi JC,

Hope I am reviewing the right version of your work. Here
goes...

src/hotspot/share/gc/shared/collectedHeap.inline.hpp:

  159 AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD);
  160
  161 THREAD->tlab().handle_sample(THREAD, result, size);
  162 return result;
  163   }

Should not call tlab()->X without checking if (UseTLAB) IMO.

Done!


More about this later.



src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:

So first of all, there seem to be quite a few ends. There is an "end", a
"hard end", a "slow path end", and an "actual end". Moreover, it seems like
the "hard end" is actually further away than the "actual end". So the
"hard end" seems like more of a "really definitely actual end" or something.
I don't know about you, but I think it looks kind of messy. In particular,
I don't feel like the name "actual end" reflects what it represents,
especially when there is another end that is behind the "actual end".

  413 HeapWord* ThreadLocalAllocBuffer::hard_end() {
  414   // Did a fast TLAB refill occur?
  415   if (_slow_path_end != _end) {
  416     // Fix up the actual end to be now the end of this TLAB.
  417     _slow_path_end = _end;
  418     _actual_end = _end;
  419   }
  420
  421   return _actual_end + alignment_reserve();
  422 }

I really do not like making getters unexpectedly have these kinds of side
effects. It is not expected that when you ask for the "hard end", you
implicitly update the "slow path end" and "actual end" to new values

Re: JDK-8171119: Low-Overhead Heap Profiling

2018-02-14 Thread Robbin Ehn

Hi JC,

Adding Markus Grönlund.

On 2018-02-14 01:11, JC Beyler wrote:

Hi all,

Just to show a bit how to solve the one issue Erik was referring to, consider 
the following webrev:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08a/

and incremental is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08_08a/


We have a long-standing issue of adding stuff directly to the thread.
Please put the tracing stuff into a separate data structure and file.
Hoping that this new structure can contain this and the thread-specific parts
of the upcoming tracing framework. (Markus?)

Maybe follow threadSMR, which is contained in:
open/src/hotspot/share/runtime/threadSMR.cpp
open/src/hotspot/share/runtime/threadSMR.inline.hpp
open/src/hotspot/share/runtime/threadSMR.hpp

And create threadTracing.XXX or similar ?

Thanks, Robbin
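
A very rough sketch of the kind of separate structure being suggested here,
with names that are only illustrative (loosely borrowing the ThreadHeapSampler
name mentioned elsewhere in this thread):

#include <cstddef>

// threadHeapSampler.hpp-style sketch: keep the per-thread sampling state out
// of Thread itself; Thread then only holds a member (or pointer) of this type.
class ThreadHeapSampler {
 private:
  size_t _bytes_until_sample;  // countdown to the next sampled allocation

  // Fixed interval for the sketch; the real thing would randomize this.
  void pick_next_sample() { _bytes_until_sample = 512 * 1024; }

 public:
  ThreadHeapSampler() : _bytes_until_sample(0) { pick_next_sample(); }

  // Called on the allocation paths; returns true when this allocation
  // should be reported through the sampling event.
  bool check_for_sampling(size_t allocation_size) {
    if (allocation_size >= _bytes_until_sample) {
      pick_next_sample();
      return true;
    }
    _bytes_until_sample -= allocation_size;
    return false;
  }
};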



This puts the sampling bytes left in the Thread class, and then the code goes
to Thread to sample or not. The advantage of this is that it is probably
simpler to understand and follow what is going on: there is less internal
tlab magic going on, and the outside-of-tlab allocations go through the thread
instance the same way the TLAB allocations do.

I think it's cleaner but what do you think?

Thanks!
Jc

On Mon, Feb 12, 2018 at 9:18 PM, JC Beyler <mailto:jcbey...@google.com> wrote:

Hi Erik,

Thanks for your answers, I've now inlined my own answers/comments.

I've done a new webrev here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ 


The incremental is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ 


Note to all:
   - I've been integrating changes from Erin/Serguei/David comments so this 
webrev incremental is a bit an answer to all comments in one. I apologize for 
that :)


On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund <mailto:erik.osterl...@oracle.com> wrote:

Hi JC,

Sorry for the delayed reply.

Inlined answers:


On 2018-02-06 00:04, JC Beyler wrote:

Hi Erik,

(Renaming this to be folded into the newly renamed thread :))

First off, thanks a lot for reviewing the webrev! I appreciate it!

I updated the webrev to:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ 


And the incremental one is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ 


It contains:
- The change for since from 9 to 11 for the jvmti.xml
- The use of the OrderAccess for initialized
- Clearing the oop

I also have inlined my answers to your comments. The biggest question
will come from the multiple *_end variables. A bit of the logic there
is due to handling the slow path refill vs fast path refill and
checking that the rug was not pulled underneath the slowpath. I
believe that a previous comment was that TlabFastRefill was going to
be deprecated.

If this is true, we could revert this code a bit and just do a: if
TlabFastRefill is enabled, disable this. And then deprecate that when
TlabFastRefill is deprecated.

This might simplify this webrev and I can work on a follow-up that
either: removes TlabFastRefill if Robbin does not have the time to do
it or add the support to the assembly side to handle this correctly.
What do you think?


I support removing TlabFastRefill, but I think it is good to not depend 
on that happening first.



I'm slowly pushing on the FastTLABRefill 
(https://bugs.openjdk.java.net/browse/JDK-8194084 
), I agree on keeping both 
separate for now though so that we can think of both differently

Now, below, inlined are my answers:

On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund <mailto:erik.osterl...@oracle.com> wrote:

Hi JC,

Hope I am reviewing the right version of your work. Here goes...

src/hotspot/share/gc/shared/collectedHeap.inline.hpp:

   159     AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD);
   160
   161     THREAD->tlab().handle_sample(THREAD, result, size);
   162     return result;
   163   }

Should not call tlab()->X without checking if (UseTLAB) IMO.

Done!


More about this later.



src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:

So first of all, there seems to quite 

Re: JDK-8171119: Low-Overhead Heap Profiling

2018-02-12 Thread Erik Österlund

Hi JC,

Sorry for the delayed reply.

Inlined answers:

On 2018-02-06 00:04, JC Beyler wrote:

Hi Erik,

(Renaming this to be folded into the newly renamed thread :))

First off, thanks a lot for reviewing the webrev! I appreciate it!

I updated the webrev to:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/

And the incremental one is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/

It contains:
- The change for since from 9 to 11 for the jvmti.xml
- The use of the OrderAccess for initialized
- Clearing the oop

I also have inlined my answers to your comments. The biggest question
will come from the multiple *_end variables. A bit of the logic there
is due to handling the slow path refill vs fast path refill and
checking that the rug was not pulled underneath the slowpath. I
believe that a previous comment was that TlabFastRefill was going to
be deprecated.

If this is true, we could revert this code a bit and just do a : if
TlabFastRefill is enabled, disable this. And then deprecate that when
TlabFastRefill is deprecated.

This might simplify this webrev and I can work on a follow-up that
either: removes TlabFastRefill if Robbin does not have the time to do
it or add the support to the assembly side to handle this correctly.
What do you think?


I support removing TlabFastRefill, but I think it is good to not depend 
on that happening first.



Now, below, inlined are my answers:

On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund
 wrote:

Hi JC,

Hope I am reviewing the right version of your work. Here goes...

src/hotspot/share/gc/shared/collectedHeap.inline.hpp:

  159 AllocTracer::send_allocation_outside_tlab(klass, result, size *
HeapWordSize, THREAD);
  160
  161 THREAD->tlab().handle_sample(THREAD, result, size);
  162 return result;
  163   }

Should not call tlab()->X without checking if (UseTLAB) IMO.


Done!


More about this later.




src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:

So first of all, there seem to be quite a few ends. There is an "end", a "hard
end", a "slow path end", and an "actual end". Moreover, it seems like the
"hard end" is actually further away than the "actual end". So the "hard end"
seems like more of a "really definitely actual end" or something. I don't
know about you, but I think it looks kind of messy. In particular, I don't
feel like the name "actual end" reflects what it represents, especially when
there is another end that is behind the "actual end".

  413 HeapWord* ThreadLocalAllocBuffer::hard_end() {
  414   // Did a fast TLAB refill occur?
  415   if (_slow_path_end != _end) {
  416 // Fix up the actual end to be now the end of this TLAB.
  417 _slow_path_end = _end;
  418 _actual_end = _end;
  419   }
  420
  421   return _actual_end + alignment_reserve();
  422 }

I really do not like making getters unexpectedly have these kind of side
effects. It is not expected that when you ask for the "hard end", you
implicitly update the "slow path end" and "actual end" to new values.


As I said, a lot of this is due to the FastTlabRefill. If I make this
not supporting FastTlabRefill, this goes away. The reason the system
needs to update itself at the get is that you only know at that get if
things have shifted underneath the tlab slow path. I am not sure of
really better names (naming is hard!), perhaps we could do these
names:

- current_tlab_end   // Either the allocated tlab end or a sampling point
- last_allocation_address  // The end of the tlab allocation
- last_slowpath_allocated_end  // In case a fast refill occurred the
end might have changed, this is to remember slow vs fast past refills

the hard_end method can be renamed to something like:
tlab_end_pointer()// The end of the lab including a bit of
alignment reserved bytes


Those names sound better to me. Could you please provide a mapping from 
the old names to the new names so I understand which one is which please?


This is my current guess of what you are proposing:

end -> current_tlab_end
actual_end -> last_allocation_address
slow_path_end -> last_slowpath_allocated_end
hard_end -> tlab_end_pointer

I would prefer this naming:

end -> slow_path_end // the end for taking a slow path; either due to 
sampling or refilling

actual_end -> allocation_end // the end for allocations
slow_path_end -> last_slow_path_end // last address for slow_path_end 
(as opposed to allocation_end)

hard_end -> reserved_end // the end of the reserved space of the TLAB
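
With that naming, the fields and a side-effect-free getter would read roughly
like this (sketch only, not the actual webrev):

// _slow_path_end      - where the fast path bails out (sampling or refill)
// _last_slow_path_end - the last slow_path_end the slow path has seen
// _allocation_end     - the real end available for allocations
HeapWord* ThreadLocalAllocBuffer::reserved_end() {
  assert(_last_slow_path_end == _slow_path_end, "TLAB ends out of sync");
  return _allocation_end + alignment_reserve();
}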

About setting things in the getter... that still seems like a very 
unpleasant thing to me. It would be better to inspect the call hierarchy 
and explicitly update the ends where they need updating, and assert in 
the getter that they are in sync, rather than implicitly setting various 
ends as a surprising side effect in a getter. It looks like the call 
hierarchy is very small. With my new naming convention, reserved_end() 
would presumably return _allocation_end + alignment_reserve(), a

Re: JDK-8171119: Low-Overhead Heap Profiling

2018-02-05 Thread David Holmes

Hi Jc,

I've just been browsing this and have a few comments/queries

src/hotspot/share/prims/jvmtiHeapTransition.hpp

In HeapThreadTransition what are the possible entry states? The primary 
state is presumably _in_native, but what else is expected? You can't 
transition from arbitrary states to _in_vm


Also in that file the include guard has the wrong name:

 #ifndef SHARE_VM_PRIMS_JVMTIHEAPSAMPLING_HPP

etc.
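
i.e., presumably something like the following, assuming the file really is
jvmtiHeapTransition.hpp:

#ifndef SHARE_VM_PRIMS_JVMTIHEAPTRANSITION_HPP
#define SHARE_VM_PRIMS_JVMTIHEAPTRANSITION_HPP

// ... declarations ...

#endif // SHARE_VM_PRIMS_JVMTIHEAPTRANSITION_HPP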

---

src/hotspot/share/runtime/heapMonitoring.cpp

Can you change StackTraceData from a struct to a class please.

StackTraceData::free_data could still be an instance method even if not 
done in the destructor. I also think the caller should do the delete 
rather than hiding it in here, as this obscures the lifecycle of the 
instances.


 215   void initialize(int max_storage) {
 216 MutexLocker mu(HeapMonitorStorage_lock);
 217 allocate_storage(max_storage);
 218 memset(&_stats, 0, sizeof(_stats));
 219   }

You're using a lock so obviously expect multiple threads to try this, 
but you aren't checking if initialization has already been done. That 
check is inside allocate_storage, but you're doing the memset 
unconditionally.
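
A guarded version is presumably what is wanted here, along these lines
(sketch; note that if every access of _initialized stays under the lock,
no OrderAccess is needed either):

  void initialize(int max_storage) {
    MutexLocker mu(HeapMonitorStorage_lock);
    if (_initialized) {
      return;  // another thread already set things up; skip the memset too
    }
    allocate_storage(max_storage);
    memset(&_stats, 0, sizeof(_stats));
    _initialized = true;
  }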


You're inconsistent with accessing/modifying _initialized under the 
HeapMonitorStorage_lock. The destructor calls free_storage with no lock 
(obviously) held. It's unclear how this lock-free destructor fits into 
the concurrent usage expected of this class. I see from your reply to 
Erik that you've now added some orderAccess usage to be "paranoid". But 
we don't want paranoid, we want correct. If there is expected to be a 
lock-free, but potentially concurrent path, then you will need to use 
OrderAccess appropriately. If everything is actually being done under a 
lock, then you don't need to use OrderAccess. But you need to be clear 
on exactly how concurrency comes into play with your code.


 397   _allocated_traces = new (ResourceObj::C_HEAP, mtInternal)
 398   GrowableArray(128, true);

These are allocations that will abort the VM if they fail - correct? If 
so that seems rather user-unfriendly for a profiling tool!


 422 void StackTraceStorage::weak_oops_do(BoolObjectClosure *is_alive,
 423  OopClosure *f) {
 424   MutexLocker mu(HeapMonitorStorage_lock);

IIUC oops-do methods get called at safepoints, but you are then taking a 
Mutex unconditionally. So any other code that acquires the same mutex 
must be guaranteed not to hit a safepoint check - so NoSafepointVerifier 
should be used to check that.
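
Concretely, the pattern being suggested is roughly the following (a sketch;
function names other than weak_oops_do are illustrative):

// At the safepoint (the oops-do path): assert it and lock without a safepoint check.
void StackTraceStorage::weak_oops_do(BoolObjectClosure *is_alive, OopClosure *f) {
  assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint");
  MutexLockerEx mu(HeapMonitorStorage_lock, Mutex::_no_safepoint_check_flag);
  // walk and update the stored traces ...
}

// Everywhere else that takes HeapMonitorStorage_lock: no safepoint check may
// happen while it is held, and NoSafepointVerifier asserts exactly that.
void StackTraceStorage::add_trace_locked() {
  MutexLockerEx mu(HeapMonitorStorage_lock, Mutex::_no_safepoint_check_flag);
  NoSafepointVerifier nsv;
  // append to _allocated_traces ...
}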


 575 void StackTraceStorage::store_garbage_trace(const StackTraceData 
&trace) {

 576   StackTraceData *new_trace = new StackTraceData();
 577   *new_trace = trace;

This looks quite odd to me. Shouldn't this just be using a copy-constructor?
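
i.e., presumably just (sketch):

  StackTraceData *new_trace = new StackTraceData(trace);  // copy-construct

rather than default-constructing and then assigning.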

Thanks,
David
-

On 27/01/2018 2:01 PM, JC Beyler wrote:

(Change of subject to contain the bug number :)).

New webrev is here:
Incremental:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03_04/

Full webrev:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04/

Inlined are my answers again :).

On Fri, Jan 26, 2018 at 5:51 AM, Robbin Ehn  wrote:

Hi JC, (dropping compiler on mail thread)

On 01/26/2018 06:45 AM, JC Beyler wrote:


Thanks Robbin for the reviews :)

The new full webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/
The incremental webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/



Thanks!

I got this compile issue:
/home/rehn/source/jdk/small/open/src/hotspot/share/gc/shared/threadLocalAllocBuffer.hpp:
In static member function 'static void
G1ResManTLABCache::put(ThreadLocalAllocBuffer&, AllocationContext_t)':
/home/rehn/source/jdk/small/open/src/hotspot/share/gc/shared/threadLocalAllocBuffer.hpp:97:13:
error: 'HeapWord* ThreadLocalAllocBuffer::hard_end()' is private
HeapWord* hard_end();
  ^
/home/rehn/source/jdk/small/closed/src/hotspot/share/gc/g1/g1ResManTLABCache.cpp:52:50:
error: within this context
size_t remaining = pointer_delta(tlab.hard_end(), tlab.top());


This is strange but I'm assuming it is because we are not working on
the same repo?

I used:
hg clone http://hg.openjdk.java.net/jdk/hs jdkhs-heap

I'll try a new clone on Monday and see. My new version moved hard_end
back to public so it should work now.







I inlined my answers:

On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn  wrote:


Hi JC, great to see another revision!


heapMonitoring.cpp

StackTraceData should not contain the oop for 'safety' reasons.
When StackTraceData is moved from _allocated_traces:
L452 store_garbage_trace(trace);
it contains a dead oop.
_allocated_traces could instead be a tuple of oop and StackTraceData, thus
dead oops are not kept.
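
One way to read that suggestion, with purely illustrative naming:

// Keep the oop next to, rather than inside, the trace data.
struct StackTraceEntry {
  oop            obj;    // only meaningful while the object is live
  StackTraceData trace;  // frames, size, thread, ...
};

// _allocated_traces holds StackTraceEntry elements; when an object dies, only
// the StackTraceData part is copied over to the garbage traces, so no dead
// oop is carried along:
//   store_garbage_trace(entry.trace);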



Done. I used inheritance to make the copier work regardless, but the
idea is the same.



Looks good.





You should use the new Access API for loading the oop, something like
this:
RootAccess::load(...)
I don't think you need to use Access API for cl

Re: JDK-8171119: Low-Overhead Heap Profiling

2018-01-30 Thread Robbin Ehn

Hi JC,

On 01/30/2018 04:22 AM, JC Beyler wrote:

- Collectedheap still needs to call AllocTracer to see if it is to be
sampled, I can't hide everything in it without a bigger refactor (want
me to try?)


Yes we need a bigger refactor to do this nicely.
I suggested not doing that now, so just roll back to the previous version.

Thanks for having a look at it!

/Robbin



On Mon, Jan 29, 2018 at 1:29 AM, Robbin Ehn  wrote:

Hi JC, thanks!

I'm happy with current state, looks good!

Truncated:

On 01/27/2018 05:01 AM, JC Beyler wrote:


This is strange but I'm assuming it is because we are not working on
the same repo?

I used:
hg clone http://hg.openjdk.java.net/jdk/hs jdkhs-heap

I'll try a new clone on Monday and see. My new version moved hard_end
back to public so it should work now.



Sorry this compile error was in closed code.
Now the closed part compiles, thanks.



Fair enough, hopefully Thomas will chime in. Are you saying that this
first version could go in and we can work on a refinement? Or are you
saying I should work on this now at the same time and fix it before
this V1 goes in? (Just so I know :))



We may have to change this before integration, but for now keep it as is.


I'll look at this on Monday then!



Great!

/Robbin




Thanks for the reply and have a great weekend!
Jc







Minor nit: when declaring pointers there is a little mix of having the
pointer marker adjacent to the type name or the data name. (Most hotspot
code puts it by the type name.)
E.g.
heapMonitoring.cpp:711 jvmtiStackTrace *trace = 
heapMonitoring.cpp:733 Method* m = vfst.method();
(not just this file)



Done!



HeapMonitorThreadOnOffTest.java:77
I would make g_tmp volatile, otherwise the assignment in the loop may
theoretically be skipped.



Also done!




Looks good, thanks!

/Robbin



Thanks again!
Jc







Re: JDK-8171119: Low-Overhead Heap Profiling

2018-01-29 Thread JC Beyler
Hi Robbin,

So I did the changes to move most of the code into the AllocTracer and
you can see it incrementally here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05/
And the full webrev here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05/

Now the issues I see here:

- AllocTracer now does "things" instead of just having a send_*_event
API; that is a change in concept for that class
- Collectedheap still needs to call AllocTracer to see if it is to be
sampled, I can't hide everything in it without a bigger refactor (want
me to try?)
- The ordering is now important: AllocTracer has to get called before
tlab().fill
   - Otherwise the fact that the allocation has to get sampled will
get lost when the tlab gets replaced

If this still looks better to you, or at least like a better direction, let me
know. I can do the remaining part of it and add an event for a sample, since
that is now easy and it probably makes sense to add such an event, and I
can add a few more tests.

Thanks!

On Mon, Jan 29, 2018 at 1:29 AM, Robbin Ehn  wrote:
> Hi JC, thanks!
>
> I'm happy with current state, looks good!
>
> Truncated:
>
> On 01/27/2018 05:01 AM, JC Beyler wrote:
>>
>> This is strange but I'm assuming it is because we are not working on
>> the same repo?
>>
>> I used:
>> hg clone http://hg.openjdk.java.net/jdk/hs jdkhs-heap
>>
>> I'll try a new clone on Monday and see. My new version moved hard_end
>> back to public so it should work now.
>
>
> Sorry this compile error was in closed code.
> Now the closed part compiles, thanks.
>
>>
>> Fair enough, hopefully Thomas will chime in. Are you saying that this
>> first version could go in and we can work on a refinement? Or are you
>> saying I should work on this now at the same time and fix it before
>> this V1 goes in? (Just so I know :))
>
>
> We may have to change this before integration, but for now keep it as is.
>
>> I'll look at this on Monday then!
>
>
> Great!
>
> /Robbin
>
>
>>
>> Thanks for the reply and have a great weekend!
>> Jc
>>
>>>

> 
> Minor nit, when declaring pointer there is a little mix of having the
> pointer adjacent by type name and data name. (Most hotspot code is by
> type
> name)
> E.g.
> heapMonitoring.cpp:711 jvmtiStackTrace *trace = 
> heapMonitoring.cpp:733 Method* m = vfst.method();
> (not just this file)
>

 Done!

> 
> HeapMonitorThreadOnOffTest.java:77
> I would make g_tmp volatile, otherwise the assignment in loop may
> theoretical be skipped.
>

 Also done!
>>>
>>>
>>>
>>> Looks good, thanks!
>>>
>>> /Robbin
>>>

 Thanks again!
 Jc

>>>
>


Re: JDK-8171119: Low-Overhead Heap Profiling

2018-01-29 Thread Robbin Ehn

Hi JC, thanks!

I'm happy with current state, looks good!

Truncated:

On 01/27/2018 05:01 AM, JC Beyler wrote:

This is strange but I'm assuming it is because we are not working on
the same repo?

I used:
hg clone http://hg.openjdk.java.net/jdk/hs jdkhs-heap

I'll try a new clone on Monday and see. My new version moved hard_end
back to public so it should work now.


Sorry this compile error was in closed code.
Now the closed part compiles, thanks.



Fair enough, hopefully Thomas will chime in. Are you saying that this
first version could go in and we can work on a refinement? Or are you
saying I should work on this now at the same time and fix it before
this V1 goes in? (Just so I know :))


We may have to change this before integration, but for now keep it as is.


I'll look at this on Monday then!


Great!

/Robbin



Thanks for the reply and have a great weekend!
Jc







Minor nit, when declaring pointer there is a little mix of having the
pointer adjacent by type name and data name. (Most hotspot code is by
type
name)
E.g.
heapMonitoring.cpp:711 jvmtiStackTrace *trace = 
heapMonitoring.cpp:733 Method* m = vfst.method();
(not just this file)



Done!



HeapMonitorThreadOnOffTest.java:77
I would make g_tmp volatile, otherwise the assignment in loop may
theoretical be skipped.



Also done!



Looks good, thanks!

/Robbin



Thanks again!
Jc