Dear all,
Thread-safety is back!! Here is the updated webrev:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/
Full webrev is here:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/
In order to really test thread safety, I needed to add it back, so this seemed like a good time. It required a few changes to the storage creation code to ensure correctness and safety. We now keep the static pointer but clear its data internally, so re-initialization will be a bit more costly than before. I don't think re-initialization is a common use case, so I did not consider that a problem. I used the internal MutexLocker; I think I used it correctly, but let me know.
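To make the idea concrete, here is a minimal standalone sketch of the locking scheme (std::mutex stands in for the internal Mutex/MutexLocker pair; all names and fields are illustrative, not the actual patch):

  #include <cstddef>
  #include <mutex>
  #include <vector>

  class StackTraceStorage {
   public:
    static StackTraceStorage* storage() { return &_instance; }

    // Re-initialization keeps the static object but clears its contents,
    // which is why turning sampling off and on again is a bit more costly
    // than the very first initialization.
    void initialize(std::size_t max_traces) {
      std::lock_guard<std::mutex> guard(_lock);
      _traces.clear();
      _traces.reserve(max_traces);
      _enabled = true;
    }

    void stop() {
      std::lock_guard<std::mutex> guard(_lock);
      _enabled = false;
      _traces.clear();
    }

    void add_trace(const void* sampled_obj) {
      std::lock_guard<std::mutex> guard(_lock);
      if (_enabled) {
        _traces.push_back(sampled_obj);
      }
    }

   private:
    static StackTraceStorage _instance;
    std::mutex _lock;
    std::vector<const void*> _traces;
    bool _enabled = false;
  };

  StackTraceStorage StackTraceStorage::_instance;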
I also added three tests:
1) Stack depth test:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch
This test shows that the maximum stack depth system is working.
2) Thread safety:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch
The test creates 24 threads that all allocate at the same time, then checks that it finds samples from all of the threads.
3) Thread on/off safety
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch
The test creates 24 threads that all allocate a bunch of memory. Then another
thread turns the sampling on/off.
Btw, both tests 2 & 3 failed without the locks.
As I worked on this, I saw a lot of places where the tests do very similar things. I'm going to clean up the code a bit and add a HeapAllocator class that all the tests can call directly; this will greatly simplify the code.
Thanks for any comments/criticisms!
Jc
On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler <jcbey...@google.com> wrote:
Dear all,
Small update to the webrev:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/
Full webrev is here:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/
I updated a bit of the naming, removed a TODO comment, and added a test for the sampling rate. I also raised the maximum stack depth to 1024; there is no reason to keep it so small. I ran a micro-benchmark of the overhead and it seems relatively unchanged.
I compared allocations from a stack depth of 10 and allocations from a stack depth of 1024 (allocations are from the same helper method in
http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java):
- For an array of 1 integer allocated in a loop, stack depth 1024 vs stack depth 10: 1% slower
- For an array of 200k integers allocated in a loop, stack depth 1024 vs stack depth 10: 3% slower
So basically the maximum stack depth is now 1024, but we only copy over the stack frames actually used.
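As a small sketch of that "copy only what is used" idea (illustrative names and plain new[]/memcpy, not the actual webrev code), the scratch buffer is sized to the 1024 maximum, but a recorded sample only allocates and copies the frames that were actually filled in:

  #include <jvmti.h>
  #include <cstring>

  static const int MaxStackDepth = 1024;

  struct SampledTrace {
    jvmtiFrameInfo* frames;   // owned; length is frame_count, not MaxStackDepth
    jint frame_count;
  };

  static SampledTrace record_sample(const jvmtiFrameInfo* scratch, jint filled) {
    SampledTrace trace;
    trace.frame_count = filled;                  // usually much smaller than 1024
    trace.frames = new jvmtiFrameInfo[filled];
    std::memcpy(trace.frames, scratch, filled * sizeof(jvmtiFrameInfo));
    return trace;
  }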
For the next webrev, I will be adding a stack depth test to show that it works, and I will probably put back the mutex locking so that we can see how difficult it is to keep this thread-safe.
Let me know what you think!
Jc
On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler <jcbey...@google.com> wrote:
Forgot to say that for my numbers:
- The actual numbers I got for the various array sizes are not in the test; I ran the program 30 times and parsed the output. Here are the averages and standard deviations:
1000: 1.28% average; 1.13% standard deviation
10000: 1.59% average; 1.25% standard deviation
100000: 1.26% average; 1.26% standard deviation
The 1000/10000/100000 values are the sizes of the arrays being allocated. These are allocated 100k times, and the sampling rate is 111 times the size of the array.
Thanks!
Jc
On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler <jcbey...@google.com> wrote:
Hi all,
After a bit of a break, I am back working on this :). As before,
here are two webrevs:
- Full change set: http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/
- Compared to version 8:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/
(This version is compared to the version 8 I last showed, but ported to the new folder hierarchy.)
In this version I have:
- Handled Thomas' comments from his email of 07/03:
- Merged the logging to be standard
- Fixed up the code a bit where asked
- Added some notes about the code not being thread-safe yet
- Removed additional dead code from the version that modifies
interpreter/c1/c2
- Fixed compiler issues so that it compiles with --disable-precompiled-headers
- Tested with ./configure --with-boot-jdk=<jdk8>
--with-debug-level=slowdebug --disable-precompiled-headers
Additionally, I added a test to check the sanity of the sampler:
HeapMonitorStatCorrectnessTest
(http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch)
- This allocates a number of arrays and checks that we obtain the number of samples we want, with an accepted error of 5%. I ran it 100 times and it passed every time; I can test more if wanted.
- The actual numbers I got for the various array sizes are not in the test; I ran the program 30 times and parsed the output. Here are the averages and standard deviations:
1000: 1.28% average; 1.13% standard deviation
10000: 1.59% average; 1.25% standard deviation
100000: 1.26% average; 1.26% standard deviation
What this means is that we were always within about 1-2% of the number of samples the test expected.
Let me know what you think,
Jc
On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler <jcbey...@google.com> wrote:
Hi all,
I apologize, I have not yet handled your remarks, but I thought this new webrev might also be useful to see and comment on.
Here is the latest webrev; it is generated slightly differently from the others since I'm now using webrev.ksh without the -N option:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/
And the webrev.07 to webrev.08 diff is here:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/
(Let me know if it works well)
It's a small change between versions but it:
- provides a fix that makes the average sample rate correct
(more on that below).
- fixes the code to actually have it play nicely with the fast TLAB refill
- cleans up the JVMTI text a bit and now uses jvmtiFrameInfo
- moves the capability to be onload solo
With this webrev, I've done a small study of the random number
generator we use here for the sampling rate. I took a small program and it can
be simplified to:
for (outer loop)
  for (inner loop)
    int[] tmp = new int[arraySize];
- I've fixed the outer and inner loops at 800 for this experiment, meaning we allocate an array of a given size 640,000 times.
- Each program provides the average sample size used for the whole execution.
- Then, I ran each variation 30 times and calculated the average of the average sample size used for the various array sizes. I selected the array size to be one of the following: 1, 10, 100, 1000, 10000.
- When compared to 512KB, the error in the average sample size over the 30 runs was:
1: 4.62% of error
10: 3.09% of error
100: 0.36% of error
1000: 0.1% of error
10000: 0.03% of error
What this shows is that the average does get better as the number of samples grows. This is because with an allocation of 1 element per array, it takes longer to hit one of the thresholds. This can be seen by looking at the sample count statistic I put in. For the same number of iterations (800 * 800), the different array sizes produce:
1: 62 samples
10: 125 samples
100: 788 samples
1000: 6166 samples
10000: 57721 samples
And of course, the more samples you have, the more sampling intervals you draw, which means that the average gets closer to the target.
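As a toy illustration of that convergence (a standalone simulation, not numbers from the webrev: it simply draws intervals from an exponential distribution with a 512KB mean and reuses the sample counts listed above):

  #include <cstdio>
  #include <random>

  int main() {
    const double mean_bytes = 512.0 * 1024.0;
    std::mt19937_64 rng(42);
    std::exponential_distribution<double> draw(1.0 / mean_bytes);

    const int counts[] = {62, 125, 788, 6166, 57721};   // sample counts from above
    for (int n : counts) {
      double sum = 0.0;
      for (int i = 0; i < n; i++) sum += draw(rng);
      double err = (sum / n - mean_bytes) / mean_bytes * 100.0;
      std::printf("%6d samples: empirical mean off by %+.2f%%\n", n, err);
    }
    return 0;
  }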
Thanks,
Jc
On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler <jcbey...@google.com> wrote:
Thanks Robbin,
This seems to have worked. When I have the next webrev ready, we will find out, but I'm fairly confident it will work!
Thanks again!
Jc
On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn <robbin....@oracle.com> wrote:
Hi JC,
On 06/29/2017 12:15 AM, JC Beyler wrote:
B) Incremental changes
I guess the most common workflow here is using mq:
hg qnew fix_v1
edit files
hg qrefresh
hg qnew fix_v2
edit files
hg qrefresh
if you do hg log you will see 2 commits
webrev.ksh -r -2 -o my_inc_v1_v2
webrev.ksh -o my_full_v2
In your .hgrc you might need:
[extensions]
mq =
/Robbin
Again, another newbie question here...
For showing the incremental changes, is there a
link that explains how to do that? I apologize for my newbie questions all the
time :)
Right now, I do:
ksh ../webrev.ksh -m -N
That generates a webrev.zip, and I send it to Chuck Rasbold. He then uploads it to a new webrev.
I tried committing my change and adding a small change. Then if I just do ksh ../webrev.ksh without any options, it seems to produce a similar page, but now with only the changes I had (so the 06-07 comparison you were talking about) and a changeset that has it all. I imagine that is what you meant.
Which means that my workflow would become:
1) Make changes
2) Make a webrev without any options to show just
the differences with the tip
3) Amend my changes into my local commit so that it contains everything
4) Go to 1
Does that seem correct to you?
Note that when I do this, I only see the full change of a file in the full change set. (Side note here: now the page says change set and not patch, which is maybe why Serguei was having issues?)
Thanks!
Jc
On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn <robbin....@oracle.com> wrote:
Hi,
On 06/28/2017 12:04 AM, JC Beyler wrote:
Dear Thomas et al,
Here is the newest webrev:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/
You have some more bits to do in there, but generally this looks good, and it is really nice with more tests.
I'll do a deep dive and re-test this when I get back from my long vacation, with whatever patch version you have then.
Also, I think it's time you provide incremental (v06->v07) changes as well as complete changesets.
Thanks, Robbin
Thomas, I "think" I have answered all your
remarks. The summary is:
- The statistic system is up and provides insight into what the heap sampler is doing
  - I've noticed that, though the sampling rate is at the right mean, we are missing some samples; I have not yet tracked down why (details below)
- I've run a tiny benchmark that is the worst case: it is a very tight loop that allocates a small array
  - In this case, I see no overhead when the system is off, so that is a good start :)
  - I see a high overhead in this case right now when sampling is on. This is not really too surprising, but I'm going to see if it is consistent with our internal implementation. The benchmark is really allocation-heavy, so I'm not too surprised, but I want to do the due diligence.
- The statistic system is up and I have a new test:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch
- I did a bit of a study of the random generator here; more details are below, but basically it seems to work well
- I added a capability, but since this is the first time I've done that, I was not sure I did it right
  - I did add a test for it though, and the test seems to do what I expect (all methods fail with the JVMTI_ERROR_MUST_POSSESS_CAPABILITY error).
-
http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch
- I still need to figure out what to do
about the multi-agent vs single-agent issue
- As far as measurements go, it seems I still need to look at:
  - Why we do the 20 random calls first; are they necessary?
  - The mean of the sampling rate that the random generator produces, and what is actually sampled
  - The overhead in terms of memory/performance when sampling is on
I have inlined my answers; I think I got them all into the new webrev. Let me know your thoughts.
Thanks again!
Jc
On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl <thomas.scha...@oracle.com> wrote:
Hi,
On Wed, 2017-06-21 at 13:45 -0700, JC
Beyler wrote:
> Hi all,
>
> First off: Thanks again to Robbin
and Thomas for their reviews :)
>
> Next, I've uploaded a new webrev:
>
http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
>
> Here is an update:
>
> - @Robbin, I forgot to say that yes
I need to look at implementing
> this for the other architectures
and testing it before it is all
> ready to go. Is it common to have
it working on all possible
> combinations or is there a subset
that I should be doing first and we
> can do the others later?
> - I've tested slowdebug, built and
ran the JTreg tests I wrote with
> slowdebug and fixed a few more
issues
> - I've refactored a bit of the code
following Thomas' comments
> - I think I've handled all the
comments from Thomas (I put
> comments inline below for the
specifics)
Thanks for handling all those.
> - Following Thomas' comments on
statistics, I want to add some
> quality assurance tests and find
that the easiest way would be to
> have a few counters of what is
happening in the sampler and expose
> that to the user.
> - I'll be adding that in the
next version if no one sees any
> objections to that.
> - This will allow me to add a
sanity test in JTreg about number of
> samples and average of sampling rate
>
> @Thomas: I had a few questions that
I inlined below but I will
> summarize the "bigger ones" here:
> - You mentioned constants are
not using the right conventions, I
> looked around and didn't see any
convention except normal naming then
> for static constants. Is that right?
I looked through
https://wiki.openjdk.java.net/display/HotSpot/StyleGuide
and the rule is to "follow an
existing pattern and must have a
distinct appearance from other
names". Which does not help a lot I
guess :/ The GC team started using
upper camel case, e.g.
SomeOtherConstant, but very likely
this is probably not applied
consistently throughout. So I am fine with not adding another style (like kMaxStackDepth with the "k" in front with some unknown meaning).
(Chances are you will find that style
somewhere used anyway too,
apologies if so :/)
Thanks for that link; now I know where to look. I used upper camel case in my code as well then :) I should have gotten them all.
> PS: I've also inlined my answers
to Thomas below:
>
> On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl <thomas.schatzl@oracle.com> wrote:
> > Hi all,
> >
> > On Mon, 2017-06-12 at 11:11
-0700, JC Beyler wrote:
> > > Dear all,
> > >
> > > I've continued working on this
and have done the following
> > webrev:
> > >
http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
> >
> > [...]
> > > Things I still need to do:
> > > - Have to fix that TLAB
case for the FastTLABRefill
> > > - Have to start looking at
the data to see that it is
> > consistent and does gather the
right samples, right frequency, etc.
> > > - Have to check the GC
elements and what that produces
> > > - Run a slowdebug run and
ensure I fixed all those issues you
> > saw > Robbin
> > >
> > > Thanks for looking at the
webrev and have a great week!
> >
> > scratching a bit on the
surface of this change, so apologies for
> > rather shallow comments:
> >
> > - macroAssembler_x86.cpp:5604:
while this is compiler code, and I
> > am not sure this is final,
please avoid littering the code with
> > TODO remarks :) They tend to be
candidates for later wtf moments
> > only.
> >
> > Just file a CR for that.
> >
> Newcomer question: what is a CR? And I am not sure I have the rights to do
> that yet :)
Apologies. CR is a change request; the suggestion is to file a bug in the bug tracker. And you are right, you can't just create a new account in the OpenJDK JIRA yourself. :(
Ok, good to know. I'll continue with my own todo list, but I'll work hard on not letting it slip into the webrevs anymore :)
I was mostly referring to the "... but it
is a TODO" part of that
comment in macroassembler_x86.cpp.
Comments about the why of the code
are appreciated.
[Note that I now understand that this is to some degree still work in progress. As long as the final changeset does not contain TODOs I am fine (and it's not a hard objection; rather, their use in "final" code is typically limited in my experience).]
5603 // Currently, if this happens,
just set back the actual end to
where it was.
5604 // We miss a chance to sample
here.
It would be okay, but explaining "this" and the "why" of missing a chance to sample here would be best.
Like maybe:
// If we needed to refill TLABs, just
set the actual end point to
// the end of the TLAB again. We do
not sample here although we could.
Done with your comment, it works well in
my mind.
I am not sure whether "miss a chance to sample" meant "we could, but consciously don't because it's not that useful" or "it would be necessary, but we don't because it's too complicated to do".
Looking at the original comment once more, I am also not sure if that comment shouldn't be referring to the "end" variable (not actual_end), because that's the variable that is responsible for taking the sampling path? (Going from the member description of ThreadLocalAllocBuffer.)
I've moved this code and it no longer shows up here, but the rationale and answer was:
So... Yes, end is the variable provoking the sampling. Actual_end is the actual end of the TLAB.
What was happening here is that the code resets _end to point towards the end of the new TLAB. Because we now have end for sampling and _actual_end for the actual end, we need to update the actual_end as well.
Normally, were we to do the real work here, we would calculate the (end - start) offset, then:
- Set the new end to: start + (old_end - old_start)
- Set the actual end like we do here now, because it is the actual end.
Why is this not done here anymore?
- I was still debating which path to take:
  - Do it in the fast refill code; it has its perks:
    - In a world where fast refills happen all the time, or a lot, we can augment the code there to do the sampling
  - Remember what we had as an end before leaving the slowpath and check on return
    - This is what I'm doing now; it removes the need to go fix up all the fast refill paths, but if you remain in fast refill paths, you won't get sampling. I have to think about the consequences of that, maybe as a future change later on?
    - I have the statistics now, so I'm going to study that
    -> By the way, though my statistics show I'm missing some samples, if I turn off FastTLABRefill the loss is the same, so for now it seems this does not occur in my simple test.
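To make the end/actual_end bookkeeping above concrete, here is a small standalone illustration (names approximate; this is a sketch of the idea, not the actual ThreadLocalAllocBuffer code):

  #include <cstddef>
  #include <algorithm>

  struct TlabSketch {
    char* start;
    char* top;          // current allocation pointer
    char* end;          // fast-path limit: min(actual_end, next sample point)
    char* actual_end;   // the real end of the TLAB

    // After a (re)fill or after taking a sample, pull "end" in so the fast
    // path drops into the slow path at the next sampling point.
    void set_sample_point(std::size_t bytes_until_sample) {
      char* sample_point = top + bytes_until_sample;
      end = std::min(sample_point, actual_end);
    }

    // When the fast path hits "end", either we crossed a sampling point
    // (take a sample, then move "end" forward) or the TLAB is genuinely full.
    bool end_hit_is_sample_point() const { return end < actual_end; }
  };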
But maybe I am only confused and it's
best to just leave the comment
away. :)
Thinking about it some more, doesn't
this not-sampling in this case
mean that sampling does not work in
any collector that does inline TLAB
allocation at the moment? (Or is
inline TLAB alloc automatically
disabled with sampling somehow?)
That would indeed be a bigger TODO
then :)
Agreed. This remark made me think that perhaps, as a first step, the new way of doing it is better, but I did have to:
- Remove the const from the ThreadLocalAllocBuffer remaining and hard_end methods
- Move hard_end out of the header file to have a bit more logic there
Please let me know what you think of that, and whether you prefer it this way or changing the fast refills. (I prefer this way now because it is more incremental.)
> > - calling
HeapMonitoring::do_weak_oops() (which should probably be
> > called weak_oops_do() like other
similar methods) only if string
> > deduplication is enabled (in
g1CollectedHeap.cpp:4511) seems wrong.
>
> The call should be at least around
6 lines up outside the if.
>
> Preferentially in a method like
process_weak_jni_handles(), including
> additional logging. (No new (G1) gc
phase without minimal logging
> :)).
> Done but really not sure because:
>
> I put for logging:
> log_develop_trace(gc,
freelist)("G1ConcRegionFreeing [other] : heap
> monitoring");
I would think that "gc, ref" would be more appropriate log tags for this, similar to JNI handles.
(I am also not sure what weak reference handling has to do with G1ConcRegionFreeing, so I am a bit puzzled.)
I was not sure what to put for the tags, or really for the message. I cleaned it up a bit; it is now:
log_develop_trace(gc, ref)("HeapSampling [other] : heap monitoring processing");
> Since weak_jni_handles didn't have logging for me to be inspired
> by, I did that, but I am unconvinced this is what should be done.
The JNI handle processing does have logging, but only in ReferenceProcessor::process_discovered_references(). In process_weak_jni_handles() only the overall time is measured (in a G1-specific way, since only G1 supports disabling reference processing) :/
The code in ReferenceProcessor prints both the time taken (referenceProcessor.cpp:254) and the count, but strangely only in debug VMs.
I have no idea why this logging is considered so unimportant that it is only printed in a debug VM. However, there are reviews out for changing this area a bit, so it might be useful to wait for that (JDK-8173335).
I cleaned it up a bit anyway and now it
returns the count of objects that are in the system.
> > - the change doubles the size of
> > CollectedHeap::allocate_from_tlab_slow()
above the "small and nice"
> > threshold. Maybe it could be
refactored a bit.
> Done I think, it looks better to me
:).
In
ThreadLocalAllocBuffer::handle_sample() I think the
set_back_actual_end()/pick_next_sample() calls could be hoisted out of
the "if" :)
Done!
> > - referenceProcessor.cpp:261: the
change should add logging about
> > the number of references
encountered, maybe after the corresponding
> > "JNI weak reference count" log
message.
> Just to double check, are you
saying that you'd like to have the heap
> sampler to keep in store how many
sampled objects were encountered in
> the HeapMonitoring::weak_oops_do?
> - Would a return of the method
with the number of handled
> references and logging that work?
Yes, it's fine if
HeapMonitoring::weak_oops_do() only returned the
number of processed weak oops.
Done also (but I admit I have not tested
the output yet) :)
> - Additionally, would you prefer
it in a separate block with its
> GCTraceTime?
Yes. Both kinds of information are interesting: while the time taken is typically more important, the next question would be why, and the number of references typically goes a long way there.
See above though, it is probably best
to wait a bit.
Agreed that I "could" wait, but if it's ok, I'll just refactor/remove this when we get closer to something final. Either JDK-8173335 has gone in and I will notice it, or it soon will and I can change it then.
> > - threadLocalAllocBuffer.cpp:331: one more
"TODO"
> Removed it and added it to my
personal todos to look at.
> > >
> > - threadLocalAllocBuffer.hpp:
ThreadLocalAllocBuffer class
> > documentation should be updated
about the sampling additions. I
> > would have no clue what the difference
between "actual_end" and
> > "end" would be from the given
information.
> If you are talking about the
comments in this file, I made them more
> clear I hope in the new webrev. If
it was somewhere else, let me know
> where to change.
Thanks, that's much better. Maybe add a note in the comment of the class that ThreadLocalAllocBuffer provides some sampling facility by modifying the end() of the TLAB to cause "frequent" calls into the runtime where the actual sampling takes place.
Done, I think it's better now. Added
something about the slow_path_end as well.
> > - in heapMonitoring.hpp: there
are some random comments about some
> > code that has been grabbed from
"util/math/fastmath.[h|cc]". I
> > can't tell whether this is code
that can be used but I assume that
> > Noam Shazeer is okay with that
(i.e. that's all Google code).
> Jeremy and I double checked and we
can release that as I thought. I
> removed the comment from that piece
of code entirely.
Thanks.
> > - heapMonitoring.hpp/cpp static
constant naming does not correspond
> > to Hotspot's. Additionally, in
Hotspot static methods are cased
> > like other methods.
> I think I fixed the methods to be
cased the same way as all other
> methods. For static constants, I
was not sure. I fixed a few other
> variables but I could not seem to
really see a consistent trend for
> constants. I made them as variables
but I'm not sure now.
Sorry again, style is a kind of mess.
The goal of my suggestions here
is only to prevent yet another style
creeping in.
> > - in heapMonitoring.cpp there are
a few cryptic comments at the top
> > that seem to refer to internal
stuff that should probably be
> > removed.
> Sorry about that! My personal todos
not cleared out.
I am happy about comments, but I simply did not understand any of that, and I do not know whether other readers would either.
If you think you will remember to remove/update them before the review proper, that is fine (I misunderstood the review situation a little, it seems).
> > I did not think through the
impact of the TLAB changes on collector
> > behavior yet (if there are). Also
I did not check for problems with
> > concurrent mark and SATB/G1 (if
there are).
> I would love to know your thoughts
on this, I think this is fine. I
I think so too now. No objects are
made live out of thin air :)
> see issues with multiple threads
right now hitting the stack storage
> instance. Previous webrevs had a mutex lock here, but we took it out
> for simplicity (and only for now).
:) When looking at this after some
thinking I now assume for this
review that this code is not MT safe
at all. There seems to be more
synchronization missing than just the
one for the StackTraceStorage. So
no comments about this here.
I double-checked a bit (quickly, I admit), but it seems that synchronization in StackTraceStorage is really all you need (all methods lead to a StackTraceStorage one and can be multithreaded outside of that).
There is a question about the initialization, where the method HeapMonitoring::initialize_profiling is not thread-safe.
It would work (famous last words) and not crash if there was a race, but we could add a synchronization point there as well (and therefore on the stop as well).
But anyway, I will really check and do this once we add back synchronization.
Also, this would require some kind of
specification of what is allowed
to be called when and where.
Would we specify this with the methods in
the jvmti.xml file? We could start by specifying in each that they are not
thread safe but I
saw no mention of that for
other methods.
One potentially relevant observation
about locking here: depending on
sampling frequency,
StackTraceStore::add_trace() may be rather
frequently called. I assume that you
are going to do measurements :)
Though we don't have the TLAB implementation in our code, the compiler-generated sampler has about 2% overhead with a 512k sampling rate. I can do real measurements once the code settles and we can see how costly this is as a TLAB implementation.
However, my theory is that if the rate is 512k, the memory/performance overhead should be minimal, since that is what we saw with our code/workloads (though it is not called the same way, we call it essentially at the same rate).
If you have a benchmark you'd like me to test, let me know!
Right now, with my really small test, this does use a bit of overhead even for a 512k sample size. I don't know why yet; I'm going to see what is going on.
Finally, I think it is not reasonable to expect the overhead to be negligible if the sampling rate used is too low. The user should know that the lower the rate, the higher the overhead (documentation TODO?).
I am not sure what the expected usage
of the API is, but
StackTraceStore::add_trace() seems to
be able to grow without bounds.
Only a GC truncates them to the live
ones. That in itself seems to be
problematic (GCs can be *wide*
apart), and of course some of the API
methods add to that because they
duplicate that unbounded array. Do you
have any concerns/measurements about
this?
So, the theory is that yes, add_trace can grow without bounds, but it grows at a sample per 512k of allocated space. The stacks it gathers are currently maxed at 64 (though I'd like to turn that into a user option at some point). So I have no concerns because:
- If this really is taking a lot of space, that means the job is keeping a lot of objects in memory as well, so the entire heap is getting huge
- If that is the case, you will be triggering a GC at some point anyway.
(I'm putting under the rug the issue of "what if we set the rate to 1, for example", because as you lower the sampling rate, we cannot guarantee low overhead; the idea behind this feature is to have a means of getting meaningful allocation samples at a low overhead.)
I have no real measurements right now, but since I now have some statistics I can poll, I will look a bit more at this question.
I have the same last sentence as above: the user should expect this to happen if the sampling rate is too small. That can probably be reflected in StartHeapSampling as a note: careful, this might impact your performance.
Also, these stack traces might hold
on to huge arrays. Any
consideration of that? Particularly
it might be the cause for OOMEs in
tight memory situations.
There is a stack size maximum that is set to 64, so it should not hold huge arrays. I don't think this is an issue, but I can double-check with a test or two.
- please consider adding a safepoint
check in
HeapMonitoring::weak_oops_do to
prevent accidental misuse.
- in struct StackTraceStorage, the
public fields may also need
underscores. At least some files in
the runtime directory have structs
with underscored public members (and
some don't). The runtime team
should probably comment on that.
Agreed, I did not know. I looked around, and it seemed that a lot of structs did not have them, so I left it as is. I will happily change it if someone prefers (I was not sure if you really preferred it or not; your sentence seemed to be more a note of "this might need to change, but I don't know if the runtime team enforces that". Let me know if I read that wrongly).
- In StackTraceStorage::weak_oops_do(), when examining the StackTraceData, maybe it is useful to consider having a non-NULL reference outside of the heap's reserved space be an error. There should never be an oop outside of the heap's reserved space.
Unless you allow storing random
values in StackTraceData::obj, which I
would not encourage.
I suppose you are talking about this part:
  if ((value != NULL && Universe::heap()->is_in_reserved(value)) &&
      (is_alive == NULL || is_alive->do_object_b(value))) {
What you are saying is that I could have something like:
  if (value != my_non_null_reference &&
      (is_alive == NULL || is_alive->do_object_b(value))) {
Is that what you meant? Is there really a reason to do so? When I look at the code, is_in_reserved seems like an O(1) method call. I'm not even sure we can have a NULL value, to be honest. I might have to study that to see if this was not a paranoid test to begin with.
The is_alive code has now morphed due to the comment below.
- HeapMonitoring::weak_oops_do() does
not seem to use the
passed AbstractRefProcTaskExecutor.
It did use it:
size_t HeapMonitoring::weak_oops_do(AbstractRefProcTaskExecutor* task_executor,
                                    BoolObjectClosure* is_alive,
                                    OopClosure* f,
                                    VoidClosure* complete_gc) {
  assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint");
  if (task_executor != NULL) {
    task_executor->set_single_threaded_mode();
  }
  return StackTraceStorage::storage()->weak_oops_do(is_alive, f, complete_gc);
}
But due to the comment below, I refactored
this, so this is no longer here. Now I have an always true closure that is
passed.
- I do not understand allowing this method to be called with a NULL complete_gc closure. This would mean that objects referenced from the object that is referenced by the StackTraceData are not pulled, meaning they would get stale.
- Same with an is_alive parameter value of NULL.
So these questions made me look a bit closer at this code. I think this code was written this way to have a very small impact on the file, but you are right, there is no reason for it here. I've simplified the code by adding a process_HeapSampling method in referenceProcessor.cpp that handles everything there.
The code allowed NULLs because it depended on where you were coming from and how the code was being called.
- I added a static always_true variable and now pass that, to be more consistent with the rest of the code.
- I moved the complete_gc into process_phaseHeapSampling (new method) and handle the task_executor and the complete_gc there
- Newbie question: in our code we did a set_single_threaded_mode, but I see that process_phaseJNI does it right before its call; do I need to do it for process_phaseHeapSample?
That API is much cleaner (in my mind) and
is consistent with what is done around it (again in my mind).
- heapMonitoring.cpp:590: I do not
completely understand the purpose of
this code: in the end this results in
a fixed value directly dependent
on the Thread address anyway?
IOW, what is special about exactly 20
rounds?
So we really want a fast random number generator that has a specific mean (512k is the default we use). The code uses the thread address as the starting number of the sequence (the rationale being: why not, it is random enough). Then, instead of just starting there, we prime the sequence and really only start at the 21st number; this is arbitrary, and I have not done a study to see if we could do more or less of that.
As I have the statistics of the system up and running, I'll run some experiments to see if this is needed, whether 20 is good, or not.
- also I would consider stripping a
few bits of the threads' address as
initialization value for your rng.
The last three bits (and probably
more, check whether the Thread object
is allocated on special
boundaries) are always zero for them.
Not sure if the given "random" value is random enough before/after this method, so just skip that comment if you think this is not required.
I don't know, is the honest answer. I think what is important is that we tend towards a mean and it is random "enough" to not fall into the pitfall of only sampling a subset of objects due to their allocation order. I added that as a test to do, to see if it changes the mean in any way for the 512k default value and/or if the first 1000 elements look better.
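For reference, here is a minimal standalone sketch of that seeding/priming idea (the mixing function, the shift, and the 20-round priming constant are illustrative, not the heapMonitoring.cpp code; the real generator also targets a specific mean, which this toy uniform draw only approximates):

  #include <cstdint>

  class SampleIntervalRng {
   public:
    explicit SampleIntervalRng(const void* thread_addr) {
      // Drop alignment bits so the seed is not mostly zeros at the bottom.
      _state = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(thread_addr) >> 3);
      for (int i = 0; i < 20; i++) {   // prime the sequence
        next_raw();
      }
    }

    // Draw the number of bytes until the next sample, shaped so that its
    // mean is close to the requested sampling interval.
    uint64_t next_interval(uint64_t mean_bytes) {
      return next_raw() % (2 * mean_bytes);   // uniform in [0, 2*mean)
    }

   private:
    uint64_t next_raw() {
      _state = _state * 6364136223846793005ULL + 1442695040888963407ULL;
      return _state;
    }

    uint64_t _state;
  };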
Some more random nits I did not find
a place to put anywhere:
-
ThreadLocalAllocBuffer::_extra_space does not seem to be used
anywhere?
Good catch :).
- Maybe indent the declaration of
ThreadLocalAllocBuffer::_bytes_until_sample to align below the other members of
that group.
Done; I moved it up a bit to have the non-static members together and the static ones separate.
Thanks,
Thomas
Thanks for your review!
Jc