RE: RFR(L): 8215624: add parallel heap inspection support for jmap histo(G1)(Internet mail)

Hohensee, Paul Mon, 06 Jul 2020 14:58:06 -0700

I'd like to see this feature added. :)

The CSR looks good, as does the basic parallel inspection algorithm. Stefan's 
done the GC part, so I'll stick to the non-GC part (fwiw, the GC part lgtm).


I'd move all the argument parsing code to JMap.java and just pass the results 
to Hotspot. Both histo() in JMap.java and code in attachListener.* parse the 
command line arguments, though the code in histo() doesn't parse the argument 
to "parallel". I'd upgrade the code in histo() to do a complete parse and pass 
the option values to executeCommandForPid as before: there would just be more 
of them now. That would allow you to eliminate all the parsing code in 
attachListener.cpp as well as the change to arguments.hpp.

heapInspection.hpp:

_shared_miss_count (s/b _missed_count, see below) isn't a size, so it should be 
a uint instead of a size_t. Same with the new parallel_thread_num argument to 
heap_inspection() and populate_table().

Comment copy-edit:
+// Parallel heap inspection task. Parallel inspection can fail due to
+// a native OOM when allocating memory for TL-KlassInfoTable.
+// _success will be set false on an OOM, and serial inspection tried.

_shared_miss_count should be _missed_count to match the missed_count() getter, 
or rename missed_count() to be shared_miss_count(). Whichever way you go, the 
field type should match the getter result type: uint is reasonable.

heapInspection.cpp:

You might use ResourceMark twice in populate_table, separately for the parallel 
attempt and the serial code. If the parallel attempt fails and available memory 
is low, it would be good to clean up the memory used by the parallel attempt 
before doing the serial code.

Style nit in KlassInfoTable::merge_entry(). I'd line up the definitions of k 
and elt, so "k" is even with "elt". And, because it's two lines shorter, I'd 
replace
+  } else {
+    return false;
+  }
with
+  return false;

KlassInfoTableMergeClosure.is_success() should be just success() (i.e., no 
"is_" prefix) because it's a getter.

I'd reorganize the code in populate_table() to make it more clear, vis (I 
changed _shared_missed_count to _missed_count)
+  if (cit.allocation_failed()) {
+    // fail to allocate memory, stop parallel mode
+    Atomic::store(&_success, false);
+    return;
+  }
+  RecordInstanceClosure ric(&cit, _filter);
+  _poi->object_iterate(&ric, worker_id);
+  missed_count = ric.missed_count();
+  {
+    MutexLocker x(&_mutex);
+    merge_success = _shared_cit->merge(&cit);
+  }
+  if (merge_success) {
+    Atomic::add(&_missed_count, missed_count);
+  else {
+    Atomic::store(&_success, false);
+  }

Thanks,
Paul

On 6/29/20, 7:20 PM, "linzang(臧琳)" <[email protected]> wrote:

    Dear All,
            Sorry to bother again, I just want to make sure that is this change 
worth to be continue to work on? If decision is made to not. I think I can drop 
this work and stop asking for help reviewing...
            Thanks for all your help about reviewing this previously.

    BRs,
    Lin

    On 2020/5/9, 3:47 PM, "linzang(臧琳)" <[email protected]> wrote:

        Dear All,
               May I ask your help again for review the latest change?  Thanks!

        BRs,
        Lin

        On 2020/4/28, 1:54 PM, "linzang(臧琳)" <[email protected]> wrote:

            Hi Stefan,
              >>  - Adding Atomic::load/store.
              >>  - Removing the time measurement in the run_task. I renamed 
G1's function
              >>  to run_task_timed. If we need this outside of G1, we can 
rethink the API
              >>  at that point.
               >>  - ZGC style cleanups
               Thanks for revising the patch,  they are all good to me, and I 
have made a tiny change based on it:
                   
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_04/
                   
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_04-delta/
              it reduce the scope of mutex in ParHeapInspectTask, and delete 
unnecessary comments.

            BRs,
            Lin

            On 2020/4/27, 4:34 PM, "Stefan Karlsson" 
<[email protected]> wrote:

                Hi Lin,

                On 2020-04-26 05:10, linzang(臧琳) wrote:
                > Hi Stefan and Paul，
                >      I have made a new patch based on your comments and 
Stefan's Poc code:
                >      Webrev: 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_03/
                >      Delta(based on Stefan's change:) : 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_03-delta/webrev_03-delta/

                Thanks for providing a delta patch. It makes it much easier to 
look at,
                and more likely for reviewers to continue reviewing.

                I'm going to continue focusing on the GC parts, and leave the 
rest to
                others to review.

                >
                >      And Here are main changed I made and want to discuss 
with you:
                >      1.  changed"parallelThreadNum=" to "parallel=" for jmap 
-histo options.
                >      2.  Add logic to test where parallelHeapInspection is 
fail, in heapInspection.cpp
                >            This is because the parHeapInspectTask create 
thread local KlassInfoTable in it's work() method, and this may fail because of 
native OOM, in this case, the parallel should fail and serial heap inspection 
can be tried.
                >            One more thing I want discuss with you is about 
the member "_success" of parHeapInspectTask, when native OOM happenes, it is 
set to false. And since this "set" operation can be conducted in multiple 
threads, should it be atomic ops?  IMO, this is not necessary because 
"_success" can only be set to false, and there is no way to change it from back 
to true after the ParHeapInspectTask instance is created, so it is save to be 
non-atomic, do you agree with that?

                In these situations you should be using the Atomic::load/store
                primitives. We're moving toward a later C++ standard were data 
races are
                considered undefined behavior.

                >     3. make CollectedHeap::run_task() be an abstract virtual 
func, so that every subclass of collectedHeap should support it, so later 
implementation of new collectedHeap will not miss the "parallel" features.
                >           The problem I want to discuss with you is about 
epsilonHeap and SerialHeap, as they may not need parallel heap iteration, so I 
only make task->work(0), in case the run_task() is invoked someway in future. 
Another way is to left run_task()  unimplemented, which one do you think is 
better?

                I don't have a strong opinion about this.

                  And also please help take a look at the zHeap, as there is a 
class
                zTask that wrap the abstractGangTask, and the 
collectedHeap::run_task()
                only accept  AbstraceGangTask* as argument, so I made a 
delegate class
                to adapt it , please see src/hotspot/share/gc/z/zHeap.cpp.
                >
                >        There maybe other better ways to sovle the above 
problems, welcome for any comments, Thanks!

                I've created a few cleanups and changes on top of your latest 
patch:

                https://cr.openjdk.java.net/~stefank/8215624/webrev.02.delta
                https://cr.openjdk.java.net/~stefank/8215624/webrev.02

                - Adding Atomic::load/store.
                - Removing the time measurement in the run_task. I renamed G1's 
function
                to run_task_timed. If we need this outside of G1, we can 
rethink the API
                at that point.
                - ZGC style cleanups

                Thanks,
                StefanK

                >
                > BRs,
                > Lin
                >
                > On 2020/4/23, 11:08 AM, "linzang(臧琳)" <[email protected]> 
wrote:
                >
                >      Thanks Paul! I agree with using "parallel", will make 
the update in next patch, Thanks for help update the CSR.
                >
                >      BRs,
                >      Lin
                >
                >      On 2020/4/23, 4:42 AM, "Hohensee, Paul" 
<[email protected]> wrote:
                >
                >          For the interface, I'd use "parallel" instead of 
"parallelThreadNum". All the other options are lower case, and it's a lot 
easier to type "parallel". I took the liberty of updating the CSR. If you're ok 
with it, you might want to change variable names and such, plus of course 
JMap.usage.
                >
                >          Thanks,
                >          Paul
                >
                >          On 4/22/20, 2:29 AM, "serviceability-dev on behalf 
of linzang(臧琳)" <[email protected] on behalf of 
[email protected]> wrote:
                >
                >              Dear Stefan,
                >
                >                      Thanks a lot! I agree with you to 
decouple the heap inspection code with GC's.
                >                      I will start  from your POC code, may 
discuss with you later.
                >
                >
                >              BRs,
                >              Lin
                >
                >              On 2020/4/22, 5:14 PM, "Stefan Karlsson" 
<[email protected]> wrote:
                >
                >                  Hi Lin,
                >
                >                  I took a look at this earlier and saw that 
the heap inspection code is
                >                  strongly coupled with the CollectedHeap and 
G1CollectedHeap. I'd prefer
                >                  if we'd abstract this away, so that the GCs 
only provide a "parallel
                >                  object iteration" interface, and the heap 
inspection code is kept elsewhere.
                >
                >                  I started experimenting with doing that, but 
other higher-priority (to
                >                  me) tasks have had to take precedence.
                >
                >                  I've uploaded my work-in-progress / 
proof-of-concept:
                >                    
https://cr.openjdk.java.net/~stefank/8215624/webrev.01.delta/
                >                    
https://cr.openjdk.java.net/~stefank/8215624/webrev.01/
                >
                >                  The current code doesn't handle the 
lifecycle (deletion) of the
                >                  ParallelObjectIterators. There's also code 
left unimplemented in around
                >                  CollectedHeap::run_task. However, I think 
this could work as a basis to
                >                  pull out the heap inspection code out of the 
GCs.
                >
                >                  Thanks,
                >                  StefanK
                >
                >                  On 2020-04-22 02:21, linzang(臧琳) wrote:
                >                  > Dear all,
                >                  >       May I ask you help to review? This 
RFR has been there for quite a while.
                >                  >       Thanks!
                >                  >
                >                  > BRs,
                >                  > Lin
                >                  >
                >                  > > On 2020/3/16, 5:18 PM, "linzang(臧琳)" 
<[email protected]> wrote:>
                >                  >
                >                  >>    Just update a new path, my preliminary 
measure show about 3.5x speedup of jmap histo on a nearly full 4GB G1 heap 
(8-core platform with parallel thread number set to 4).
                >                  >>     webrev: 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_02/
                >                  >>     bug: 
https://bugs.openjdk.java.net/browse/JDK-8215624
                >                  >>     CSR: 
https://bugs.openjdk.java.net/browse/JDK-8239290
                >                  >>     BRs,
                >                  >>       Lin
                >                  >>       > On 2020/3/2, 9:56 PM, 
"linzang(臧琳)" <[email protected]> wrote:
                >                  >>       >
                >                  >>       >    Dear all,
                >                  >>       >          Let me try to ease the 
reviewing work by some explanation :P
                >                  >>       >          The patch's target is to 
speed up jmap -histo for heap iteration, from my experience it is necessary for 
large heap investigation. E.g in bigData scenario I have tried to conduct jmap 
-histo against 180GB heap, it does take quite a while.
                >                  >>       >          And if my understanding 
is corrent, even the jmap -histo without "live" option does heap inspection 
with heap lock acquired. so it is very likely to block mutator thread in 
allocation-sensitive scenario. I would say the faster the heap inspection does, 
the shorter the mutator be blocked. This is parallel iteration for jmap is 
necessary.
                >                  >>       >          I think the parallel 
heap inspection should be applied to all kind of heap. However, consider the 
heap layout are different for  GCs, much time is required to understand all 
kinds of the heap layout to make the whole change. IMO, It is not wise to have 
a huge patch for the whole solution at once, and it is even harder to review 
it. So I plan to implement it incrementally, the first patch (this one) is 
going to confirm the implemention detail of how jmap accept the new option, 
passes it to attachListener of the jvm process and then how to make the 
parallel inspection closure be generic enough to make it easy to extend to 
different heap layout. And also how to implement the heap inspection in 
specific gc's heap. This patch use G1's heap as the begining.
                >                  >>       >          This patch actually do 
several things:
                >                  >>       >          1. Add an option 
"parallelThreadNum=<N>" to jmap -histo, the default behavior is to set N to 0, 
means let's JVM decide how many threads to use for heap inspection. Set this 
option to 1 will disable parallel heap inspection. (more details in CSR: 
https://bugs.openjdk.java.net/browse/JDK-8239290)
                >                  >>       >          2. Make a change in how 
Jmap passing arguments, changes in 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/src/jdk.jcmd/share/classes/sun/tools/jmap/JMap.java.udiff.html,
 originally it pass options as separate arguments to attachListener, this patch 
change to that all options be compose to a single string. So the arg_count_max 
in attachListener.hpp do not need to be changed, and hence avoid the 
compatibility issue, as disscussed at 
https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-March/027334.html
                >                  >>       >         3. Add an abstract class 
ParHeapInspectTask in heapInspection.hpp / heapInspection.cpp, It's work(uint 
worker_id) method prepares the data structure (KlassInfoTable) need for every 
parallel worker thread, and then call do_object_iterate_parallel() which is 
heap specific implementation. I also added some machenism in KlassInfoTable to 
support parallel iteration, such as merge().
                >                  >>       >        4. In specific heap (G1 in 
this patch), create a subclass of ParHeapInspectTask, implement the 
do_object_iterate_parallel() for parallel heap inspection. For G1, it simply 
invoke g1CollectedHeap's object_iterate_parallel().
                >                  >>       >        5. Add related test.
                >                  >>       >        6. it may be easy to 
extend this patch for other kinds of heap by creating subclass of 
ParHeapInspectTask and implement the do_object_iterate_parallel().
                >                  >>       >
                >                  >>       >    Hope these info could help on 
code review and initate the discussion :-)
                >                  >>       >    Thanks!
                >                  >>       >
                >                  >>       >    BRs,
                >                  >>       >    Lin
                >                  >>       >    >On 2020/2/19, 9:40 AM, 
"linzang(臧琳)" <[email protected]> wrote:.
                >                  >>       >    >
                >                  >>       >    >  Re-post this RFR with 
correct enhancement number to make it trackable.
                >                  >>       >    >  please ignore the previous 
wrong post. sorry for troubles.
                >                  >>       >    >
                >                  >>       >    >   webrev: 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/
                >                  >>       >    >    Hi bug: 
https://bugs.openjdk.java.net/browse/JDK-8215624
                >                  >>       >    >    CSR: 
https://bugs.openjdk.java.net/browse/JDK-8239290
                >                  >>       >    >    --------------
                >                  >>       >    >    Lin
                >                  >>       >    >    >Hi Lin,
                >                  >   >     >    >    >
                >                  >>       >    >    >Could you, please, 
re-post your RFR with the right enhancement number in
                >                  >>       >    >    >the message subject?
                >                  >>       >    >    >It will be more 
trackable this way.
                >                  >>       >    >    >
                >                  >>       >    >    >Thanks,
                >                  >>       >    >    >Serguei
                >                  >>       >    >    >
                >                  >>       >    >    >
                >                  >>       >    >    >On 2/17/20 10:29 PM, 
linzang(臧琳) wrote:
                >                  >>       >    >    >> Dear David,
                >                  >>       >    >    >>        Thanks a lot!
                >                  >>       >    >    >>       I have updated 
the refined code to 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/.
                >                  >>       >    >    >>        IMHO the 
parallel heap inspection can be extended to all kinds of heap as long as the 
heap layout can support parallel iteration.
                >                  >>       >    >    >>        Maybe we can 
firstly use this webrev to discuss how to implement it, because I am not sure 
my current implementation is an appropriate way to communicate with 
collectedHeap, then we can extend the solution to other kinds of heap.
                >                  >>       >    >    >>
                >                  >>       >    >    >> Thanks,
                >                  >>       >    >    >> --------------
                >                  >>       >    >    >> Lin
                >                  >>       >    >    >>> Hi Lin,
                >                  >>       >    >    >>>
                >                  >>       >    >    >>> Adding in 
hotspot-gc-dev as they need to see how this interacts with GC
                >                  >>       >    >    >>> worker threads, and 
whether it needs to be extended beyond G1.
                >                  >>       >    >    >>>
                >                  >>       >    >   >>> I happened to spot one 
nit when browsing:
                >                  >>       >    >    >>>
                >                  >>       >    >    >>> 
src/hotspot/share/gc/shared/collectedHeap.hpp
                >                  >>       >    >    >>>
                >                  >>       >    >    >>> +   virtual bool 
run_par_heap_inspect_task(KlassInfoTable* cit,
                >                  >>       >    >    >>> +                     
                     BoolObjectClosure* filter,
                >                  >>       >    >    >>> +                     
                     size_t* missed_count,
                >                  >>       >    >    >>> +                     
                     size_t thread_num) {
                >                  >>       >    >    >>> +     return NULL;
                >                  >>       >    >    >>>
                >                  >>       >    >    >>> s/NULL/false/
                >                  >>       >    >    >>>
                >                  >>       >    >    >>> Cheers,
                >                  >>       >    >    >>> David
                >                  >   >     >    >    >>>
                >                  >>       >    >    >>> On 18/02/2020 2:15 
pm, linzang(臧琳) wrote:
                >                  >>       >    >    >>>> Dear All,
                >                  >>       >    >    >>>>         May I ask 
your help to review the follow changes:
                >                  >>       >    >    >>>>         webrev:
                >                  >>       >    >    >>>> 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/
                >                  >>       >    >    >>>>      bug: 
https://bugs.openjdk.java.net/browse/JDK-8215624
                >                  >>       >    >    >>>>      related CSR: 
https://bugs.openjdk.java.net/browse/JDK-8239290
                >                  >>       >    >    >>>>         This patch 
enable parallel heap inspection of G1 for jmap histo.
                >                  >>       >    >    >>>>         my simple 
test shown it can speed up 2x of jmap -histo with
                >                  >>       >    >    >>>> parallelThreadNum 
set to 2 for heap at ~500M on 4-core platform.
                >                  >>       >    >    >>>>
                >                  >>       >    >    >>>> 
------------------------------------------------------------------------
                >                  >>       >    >    >>>> BRs,
                >                  >>       >    >    >>>> Lin
                >                  >>       >    >    >> >
                >                  >>       >    >    >
                >                  >
                >                  >
                >                  >
                >
                >
                >
                >
                >
                >

RE: RFR(L): 8215624: add parallel heap inspection support for jmap histo(G1)(Internet mail)

Reply via email to