Hi Lin,

I took a look at this earlier and saw that the heap inspection code is strongly coupled with the CollectedHeap and G1CollectedHeap. I'd prefer if we'd abstract this away, so that the GCs only provide a "parallel object iteration" interface, and the heap inspection code is kept elsewhere.

I started experimenting with doing that, but other higher-priority (to me) tasks have had to take precedence.

I've uploaded my work-in-progress / proof-of-concept:
 https://cr.openjdk.java.net/~stefank/8215624/webrev.01.delta/
 https://cr.openjdk.java.net/~stefank/8215624/webrev.01/

The current code doesn't handle the lifecycle (deletion) of the ParallelObjectIterators. There's also code left unimplemented in around CollectedHeap::run_task. However, I think this could work as a basis to pull out the heap inspection code out of the GCs.

Thanks,
StefanK

On 2020-04-22 02:21, linzang(臧琳) wrote:
Dear all,
      May I ask you help to review? This RFR has been there for quite a while.
      Thanks!
BRs,
Lin

> On 2020/3/16, 5:18 PM, "linzang(臧琳)" <linz...@tencent.com> wrote:>

   Just update a new path, my preliminary measure show about 3.5x speedup of 
jmap histo on a nearly full 4GB G1 heap (8-core platform with parallel thread 
number set to 4).
    webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_02/
    bug: https://bugs.openjdk.java.net/browse/JDK-8215624
    CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
    BRs,
      Lin
      > On 2020/3/2, 9:56 PM, "linzang(臧琳)" <linz...@tencent.com> wrote:
      >
      >    Dear all,
      >          Let me try to ease the reviewing work by some explanation :P
      >          The patch's target is to speed up jmap -histo for heap 
iteration, from my experience it is necessary for large heap investigation. E.g in 
bigData scenario I have tried to conduct jmap -histo against 180GB heap, it does 
take quite a while.
      >          And if my understanding is corrent, even the jmap -histo without 
"live" option does heap inspection with heap lock acquired. so it is very likely 
to block mutator thread in allocation-sensitive scenario. I would say the faster the heap 
inspection does, the shorter the mutator be blocked. This is parallel iteration for jmap is 
necessary.
      >          I think the parallel heap inspection should be applied to all 
kind of heap. However, consider the heap layout are different for  GCs, much time 
is required to understand all kinds of the heap layout to make the whole change. 
IMO, It is not wise to have a huge patch for the whole solution at once, and it is 
even harder to review it. So I plan to implement it incrementally, the first patch 
(this one) is going to confirm the implemention detail of how jmap accept the new 
option, passes it to attachListener of the jvm process and then how to make the 
parallel inspection closure be generic enough to make it easy to extend to 
different heap layout. And also how to implement the heap inspection in specific 
gc's heap. This patch use G1's heap as the begining.
      >          This patch actually do several things:
      >          1. Add an option "parallelThreadNum=<N>" to jmap -histo, the 
default behavior is to set N to 0, means let's JVM decide how many threads to use for heap 
inspection. Set this option to 1 will disable parallel heap inspection. (more details in CSR: 
https://bugs.openjdk.java.net/browse/JDK-8239290)
      >          2. Make a change in how Jmap passing arguments, changes in 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/src/jdk.jcmd/share/classes/sun/tools/jmap/JMap.java.udiff.html,
 originally it pass options as separate arguments to attachListener, this patch 
change to that all options be compose to a single string. So the arg_count_max in 
attachListener.hpp do not need to be changed, and hence avoid the compatibility 
issue, as disscussed at 
https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-March/027334.html
      >         3. Add an abstract class ParHeapInspectTask in 
heapInspection.hpp / heapInspection.cpp, It's work(uint worker_id) method prepares 
the data structure (KlassInfoTable) need for every parallel worker thread, and 
then call do_object_iterate_parallel() which is heap specific implementation. I 
also added some machenism in KlassInfoTable to support parallel iteration, such as 
merge().
      >        4. In specific heap (G1 in this patch), create a subclass of 
ParHeapInspectTask, implement the do_object_iterate_parallel() for parallel heap 
inspection. For G1, it simply invoke g1CollectedHeap's object_iterate_parallel().
      >        5. Add related test.
      >        6. it may be easy to extend this patch for other kinds of heap 
by creating subclass of ParHeapInspectTask and implement the 
do_object_iterate_parallel().
      >
      >    Hope these info could help on code review and initate the discussion 
:-)
      >    Thanks!
      >
      >    BRs,
      >    Lin
      >    >On 2020/2/19, 9:40 AM, "linzang(臧琳)" <linz...@tencent.com> wrote:.
      >    >
      >    >  Re-post this RFR with correct enhancement number to make it 
trackable.
      >    >  please ignore the previous wrong post. sorry for troubles.
      >    >
      >    >   webrev: 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/
      >    >    Hi bug: https://bugs.openjdk.java.net/browse/JDK-8215624
      >    >    CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
      >    >    --------------
      >    >    Lin
      >    >    >Hi Lin,
  >     >    >    >
      >    >    >Could you, please, re-post your RFR with the right enhancement 
number in
      >    >    >the message subject?
      >    >    >It will be more trackable this way.
      >    >    >
      >    >    >Thanks,
      >    >    >Serguei
      >    >    >
      >    >    >
      >    >    >On 2/17/20 10:29 PM, linzang(臧琳) wrote:
      >    >    >> Dear David,
      >    >    >>        Thanks a lot!
      >    >    >>       I have updated the refined code to 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/.
      >    >    >>        IMHO the parallel heap inspection can be extended to 
all kinds of heap as long as the heap layout can support parallel iteration.
      >    >    >>        Maybe we can firstly use this webrev to discuss how 
to implement it, because I am not sure my current implementation is an appropriate way to 
communicate with collectedHeap, then we can extend the solution to other kinds of heap.
      >    >    >>
      >    >    >> Thanks,
      >    >    >> --------------
      >    >    >> Lin
      >    >    >>> Hi Lin,
      >    >    >>>
      >    >    >>> Adding in hotspot-gc-dev as they need to see how this 
interacts with GC
      >    >    >>> worker threads, and whether it needs to be extended beyond 
G1.
      >    >    >>>
      >    >   >>> I happened to spot one nit when browsing:
      >    >    >>>
      >    >    >>> src/hotspot/share/gc/shared/collectedHeap.hpp
      >    >    >>>
      >    >    >>> +   virtual bool run_par_heap_inspect_task(KlassInfoTable* 
cit,
      >    >    >>> +                                          
BoolObjectClosure* filter,
      >    >    >>> +                                          size_t* 
missed_count,
      >    >    >>> +                                          size_t 
thread_num) {
      >    >    >>> +     return NULL;
      >    >    >>>
      >    >    >>> s/NULL/false/
      >    >    >>>
      >    >    >>> Cheers,
      >    >    >>> David
  >     >    >    >>>
      >    >    >>> On 18/02/2020 2:15 pm, linzang(臧琳) wrote:
      >    >    >>>> Dear All,
      >    >    >>>>         May I ask your help to review the follow changes:
      >    >    >>>>         webrev:
      >    >    >>>> 
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/
      >    >    >>>>      bug: https://bugs.openjdk.java.net/browse/JDK-8215624
      >    >    >>>>      related CSR: 
https://bugs.openjdk.java.net/browse/JDK-8239290
      >    >    >>>>         This patch enable parallel heap inspection of G1 
for jmap histo.
      >    >    >>>>         my simple test shown it can speed up 2x of jmap 
-histo with
      >    >    >>>> parallelThreadNum set to 2 for heap at ~500M on 4-core 
platform.
      >    >    >>>>
      >    >    >>>> 
------------------------------------------------------------------------
      >    >    >>>> BRs,
      >    >    >>>> Lin
      >    >    >> >
      >    >    >




Reply via email to