On Sat, 8 Feb 2025 06:56:40 GMT, Thomas Stuefe <[email protected]> wrote:
> Greetings,
>
> This is a rewrite of the Compiler Memory Statistic. The primary new feature
> is the capability to track allocations by C2 phases. This will allow for a
> much faster, more thorough analysis of footprint issues.
>
> Tracking Arena memory movement is not trivial since one needs to follow the
> ebb and flow of allocations over nested C2 phases. A phase typically
> allocates more than it releases, accruing new nodes and resource area. A
> phase can also release more than allocated when Arenas carried over from
> other phases go out of scope in this phase. Finally, it can have high
> temporary peaks that vanish before the phase ends.
>
> I wanted to track that information correctly and display it clearly in a way
> that is easy to understand.
>
> The patch implements per-phase tracking by instrumenting the `TracePhase`
> stack object (thanks to @rwestrel for this idea).
>
> The nice thing with this technique is that it also allows for quick analysis
> of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in
> there with a descriptive name, and you can see the allocations inside that phase.
>
> The statistic gives us two new forms of output:
>
> 1) At the moment the compilation memory *peaked*, we now get a detailed
> breakdown of that peak usage per phase:
>
> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
> Phase                 Total      ra    node    comp type index reglive regsplit   cienv  other
> none                1205512  155104  982984   33712    0     0       0        0       0  33712
> parse              11685376  720016 6578728 1899064    0     0       0        0 1832888 654680
> optimizer            916584       0  556416       0    0     0       0        0       0 360168
> escapeAnalysis      1983400       0 1276392  707008    0     0       0        0       0      0
> connectionGraph      720016       0       0  621832    0     0       0        0   98184      0
> macroEliminate       196448       0  196448       0    0     0       0        0       0      0
> iterGVN              327440       0  196368  131072    0     0       0        0       0      0
> incrementalInline   3992816       0 3043704  621832    0     0       0        0  261824...

Some additional technical information about how this statistic works:

The JVM informs the statistic about the following events:
A) When a compilation starts.
B) When a compilation ends.
C) When a new compilation phase starts. Phases can be nested.
D) When a compilation phase ends.
E) Whenever an arena grows by a new chunk (regardless of whether this was a
cached chunk from the chunk pool or a newly allocated chunk).
F) When an arena sheds chunks - either by rolling back to a previous
ResourceMark or because the arena itself gets deleted.

During compilation (between (A) and (B)), we keep the statistic state for this
compilation in an `ArenaStatCounter` object that is attached to the current
compiler thread.

When a new compilation phase starts (C), we push the phase info onto a
`PhaseInfoStack`. When a phase ends (D), we pop that information.
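
The push/pop pairing described above maps naturally onto a stack-allocated scope object. The following is only a minimal sketch of that idea, not the patch's code: `PhaseInfoStackSketch`, `PhaseScope`, and the fields of `PhaseInfo` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-phase information (assumption).
struct PhaseInfo {
  int id;
  std::string name;
};

// Hypothetical stand-in for the phase stack kept per compilation.
class PhaseInfoStackSketch {
public:
  void push(PhaseInfo p) { _v.push_back(std::move(p)); }
  void pop() { assert(!_v.empty()); _v.pop_back(); }
  const PhaseInfo& top() const { assert(!_v.empty()); return _v.back(); }
  std::size_t depth() const { return _v.size(); }
private:
  std::vector<PhaseInfo> _v;
};

// Scope object: the constructor reports "phase starts" (C), the destructor
// reports "phase ends" (D). Nesting scopes yields nested phases.
class PhaseScope {
public:
  PhaseScope(PhaseInfoStackSketch& s, int id, const char* name) : _s(s) {
    _s.push(PhaseInfo{id, name});
  }
  ~PhaseScope() { _s.pop(); }
  PhaseScope(const PhaseScope&) = delete;
  PhaseScope& operator=(const PhaseScope&) = delete;
private:
  PhaseInfoStackSketch& _s;
};
```

Declaring a `PhaseScope` inside another `PhaseScope`'s block makes the inner phase the top of the stack until its block ends, after which the outer phase is the top again.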

When we are informed of a new chunk allocation (E), we:
- Set a stamp in the chunk header to mark the chunk as owned by the current
phase and arena type.
- Adjust the global counters in the `ArenaStatCounter` object, as well as the
counters in a two-dimensional table (`ArenaCounterTable`) that keeps counters
per arena tag and compilation phase.
- Take a snapshot of all counters as peak state if the total memory consumption
of this compilation reaches a new peak.
- Handle `MemLimit` violations: if `-XX:CompileCommand=memlimit...` was enabled
and the total footprint of the compilation surpasses that limit, we either end
the JVM with a fatal error or bail out of the compilation, depending on the
sub-option given to the command.

When informed of a chunk deletion (F), we:
- Extract the stamp from the chunk header to find out which phase/arena type
this deallocation should be attributed to.
- Adjust the counters for that phase/arena type in the `ArenaCounterTable`.
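
The bookkeeping for events (E) and (F) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual HotSpot implementation: `Chunk`, `ChunkStamp`, `CounterTable`, `StatCounterSketch`, and the fixed table dimensions are all hypothetical, and the memory limit handling is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed small, fixed dimensions; the real table sizes differ.
constexpr int kNumPhases = 4;
constexpr int kNumArenaTags = 3;

// Hypothetical stamp written into the chunk header on allocation.
struct ChunkStamp {
  std::uint8_t phase_id;
  std::uint8_t arena_tag;
};

// Hypothetical chunk: only the fields this sketch needs.
struct Chunk {
  std::size_t size;
  ChunkStamp stamp;
};

// Two-dimensional counter table: bytes per [phase][arena tag].
struct CounterTable {
  std::array<std::array<std::size_t, kNumArenaTags>, kNumPhases> v{};
};

class StatCounterSketch {
public:
  void set_current_phase(int phase_id) { _current_phase = phase_id; }

  // Event (E): stamp the chunk, bump counters, snapshot on a new peak.
  void on_chunk_alloc(Chunk& c, int arena_tag) {
    c.stamp = ChunkStamp{static_cast<std::uint8_t>(_current_phase),
                         static_cast<std::uint8_t>(arena_tag)};
    _table.v[_current_phase][arena_tag] += c.size;
    _current += c.size;
    if (_current > _peak) {
      _peak = _current;
      _table_at_peak = _table;  // copy of all counters at the peak
    }
  }

  // Event (F): attribute the release to the phase/arena type in the stamp,
  // which may differ from the phase that is running right now.
  void on_chunk_free(const Chunk& c) {
    _table.v[c.stamp.phase_id][c.stamp.arena_tag] -= c.size;
    _current -= c.size;
  }

  std::size_t current() const { return _current; }
  std::size_t peak() const { return _peak; }
  const CounterTable& table() const { return _table; }
  const CounterTable& table_at_peak() const { return _table_at_peak; }

private:
  int _current_phase = 0;
  std::size_t _current = 0;
  std::size_t _peak = 0;
  CounterTable _table;
  CounterTable _table_at_peak;
};
```

Because the stamp travels with the chunk, a chunk allocated in one phase and freed in a later one still reduces the counter of the phase that allocated it, while the peak snapshot stays untouched.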

When a compilation phase ends (D), we adjust the "footprint timeline". The
footprint timeline - `FootprintTimeline` - is a one-dimensional buffer of
(phase info, counter) tuples. It represents the "flattened out" form of the
phase invocation tree: an invocation of a child phase nested in a parent phase
"interrupts" the parent phase, and when the child phase ends, the parent phase
is "restarted" as a new entry in the timeline.

For example, let's say we execute phase "optimizer", and inside that, call the
phase "iterGVN" and then "incrementalInline". Between these two phases, we
allocate 1 MB from the resource area. The invocation tree looks like this:

    > optimizer                  1024 KB
        > iterGVN                1032 KB
    < optimizer (cont.)          1032 KB + 1MB resource arena
        > incrementalInline      1032 KB + 1MB resource arena
    < optimizer (cont.)          1032 KB + 1MB resource arena

The flattened-out footprint timeline will look somewhat like this:

    Phase Sequence Number | Phase Name        | Footprint
    5                     | optimizer         | 1024 KB
    6                     | iterGVN           | 1032 KB
    5                     | optimizer         | 1032 KB + 1MB
    7                     | incrementalInline | 1032 KB + 1MB
    5                     | optimizer         | 1032 KB + 1MB
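
To make the flattening concrete, here is a small sketch that reproduces the example timeline. It is an illustration only: `FootprintTimelineSketch`, its methods, and the choice to append an entry both when a phase starts and when a parent resumes are assumptions, not the layout or update points of the real `FootprintTimeline`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct PhaseInfo {
  int seq_no;        // phase sequence number, assigned at phase start
  std::string name;
};

struct TimelineEntry {
  PhaseInfo phase;
  long long footprint_kb;  // last footprint seen while this entry was active
};

class FootprintTimelineSketch {
public:
  explicit FootprintTimelineSketch(int first_seq_no = 0)
    : _next_seq_no(first_seq_no) {}

  // A starting phase opens a new timeline entry.
  void phase_start(const std::string& name) {
    _stack.push_back(PhaseInfo{_next_seq_no++, name});
    _entries.push_back(TimelineEntry{_stack.back(), _footprint_kb});
  }

  // When a child ends, the parent is "restarted" as a new entry.
  void phase_end() {
    assert(!_stack.empty());
    _stack.pop_back();
    if (!_stack.empty()) {
      _entries.push_back(TimelineEntry{_stack.back(), _footprint_kb});
    }
  }

  // Footprint changes are recorded in the currently active entry.
  void footprint_changed(long long delta_kb) {
    _footprint_kb += delta_kb;
    if (!_stack.empty()) {
      _entries.back().footprint_kb = _footprint_kb;
    }
  }

  const std::vector<TimelineEntry>& entries() const { return _entries; }

private:
  int _next_seq_no;
  long long _footprint_kb = 0;
  std::vector<PhaseInfo> _stack;
  std::vector<TimelineEntry> _entries;
};

// Replays the optimizer / iterGVN / incrementalInline example from the text.
std::vector<TimelineEntry> example_from_text() {
  FootprintTimelineSketch t(5);
  t.phase_start("optimizer");
  t.footprint_changed(1024);   // optimizer reaches 1024 KB
  t.phase_start("iterGVN");
  t.footprint_changed(8);      // iterGVN brings it to 1032 KB
  t.phase_end();
  t.footprint_changed(1024);   // 1 MB allocated between the child phases
  t.phase_start("incrementalInline");
  t.phase_end();
  t.phase_end();
  return t.entries();
}
```

The five entries returned by `example_from_text()` carry the sequence numbers 5, 6, 5, 7, 5, so the parent phase "optimizer" appears three times, once per uninterrupted stretch.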

Finally, when the compilation ends, we print out the statistic for it (if the
suboption `print` was given with `-XX:CompileCommand=memstat`). We also save a
copy of the counters to a global table that contains the N most expensive
compilations. That table is printed when one uses
`jcmd <pid> Compiler.memory`. It is also printed into the hs-err file.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2658400920