Re: RFR: 8344009: Improve compiler memory statistics

Thomas Stuefe Thu, 13 Feb 2025 22:42:31 -0800

On Sat, 8 Feb 2025 06:56:40 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:


> Greetings,
> 
> This is a rewrite of the Compiler Memory Statistic. The primary new feature 
> is the capability to track allocations by C2 phases. This will allow for a 
> much faster, more thorough analysis of footprint issues. 
> 
> Tracking Arena memory movement is not trivial since one needs to follow the 
> ebb and flow of allocations over nested C2 phases. A phase typically 
> allocates more than it releases, accruing new nodes and resource area. A 
> phase can also release more than allocated when Arenas carried over from 
> other phases go out of scope in this phase. Finally, it can have high 
> temporary peaks that vanish before the phase ends.
> 
> I wanted to track that information correctly and display it clearly in a way 
> that is easy to understand.
> 
> The patch implements per-phase tracking by instrumenting the `TracePhase` 
> stack object (thanks to @rwestrel for this idea).
> 
> The nice thing with this technique is that it also allows for quick analysis 
> of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in 
> there with a descriptive name, and you can see the allocations inside that phase.
> 
> The statistic gives us two new forms of output:
> 
> 1) At the moment the compilation memory *peaked*, we now get a detailed 
> breakdown of that peak usage per phase:
> 
> 
> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>     Phase                         Total        ra      node      comp      type     index   reglive  regsplit     cienv     other
>     none                        1205512    155104    982984     33712         0         0         0         0         0     33712
>     parse                      11685376    720016   6578728   1899064         0         0         0         0   1832888    654680
>     optimizer                    916584         0    556416         0         0         0         0         0         0    360168
>     escapeAnalysis              1983400         0   1276392    707008         0         0         0         0         0         0
>     connectionGraph              720016         0         0    621832         0         0         0         0     98184         0
>     macroEliminate               196448         0    196448         0         0         0         0         0         0         0
>     iterGVN                      327440         0    196368    131072         0         0         0         0         0         0
>     incrementalInline           3992816         0   3043704    621832         0         0         0         0    261824...

Some additional technical information about how this statistic works:

The JVM informs the statistics about the following events:

A) When a compilation starts.

B) When a compilation ends.

C) When a new compilation phase starts. That can happen in nested form.

D) When a compilation phase ends.

E) Whenever an arena grows a new chunk (regardless of whether this was a cached 
chunk from the chunk pool or a newly allocated chunk).

F) When an arena sheds chunks - either by rolling back to a previous 
ResourceMark or because the arena itself gets deleted.

During compilation (between (A) and (B)), we keep the statistic state for this 
compilation in an `ArenaStatCounter` object that is attached to the current 
compiler thread.

When a new compilation phase starts (C), we push the phase info onto a 
`PhaseInfoStack`. When a phase ends, we pop that information.
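
The push/pop pairing can be pictured with a small RAII sketch. This is not the 
code from the patch: `PhaseStackSketch`, `ScopedPhase`, and the per-invocation 
sequence number are hypothetical stand-ins that only mimic how a stack object 
such as `TracePhase` can drive a phase stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the information pushed when a phase starts.
struct PhaseInfo {
  int seq_no;        // assumed: one number per phase invocation
  std::string name;
};

// Simplified phase stack: the top entry is the phase that currently
// "owns" new allocations.
class PhaseStackSketch {
  std::vector<PhaseInfo> _stack;
  int _next_seq_no = 0;
public:
  void push(const std::string& name) {
    _stack.push_back({_next_seq_no++, name});
  }
  void pop() {
    assert(!_stack.empty());
    _stack.pop_back();
  }
  const PhaseInfo& top() const {
    assert(!_stack.empty());
    return _stack.back();
  }
  std::size_t depth() const { return _stack.size(); }
};

// RAII helper in the spirit of a stack object: the constructor reports
// the phase start (C), the destructor reports the phase end (D).
class ScopedPhase {
  PhaseStackSketch& _stack;
public:
  ScopedPhase(PhaseStackSketch& stack, const std::string& name) : _stack(stack) {
    _stack.push(name);
  }
  ~ScopedPhase() { _stack.pop(); }
  ScopedPhase(const ScopedPhase&) = delete;
  ScopedPhase& operator=(const ScopedPhase&) = delete;
};
```

Because the pop happens in the destructor, nested phases unwind in the right 
order even when a phase is left through an early return.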

When we are informed of a new chunk allocation (E), we:
  - Set a stamp in the chunk header to mark it as being owned by this phase and 
this arena type
  - In the `ArenaStatCounter` object, we adjust global counters and counters in 
a two-dimensional table (`ArenaCounterTable`) that keeps counters per arena tag 
and compilation phase.
  - If total memory consumption for this compilation reaches a new peak, we 
take a snapshot of all counters as peak state.
  - We also handle `MemLimit` violations here: if 
`-XX:CompileCommand=memlimit...` was enabled, and the total footprint of the 
compilation surpasses that limit, we either end the JVM with a fatal error or 
we bail on the compilation. That depends on the sub-option given to the command.

When informed of a chunk deletion (F), we:
  - Extract the stamp from the chunk header to find out which phase and arena 
type the deallocation should be attributed to
  - Adjust the counters for that phase/arena type in the `ArenaCounterTable`
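
The handling of (E) and (F) can be sketched roughly as follows. This is a 
simplified illustration, not the patch itself: `StatCounterSketch`, 
`ChunkStamp`, the table dimensions, and the boolean limit result are all 
assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// All names and sizes below are hypothetical, for illustration only.
constexpr int kNumPhases = 4;
constexpr int kNumArenaTags = 3;

// Stamp stored in the chunk header when the chunk is allocated.
struct ChunkStamp { int phase; int arena_tag; };

// Two-dimensional counter table: bytes per (phase, arena tag).
using CounterTable = std::array<std::array<std::size_t, kNumArenaTags>, kNumPhases>;

class StatCounterSketch {
  CounterTable _current{};   // live bytes per phase and arena tag
  CounterTable _at_peak{};   // snapshot taken at the last new peak
  std::size_t _total = 0;
  std::size_t _peak = 0;
  std::size_t _limit;        // 0 means "no limit"
public:
  explicit StatCounterSketch(std::size_t limit = 0) : _limit(limit) {}

  // Event (E). Returns true if the limit was surpassed; the caller would
  // then either stop the VM or bail out of the compilation.
  bool on_chunk_alloc(ChunkStamp& header, int phase, int tag, std::size_t size) {
    assert(phase >= 0 && phase < kNumPhases && tag >= 0 && tag < kNumArenaTags);
    header = ChunkStamp{phase, tag};   // remember the owner in the chunk
    _current[phase][tag] += size;
    _total += size;
    if (_total > _peak) {              // new peak: snapshot all counters
      _peak = _total;
      _at_peak = _current;
    }
    return _limit != 0 && _total > _limit;
  }

  // Event (F). The stamp, not the currently running phase, decides which
  // counter is adjusted.
  void on_chunk_free(const ChunkStamp& header, std::size_t size) {
    assert(_current[header.phase][header.arena_tag] >= size);
    _current[header.phase][header.arena_tag] -= size;
    _total -= size;
  }

  std::size_t total() const { return _total; }
  std::size_t peak() const { return _peak; }
  std::size_t current(int phase, int tag) const { return _current[phase][tag]; }
  std::size_t at_peak(int phase, int tag) const { return _at_peak[phase][tag]; }
};
```

Keeping a separate snapshot means the per-phase breakdown at the peak survives 
even after the chunks that caused the peak have been released.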

When a compilation phase ends (D), we adjust the "footprint timeline". The 
footprint timeline - `FootprintTimeline` - is a one-dimensional buffer of 
(phase info, counter) tuples. It represents the "flattened out" form of the 
phase invocation tree: an invocation of a child phase nested in a parent phase 
"interrupts" the parent phase, and when the child phase ends, the parent phase 
is "restarted" as a new entry in the timeline. For example, let's say we 
execute phase "optimizer", and inside that, call the phase "iterGVN" and then 
"incrementalInline". Between these two phases, we allocate 1 MB from the resource area. 
The invocation tree looks like this:


> optimizer  1024 KB
     > iterGVN  1032 KB
< optimizer (cont.) 1032 KB + 1MB resource arena
     > incrementalInline 1032 KB + 1MB resource arena
< optimizer (cont.) 1032 KB + 1MB resource arena


The flattened-out footprint timeline will look somewhat like this:


Phase Sequence Number | Phase Name       | Footprint
5                       optimizer          1024 KB
6                       iterGVN            1032 KB
5                       optimizer          1032 KB + 1MB
7                       incrementalInline  1032 KB + 1MB
5                       optimizer          1032 KB + 1MB
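
One way to picture the flattening is the sketch below. It is only an 
approximation under stated assumptions: `TimelineSketch`, its method names, 
and the choice to append a parent entry at the moment the child ends are 
hypothetical, and footprints are tracked in KB to match the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of the flattened timeline (hypothetical layout).
struct TimelineEntry {
  int seq_no;                 // phase sequence number
  std::string name;           // phase name
  std::int64_t footprint_kb;  // footprint while this entry was last active
};

class TimelineSketch {
  struct Phase { int seq_no; std::string name; };
  std::vector<Phase> _stack;            // currently open (nested) phases
  std::vector<TimelineEntry> _entries;  // flattened timeline
  std::int64_t _footprint_kb;
  int _next_seq_no;
public:
  TimelineSketch(std::int64_t initial_kb, int first_seq_no)
    : _footprint_kb(initial_kb), _next_seq_no(first_seq_no) {}

  void on_phase_start(const std::string& name) {
    _stack.push_back({_next_seq_no++, name});
    _entries.push_back({_stack.back().seq_no, name, _footprint_kb});
  }

  void on_phase_end() {
    assert(!_stack.empty());
    _stack.pop_back();
    if (!_stack.empty()) {
      // The parent was "interrupted" by the child; restart it as a new
      // entry that keeps the parent's original sequence number.
      _entries.push_back({_stack.back().seq_no, _stack.back().name, _footprint_kb});
    }
  }

  // Positive delta for growth, negative for released memory.
  void on_footprint_change(std::int64_t delta_kb) {
    _footprint_kb += delta_kb;
    if (!_stack.empty()) {
      _entries.back().footprint_kb = _footprint_kb;
    }
  }

  const std::vector<TimelineEntry>& entries() const { return _entries; }
};
```

Starting at 1024 KB with sequence number 5, the calls `on_phase_start("optimizer")`, 
`on_phase_start("iterGVN")`, `on_footprint_change(8)`, `on_phase_end()`, 
`on_footprint_change(1024)`, `on_phase_start("incrementalInline")`, 
`on_phase_end()` produce five entries with the same sequence numbers and 
footprints as the table above.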


Finally, when the compilation ends, we print out the statistic for it (if the 
suboption `print` was given with `-XX:CompileCommand=memstat`). We also save a 
copy of the counters to a global table that contains the N most expensive 
compilations. That table will be printed when one uses `jcmd <pid> 
Compiler.memory`. We also print it into the hs-err file.
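
A bounded "N most expensive" table could look roughly like the sketch below. 
The names `TopNTableSketch` and `CompilationRecord` are hypothetical, ranking 
by peak bytes is an assumption, and the synchronization a global table shared 
between compiler threads would need is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of one finished compilation.
struct CompilationRecord {
  std::string method;
  std::size_t peak_bytes;
};

// Keeps at most `capacity` records, sorted by descending peak usage.
class TopNTableSketch {
  std::size_t _capacity;
  std::vector<CompilationRecord> _records;
public:
  explicit TopNTableSketch(std::size_t capacity) : _capacity(capacity) {
    assert(capacity > 0);
  }

  void add(const CompilationRecord& r) {
    // Find the first record that is cheaper than the new one.
    auto pos = std::upper_bound(
        _records.begin(), _records.end(), r,
        [](const CompilationRecord& a, const CompilationRecord& b) {
          return a.peak_bytes > b.peak_bytes;
        });
    _records.insert(pos, r);
    if (_records.size() > _capacity) {
      _records.pop_back();  // drop the cheapest record
    }
  }

  const std::vector<CompilationRecord>& records() const { return _records; }
};
```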

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2658400920
