On Thu, 20 Feb 2025 13:14:34 GMT, Thomas Stuefe <[email protected]> wrote:
>> Greetings,
>>
>> This is a rewrite of the Compiler Memory Statistic. The primary new feature
>> is the capability to track allocations by C2 phases. This will allow for a
>> much faster, more thorough analysis of footprint issues.
>>
>> Tracking Arena memory movement is not trivial since one needs to follow the
>> ebb and flow of allocations over nested C2 phases. A phase typically
>> allocates more than it releases, accruing new nodes and resource area. A
>> phase can also release more than allocated when Arenas carried over from
>> other phases go out of scope in this phase. Finally, it can have high
>> temporary peaks that vanish before the phase ends.
>>
>> I wanted to track that information correctly and display it clearly in a way
>> that is easy to understand.
>>
>> The patch implements per-phase tracking by instrumenting the `TracePhase`
>> stack object (thanks to @rwestrel for this idea).
>>
>> The nice thing with this technique is that it also allows for quick analysis
>> of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in
>> there with a speaking name, and you can see the allocations inside that
>> phase.
>>
>> The statistic gives us two new forms of output:
>>
>> 1) At the moment the compilation memory *peaked*, we now get a detailed
>> breakdown of that peak usage per phase:
>>
>>
>> Arena Usage by Arena Type and compilation phase, at arena usage peak of
>> 58817816:
>> Phase Total ra node comp
>> type index reglive regsplit cienv other
>> none 1205512 155104 982984 33712
>> 0 0 0 0 0 33712
>> parse 11685376 720016 6578728 1899064
>> 0 0 0 0 1832888 654680
>> optimizer 916584 0 556416 0
>> 0 0 0 0 0 360168
>> escapeAnalysis 1983400 0 1276392 707008
>> 0 0 0 0 0 0
>> connectionGraph 720016 0 0 621832
>> 0 0 0 0 98184 0
>> macroEliminate 196448 0 196448 0
>> 0 0 0 0 0 0
>> iterGVN 327440 0 196368 131072
>> 0 0 0 0 0 0
>> incrementalInline 3992816 0 3043704 62...
>
> Thomas Stuefe has refreshed the contents of this pull request, and previous
> commits have been removed. The incremental views will show differences
> compared to the previous content of the PR. The pull request contains one new
> commit since the last revision:
>
> avoid Thread::current in high traffic chunk alloc path
src/hotspot/share/compiler/compilationMemStatInternals.hpp line 92:
> 90:
> 91: // A very simple fixed-width FIFO buffer, used for the phase timeline
> 92: template <typename T, int max>
Would `size` be a better name than `max`?
src/hotspot/share/compiler/compilationMemStatInternals.hpp line 160:
> 158: void init(T v) { start = cur = peak = v; }
> 159: void update(T v) { cur = v; if (v > peak) peak = v; }
> 160: dT end_delta() const { return (dT)cur - (dT)start; }
Should it be `return (dT)(cur - start); }`
src/hotspot/share/compiler/compilationMemStatInternals.hpp line 161:
> 159: void update(T v) { cur = v; if (v > peak) peak = v; }
> 160: dT end_delta() const { return (dT)cur - (dT)start; }
> 161: size_t temporary_peak_size() const { return MIN2(peak - cur, peak -
> start); }
shouldn't it be `MAX2(peak - cur, peak - start)`? Why not just `peak - start`?
src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 149:
> 147: int col = start_indent;
> 148: check_phase_trace_id(e.info.id);
> 149: if (omit_empty_phases && e._bytes.end_delta() == 0 &&
> e._bytes.temporary_peak_size() == 0) {
`omit_empty_phases` is always false. Can it be just removed?
src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 205:
> 203: // seed current entry
> 204: Entry& e = _fifo.current();
> 205: e._bytes.start = e._bytes.cur = e._bytes.peak = cur_abs;
This can be replaced by `e._bytes.init(cur_abs)`.
Same for the next statement.
On same lines I would suggest to add `Entry::init()` and call it here.
src/hotspot/share/memory/arena.hpp line 48:
> 46: const size_t _len; // Size of this Chunk
> 47: // Used for Compilation Memory Statistic
> 48: uint64_t _stamp;
This is wasted space if compilation memory stats is not enabled. One way to
avoid this is to subclass `Chunk` as a `StampedChunk` and use that if
compilation memory stats is enabled. Is this complexity worth the space saving?
src/hotspot/share/opto/phase.hpp line 125:
> 123: f( _t_temporaryTimer1, "tempTimer1") \
> 124: f( _t_temporaryTimer2, "tempTimer2") \
> 125: f( _t_testTimer1, "testTimer1") \
Would `_t_testPhase1` and `_t_testPhase2` be a better name?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993551
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993720
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993803
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968994072
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968994121
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968999519
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1969041777