On Thu, 20 Feb 2025 13:14:34 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:

>> Greetings,
>> 
>> This is a rewrite of the Compiler Memory Statistic. The primary new feature 
>> is the capability to track allocations by C2 phases. This will allow for a 
>> much faster, more thorough analysis of footprint issues. 
>> 
>> Tracking Arena memory movement is not trivial since one needs to follow the 
>> ebb and flow of allocations over nested C2 phases. A phase typically 
>> allocates more than it releases, accruing new nodes and resource area. A 
>> phase can also release more than allocated when Arenas carried over from 
>> other phases go out of scope in this phase. Finally, it can have high 
>> temporary peaks that vanish before the phase ends.
>> 
>> I wanted to track that information correctly and display it clearly in a way 
>> that is easy to understand.
>> 
>> The patch implements per-phase tracking by instrumenting the `TracePhase` 
>> stack object (thanks to @rwestrel for this idea).
>> 
>> The nice thing with this technique is that it also allows for quick analysis 
>> of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in 
>> there with a speaking name, and you can see the allocations inside that 
>> phase.
>> 
>> The statistic gives us two new forms of output:
>> 
>> 1) At the moment the compilation memory *peaked*, we now get a detailed 
>> breakdown of that peak usage per phase:
>> 
>> 
>> Arena Usage by Arena Type and compilation phase, at arena usage peak of 
>> 58817816:
>>     Phase                         Total        ra      node      comp      
>> type     index   reglive  regsplit     cienv     other
>>     none                        1205512    155104    982984     33712        
>>  0         0         0         0         0     33712
>>     parse                      11685376    720016   6578728   1899064        
>>  0         0         0         0   1832888    654680
>>     optimizer                    916584         0    556416         0        
>>  0         0         0         0         0    360168
>>     escapeAnalysis              1983400         0   1276392    707008        
>>  0         0         0         0         0         0
>>     connectionGraph              720016         0         0    621832        
>>  0         0         0         0     98184         0
>>     macroEliminate               196448         0    196448         0        
>>  0         0         0         0         0         0
>>     iterGVN                      327440         0    196368    131072        
>>  0         0         0         0         0         0
>>     incrementalInline           3992816         0   3043704    62...
>
> Thomas Stuefe has refreshed the contents of this pull request, and previous 
> commits have been removed. The incremental views will show differences 
> compared to the previous content of the PR. The pull request contains one new 
> commit since the last revision:
> 
>   avoid Thread::current in high traffic chunk alloc path

src/hotspot/share/compiler/compilationMemStatInternals.hpp line 92:

> 90: 
> 91: // A very simple fixed-width FIFO buffer, used for the phase timeline
> 92: template <typename T, int max>

Would `size` be a better name than `max`?

src/hotspot/share/compiler/compilationMemStatInternals.hpp line 160:

> 158:     void init(T v)        { start = cur = peak = v; }
> 159:     void update(T v)      { cur = v; if (v > peak) peak = v; }
> 160:     dT end_delta() const  { return (dT)cur - (dT)start; }

Should it be `return (dT)(cur - start); }`

src/hotspot/share/compiler/compilationMemStatInternals.hpp line 161:

> 159:     void update(T v)      { cur = v; if (v > peak) peak = v; }
> 160:     dT end_delta() const  { return (dT)cur - (dT)start; }
> 161:     size_t temporary_peak_size() const { return MIN2(peak - cur, peak - 
> start); }

shouldn't it be `MAX2(peak - cur, peak - start)`? Why not just `peak - start`?

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 149:

> 147:       int col = start_indent;
> 148:       check_phase_trace_id(e.info.id);
> 149:       if (omit_empty_phases && e._bytes.end_delta() == 0 && 
> e._bytes.temporary_peak_size() == 0) {

`omit_empty_phases` is always false. Can it be just removed?

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 205:

> 203:     // seed current entry
> 204:     Entry& e = _fifo.current();
> 205:     e._bytes.start = e._bytes.cur = e._bytes.peak = cur_abs;

This can be replaced by `e._bytes.init(cur_abs)`.
Same for the next statement.
On same lines I would suggest to add `Entry::init()` and call it here.

src/hotspot/share/memory/arena.hpp line 48:

> 46:   const size_t _len;      // Size of this Chunk
> 47:   // Used for Compilation Memory Statistic
> 48:   uint64_t _stamp;

This is wasted space if compilation memory stats is not enabled. One way to 
avoid this is to subclass `Chunk` as a `StampedChunk` and use that if 
compilation memory stats is enabled. Is this complexity worth the space saving?

src/hotspot/share/opto/phase.hpp line 125:

> 123:     f(   _t_temporaryTimer1,         "tempTimer1")               \
> 124:     f(   _t_temporaryTimer2,         "tempTimer2")               \
> 125:     f(   _t_testTimer1,              "testTimer1")               \

Would `_t_testPhase1` and `_t_testPhase2` be a better name?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993551
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993720
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993803
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968994072
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968994121
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968999519
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1969041777

Reply via email to