On Thu, 20 Feb 2025 13:14:34 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:
>> Greetings,
>>
>> This is a rewrite of the Compiler Memory Statistic. The primary new feature
>> is the capability to track allocations by C2 phases. This will allow for a
>> much faster, more thorough analysis of footprint issues.
>>
>> Tracking Arena memory movement is not trivial since one needs to follow the
>> ebb and flow of allocations over nested C2 phases. A phase typically
>> allocates more than it releases, accruing new nodes and resource area. A
>> phase can also release more than allocated when Arenas carried over from
>> other phases go out of scope in this phase. Finally, it can have high
>> temporary peaks that vanish before the phase ends.
>>
>> I wanted to track that information correctly and display it clearly in a way
>> that is easy to understand.
>>
>> The patch implements per-phase tracking by instrumenting the `TracePhase`
>> stack object (thanks to @rwestrel for this idea).
>>
>> The nice thing with this technique is that it also allows for quick analysis
>> of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in
>> there with a speaking name, and you can see the allocations inside that
>> phase.
>>
>> The statistic gives us two new forms of output:
>>
>> 1) At the moment the compilation memory *peaked*, we now get a detailed
>> breakdown of that peak usage per phase:
>>
>>
>> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>> Phase                  Total        ra      node      comp      type     index   reglive  regsplit     cienv     other
>> none                 1205512    155104    982984     33712         0         0         0         0         0     33712
>> parse               11685376    720016   6578728   1899064         0         0         0         0   1832888    654680
>> optimizer             916584         0    556416         0         0         0         0         0         0    360168
>> escapeAnalysis       1983400         0   1276392    707008         0         0         0         0         0         0
>> connectionGraph       720016         0         0    621832         0         0         0         0     98184         0
>> macroEliminate        196448         0    196448         0         0         0         0         0         0         0
>> iterGVN               327440         0    196368    131072         0         0         0         0         0         0
>> incrementalInline    3992816         0   3043704 62...
>
> Thomas Stuefe has refreshed the contents of this pull request, and previous
> commits have been removed. The incremental views will show differences
> compared to the previous content of the PR. The pull request contains one new
> commit since the last revision:
>
>   avoid Thread::current in high traffic chunk alloc path

I reviewed the C2-specific code and have a couple of comments; otherwise it looks good.

While reviewing, I found a few more C2 arenas that could be tagged for higher accuracy:
- matcher states arena,
- superword (auto-vectorizer) arenas,
- `Compile::_Compile_types`, and
- `OptoRegScheduling` liveness arena.

Here is a patch that adds tags for these: https://github.com/openjdk/jdk/commit/d501bd8a674229904358fb168a9c347004efeea3. I think these changes are within the scope of this RFE, because the original changeset includes similar ones. If you agree, feel free to merge the patch into this RFE.

src/hotspot/share/memory/arena.hpp line 99:

> 97:     FN(comp, C2 Compile arena) \
> 98:     FN(type, C2 Type arena) \
> 99:     FN(index, C2 Index arena) \

`tag_index` is not used and can be removed; it seems to be subsumed by `tag_reglive`.

src/hotspot/share/opto/chaitin.cpp line 370:

> 368: 
> 369:   ResourceArea split_arena(mtCompiler, Arena::Tag::tag_reglive); // Arena for Split local resources
> 370:   ResourceArea live_arena(mtCompiler, Arena::Tag::tag_regsplit); // Arena for liveness & IFG info

The two tags are swapped relative to the arenas' purposes.

Suggestion:

    ResourceArea split_arena(mtCompiler, Arena::Tag::tag_regsplit); // Arena for Split local resources
    ResourceArea live_arena(mtCompiler, Arena::Tag::tag_reglive); // Arena for liveness & IFG info

-------------

Changes requested by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23530#pullrequestreview-2637314247
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967765880
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967761886
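
To illustrate why the tag mix-up in chaitin.cpp matters, here is a minimal, self-contained sketch of tag-based arena accounting. It is not HotSpot code: `DEMO_ARENA_TAGS`, `DemoTag`, `DemoStats`, `DemoArena`, `simulate`, and the byte counts are all hypothetical names and values invented for the example. The sketch only assumes that tags are generated from an `FN(name, description)` list, as the quoted arena.hpp lines suggest, and that each arena charges its allocations to the tag it was constructed with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical, simplified tag list in the FN(name, description) style.
// It is NOT the real list from arena.hpp.
#define DEMO_ARENA_TAGS(FN)                        \
  FN(comp,     "C2 Compile arena")                 \
  FN(type,     "C2 Type arena")                    \
  FN(reglive,  "Arena for liveness & IFG info")    \
  FN(regsplit, "Arena for Split local resources")

// Expand the list once into an enum ...
enum class DemoTag : int {
#define DEMO_TAG_ENUM(name, desc) tag_##name,
  DEMO_ARENA_TAGS(DEMO_TAG_ENUM)
#undef DEMO_TAG_ENUM
  tag_count
};

// ... and once into a parallel table of printable names.
static const char* const demo_tag_names[] = {
#define DEMO_TAG_NAME(name, desc) #name,
  DEMO_ARENA_TAGS(DEMO_TAG_NAME)
#undef DEMO_TAG_NAME
};

// Per-tag byte counters (hypothetical stand-in for the statistic).
struct DemoStats {
  size_t bytes[static_cast<int>(DemoTag::tag_count)] = {};

  void record(DemoTag tag, size_t n) { bytes[static_cast<int>(tag)] += n; }
  size_t of(DemoTag tag) const { return bytes[static_cast<int>(tag)]; }

  void print() const {
    for (int i = 0; i < static_cast<int>(DemoTag::tag_count); i++) {
      std::printf("%-10s %zu\n", demo_tag_names[i], bytes[i]);
    }
  }
};

// An arena that charges every allocation to the tag it was created with.
class DemoArena {
  DemoStats& _stats;
  DemoTag _tag;
public:
  DemoArena(DemoStats& stats, DemoTag tag) : _stats(stats), _tag(tag) {}
  void alloc(size_t n) { _stats.record(_tag, n); }
};

// Simulates the two register-allocator arenas. With swapped == true the
// tags are exchanged, mirroring the mix-up pointed out in the review.
// The byte counts (100 and 4000) are made up for the example.
DemoStats simulate(bool swapped) {
  DemoStats stats;
  DemoArena split_arena(stats, swapped ? DemoTag::tag_reglive : DemoTag::tag_regsplit);
  DemoArena live_arena(stats, swapped ? DemoTag::tag_regsplit : DemoTag::tag_reglive);
  split_arena.alloc(100);   // split-local resources
  live_arena.alloc(4000);   // liveness & IFG info
  return stats;
}
```

Calling `simulate(true).print()` from a `main` shows that the total is unchanged but the `reglive` and `regsplit` columns trade values. A swap like this would not show up in the overall peak figure, only in the per-type breakdown.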