On Mon, 10 Jul 2023 13:53:36 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:
>> This is a continuation of https://github.com/openjdk/jdk/pull/10085. I >> closed https://github.com/openjdk/jdk/pull/10085 because it had accumulated >> too much comment history and got confusing. For a history of this issue, see >> previous discussions [1] and the comment section of 10085. >> >> --------------- >> >> This RFE adds the option to trim the Glibc heap periodically. This can >> recover a significant memory footprint if the VM process suffers from >> high-but-rare malloc spikes. It does not matter who causes the spikes: the >> JDK or customer code running in the JVM process. >> >> ### Background: >> >> The Glibc is reluctant to return memory to the OS. Temporary malloc spikes >> often carry over as permanent RSS increase. Note that C-heap retention is >> difficult to observe. Since it is freed memory, it won't appear in NMT; it >> is just a part of RSS. >> >> This is, effectively, caching - a performance tradeoff by the glibc. It >> makes a lot of sense with applications that cause high traffic on the >> C-heap. The JVM, however, clusters allocations and often rolls its own >> memory management based on virtual memory for many of its use cases. >> >> To manually trim the C-heap, Glibc exposes `malloc_trim(3)`. With JDK 18 >> [2], we added a new jcmd command to *manually* trim the C-heap on Linux >> (`jcmd System.trim_native_heap`). We then observed customers running this >> command periodically to slim down process sizes of container-bound jvms. >> That is cumbersome, and the JVM can do this a lot better - among other >> things because it knows best when *not* to trim. >> >> #### GLIBC internals >> >> The following information I took from the glibc source code and >> experimenting. >> >> ##### Why do we need to trim manually? Does the Glibc not trim on free? >> >> Upon `free()`, glibc may return memory to the OS if: >> - the returned block was mmap'ed >> - the returned block was not added to tcache or to fastbins >> - the returned block, possibly merged with its two immediate neighbors, had >> they been free, is larger than FASTBIN_CONSOLIDATION_THRESHOLD (64K) - in >> that case: >> a) for the main arena, glibc attempts to lower the brk() >> b) for mmap-ed heaps, glibc attempts to completely unmap or shrink the >> heap. >> In both cases, (a) and (b), only the top portion of the heap is reclaimed. >> "Holes" in the middle of other in-use chunks are not reclaimed. >> >> So: glibc *may* automatically reclaim memory. In normal configurations, with >> typical C-heap allocation granularity, it is unlikely. >> >> To increase the ... > > Thomas Stuefe has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 32 additional > commits since the last revision: > > - Make test spikes more pronounced > - Dont query procfs if logging is off > - rename logtag again > - When probing for safepoint end, use the smaller of (interval, 250ms) > - Remove TrimNativeHeap and expand TrimNativeHeapInterval > - Improve comments for non-supportive platforms > - Aleksey cosmetics > - suspend count return 16 bits > - Fix linker errors > - Merge branch 'master' into JDK-8293114-JVM-should-trim-the-native-heap > - ... and 22 more: https://git.openjdk.org/jdk/compare/417c1c87...15566761 Hopefully a final read. I think there are minor things left, see comments. Alternatively, apply this patch over your current PR, which contains fixes for my comments, and then some polishing: [trimnative-shipilev-1.patch](https://github.com/openjdk/jdk/files/12025768/trimnative-shipilev-1.patch) ...also, Windows builds are failing. src/hotspot/share/runtime/trimNativeHeap.cpp line 47: > 45: > 46: // Statistics > 47: unsigned _num_trims_performed; Sorry for the nit, but this is `uint16_t` too then, for consistency? src/hotspot/share/runtime/trimNativeHeap.cpp line 75: > 73: SafepointSynchronize::is_synchronizing(); > 74: } > 75: static constexpr int64_t safepoint_poll_ms = 250; Let's document this a little: // Upper limit for the backoff during pending/in-progress safepoint. // Chosen as reasonable value to balance the overheads of waking up // during the safepoint, which might have undesired effects on latencies, // and the accuracy in tracking the trimming interval. static constexpr int64_t safepoint_poll_ms = 250; src/hotspot/share/runtime/trimNativeHeap.cpp line 90: > 88: assert(NativeHeapTrimmer::enabled(), "Only call if enabled"); > 89: > 90: LogStartStopMark logStartStop; Hotspot style: no camel case for local identifiers. src/hotspot/share/runtime/trimNativeHeap.cpp line 110: > 108: } else if (at_or_nearing_safepoint()) { > 109: const int64_t wait_ms = > MIN2((int64_t)TrimNativeHeapInterval, safepoint_poll_ms); > 110: ml.wait(safepoint_poll_ms); `MIN2<int64_t>(..., ...)` might work better without a cast? Also, let's actually use `wait_ms` here :) src/hotspot/share/runtime/trimNativeHeap.cpp line 117: > 115: tnow = now(); > 116: > 117: } while (at_or_nearing_safepoint() || is_suspended() || > next_trim_time > tnow); Maybe invert this to pre-condition while? src/hotspot/share/runtime/trimNativeHeap.cpp line 120: > 118: } // Lock scope > 119: > 120: // 2 - Trim outside of lock protection. There is no `1 -` to match this `2 -` to. src/hotspot/share/runtime/trimNativeHeap.hpp line 46: > 44: static void cleanup(); > 45: > 46: static uint64_t num_trims_performed(); Need this for anything? Does not seem to be implemented. test/hotspot/jtreg/runtime/os/TestTrimNative.java line 34: > 32: * @build jdk.test.whitebox.WhiteBox > 33: * @run driver jdk.test.lib.helpers.ClassFileInstaller > jdk.test.whitebox.WhiteBox > 34: * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions > -XX:+WhiteBoxAPI TestTrimNative trimNative I see that we always spawn a VM with Whitebox enabled explicitly there. Do you need to enable Whitebox for these? Also, can these be just `@run driver`? ------------- PR Review: https://git.openjdk.org/jdk/pull/14781#pullrequestreview-1522725836 PR Comment: https://git.openjdk.org/jdk/pull/14781#issuecomment-1632093967 PR Review Comment: https://git.openjdk.org/jdk/pull/14781#discussion_r1258705275 PR Review Comment: https://git.openjdk.org/jdk/pull/14781#discussion_r1258833848 PR Review Comment: https://git.openjdk.org/jdk/pull/14781#discussion_r1258835428 PR Review Comment: https://git.openjdk.org/jdk/pull/14781#discussion_r1260797898 PR Review Comment: https://git.openjdk.org/jdk/pull/14781#discussion_r1260809509 PR Review Comment: https://git.openjdk.org/jdk/pull/14781#discussion_r1260799630 PR Review Comment: https://git.openjdk.org/jdk/pull/14781#discussion_r1258840799 PR Review Comment: https://git.openjdk.org/jdk/pull/14781#discussion_r1260764210