On Thu, 17 Jun 2021 08:53:55 GMT, Thomas Stuefe <[email protected]> wrote:
>> Proposal to add a Linux+glibc-only jcmd to manually induce malloc_trim(3) in
>> the VM process.
>>
>>
>> The glibc is somewhat notorious for retaining released C Heap memory:
>> calling free(3) returns memory to the glibc, and most libc variants will
>> return at least a portion of it back to the Operating System, but the glibc
>> often does not.
>>
>> This depends on the granularity of the allocations and a number of other
>> factors, but we found that many small allocations in particular may cause
>> the process heap segment (hence RSS) to get bloaty. This can cause the VM to
>> not recover from C-heap usage spikes.
>>
>> The glibc offers an API, "malloc_trim", which can be used to cause the glibc
>> to return free'd memory back to the Operating System.
>>
>> This may cost performance, however, and therefore I hesitate to call
>> malloc_trim automatically. That may be an idea for another day.
>>
>> Instead of an automatic trim I propose to add a jcmd which allows to
>> manually trigger a libc heap trim. Such a command would have two purposes:
>> - when analyzing cases of high memory footprint, it allows to distinguish
>> "real" footprint, e.g. leaks, from a cases where the glibc just holds on to
>> memory
>> - as a stop gap measure it allows to release pressure from a high footprint
>> scenario.
>>
>> Note that this command also helps with analyzing libc peaks which had
>> nothing to do with the VM - e.g. peaks created by customer code which just
>> happens to share the same process as the VM. Such memory does not even have
>> to show up in NMT.
>>
>> I propose to introduce this command for Linux only. Other OSes (apart maybe
>> AIX) do not seem to have this problem, but Linux is arguably important
>> enough in itself to justify a Linux specific jcmd.
>>
>> If this finds agreement, I will file a CSR.
>>
>> Note that an alternative to a Linux-only jcmd would be a command which would
>> trim the C-heap on all platforms, with implementations to be filled out
>> later.
>>
>> =========
>>
>> This patch:
>>
>> - introduces a new jcmd, "VM.trim_libc_heap", no arguments, which trims the
>> glibc heap on glibc platforms.
>> - includes a (rather basic) test
>> - the command calls malloc_trim(3), and additionally prints out its effect
>> (changes caused in virt size, rss and swap space)
>> - I refactored some code in os_linux.cpp to factor out scanning
>> /proc/self/status to get kernel memory information.
>>
>> =========
>>
>> Example:
>>
>> A programm causes a temporary peak in C-heap usage (in this case, triggered
>> via Unsafe.allocateMemory), right away frees the memory again, so its not
>> leaky. The peak in RSS was ~8G (even though the user allocation was way
>> smaller - glibc has a lot of overhead). The effects of this peak linger even
>> after returning that memory to the glibc:
>>
>>
>>
>> thomas@starfish:~$ jjjcmd AllocCHeap VM.info | grep Resident
>> Resident Set Size: 8685896K (peak: 8685896K) (anon: 8648680K, file: 37216K,
>> shmem: 0K)
>> ^^^^^^^^
>>
>>
>> We execute the new trim command via jcmd:
>>
>>
>> thomas@starfish:~$ jjjcmd AllocCHeap VM.trim_libc_heap
>> 18770:
>> Attempting trim...
>> Done.
>> Virtual size before: 28849744k, after: 28849724k, (-20k)
>> RSS before: 8685896k, after: 920740k, (-7765156k) <<<<
>> Swap before: 0k, after: 0k, (0k)
>>
>>
>> It prints out reduction in virtual size, rss and swap. The virtual size did
>> not decrease since no mappings had been unmapped by the glibc. However, the
>> process heap was shrunk heavily by the glibc, resulting in a large drop in
>> RSS (8.5G->900M), freeing >7G of memory:
>>
>>
>> thomas@starfish:~$ jjjcmd AllocCHeap VM.info | grep Resident
>> Resident Set Size: 920740K (peak: 8686004K) (anon: 883460K, file: 37280K,
>> shmem: 0K)
>> ^^^^^^^
>>
>>
>> When the VM is started with -Xlog:os, this is also logged:
>>
>>
>> [139,068s][info][os] malloc_trim:
>> [139,068s][info][os] Virtual size before: 28849744k, after: 28849724k, (-20k)
>> RSS before: 8685896k, after: 920740k, (-7765156k)
>> Swap before: 0k, after: 0k, (0k)
>
> Thomas Stuefe has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Feedback Severin; renamed query function
I like this little new diagnostic command which I think can be quite useful in
specific situations.
However, in contrast to other reviewers, I'd rather keep this simple and Glibc
specific instead of extending it to a more general but mostly empty command.
I'd therefore propose to rename this command to `glibc_trim_heap` to make it
evident from the command name already that it is Glibc-specific.
Besides that, just cosmetic changes and suggestions.
src/hotspot/os/linux/os_linux.cpp line 2142:
> 2140: bool os::Linux::query_process_memory_info(os::Linux::meminfo_t* info) {
> 2141: FILE* f = ::fopen("/proc/self/status", "r");
> 2142: const int num_values = 8;
Refactoring of `os::Linux::print_process_memory_info` looks good. But now that
you've already created `os::Linux::meminfo_t` anyway I'd suggest to make
`num_values == sizeof(os::Linux::meminfo_t)/sizeof(ssize_t)`.
src/hotspot/os/linux/trimCHeapDCmd.cpp line 43:
> 41: os::Linux::meminfo_t info2;
> 42: bool have_info1 = os::Linux::query_process_memory_info(&info1);
> 43:
os::Linux::meminfo_t info1;
os::Linux::meminfo_t info2;
// Query memory before...
bool have_info1 = os::Linux::query_process_memory_info(&info1);
Looks better to me :)
src/hotspot/os/linux/trimCHeapDCmd.cpp line 64:
> 62: ss_report.print_cr("Swap before: " SSIZE_FORMAT "k, after: "
> SSIZE_FORMAT "k, (" SSIZE_FORMAT "k)",
> 63: info1.vmswap, info2.vmswap, (info2.vmswap -
> info1.vmswap));
> 64: }
In the unusual case that `(have_info1 && have_info2)==true` but there was still
nothing printed (i.e. `info1.vmsize == -1 || info2.vmsize == -1` etc.), you
should still print out `"No details available."`.
src/hotspot/share/services/diagnosticCommand.cpp line 99:
> 97: DCmdFactory::register_DCmdFactory(new
> DCmdFactoryImpl<HeapInfoDCmd>(full_export, true, false));
> 98: DCmdFactory::register_DCmdFactory(new
> DCmdFactoryImpl<FinalizerInfoDCmd>(full_export, true, false));
> 99: LINUX_ONLY(DCmdFactory::register_DCmdFactory(new
> DCmdFactoryImpl<TrimCLibcHeapDCmd>(full_export, true, false));)
Not sure if the commands are really sorted and if order is important here.
Otherwise you could add the new command a few lines later where there already
exists an `#ifdef LINUX` section:
#ifdef LINUX
DCmdFactory::register_DCmdFactory(new
DCmdFactoryImpl<PerfMapDCmd>(full_export, true, false));
DCmdFactory::register_DCmdFactory(new
DCmdFactoryImpl<TrimCLibcHeapDCmd>(full_export, true, false));
#endif // LINUX
test/hotspot/jtreg/serviceability/dcmd/vm/TrimLibcHeapTest.java line 46:
> 44: output.reportDiagnosticSummary();
> 45: output.shouldMatch("(Done|Not available)");
> 46: }
Maybe also add something like:
if (output.firstMatch("Done") != null) {
output.shouldMatch("(Virtual size before|RSS before|Swap before|No details
available)");
}
-------------
Changes requested by simonis (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/4510