On Wed, 2 Aug 2023 14:33:28 GMT, Yi Yang <yy...@openjdk.org> wrote:

>> ### Motivation and proposal
>> Hi, heap dump brings about pauses for application's execution(STW), this is 
>> a well-known pain. JDK-8252842 have added parallel support to heapdump in an 
>> attempt to alleviate this issue. However, all concurrent threads 
>> competitively write heap data to the same file, and more memory is required 
>> to maintain the concurrent buffer queue. In experiments, we did not feel a 
>> significant performance improvement from that.
>> 
>> The minor-pause solution, which is presented in this PR, is a two-phase 
>> segmented heap dump:
>> 
>> - Phase 1(STW): Concurrent threads directly write data to multiple heap 
>> files.
>> - Phase 2(Non-STW): Merge multiple heap files into one complete heap dump 
>> file. This process can happen outside safepoint.
>> 
>> Now concurrent worker threads are not required to maintain a buffer queue, 
>> which would result in more memory overhead, nor do they need to compete for 
>> locks. The changes in the overall design are as follows:
>> 
>> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
>> <p align="center">Fig1. Before</p>
>> 
>> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
>> <p align="center">Fig2. After this patch</p>
>> 
>> ### Performance evaluation
>> | memory | numOfThread | CompressionMode | STW | Total |
>> | -------| ----------- | --------------- | --- | ---- |
>> | 8g | 1 T | N | 15.612 | 15.612 |
>> | 8g | 32 T | N | 2.561725 | 14.498 |
>> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
>> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
>> | 8g | 96 T | N | 2.6790452 | 14.012 |
>> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
>> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
>> | 16g | 1 T | N | 26.278 | 26.278 |
>> | 16g | 32 T | N | 5.231374 | 26.417 |
>> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
>> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
>> | 16g | 96 T | N | 6.2445556 | 27.141 |
>> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
>> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
>> | 32g | 1 T | N | 48.149 | 48.149 |
>> | 32g | 32 T | N | 10.7734677 | 61.643 |
>> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
>> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
>> | 32g | 96 T | N | 13.1522042 | 61.432 |
>> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
>> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
>> | 64g | 1 T | N | 100.583 | 100.583 |
>> | 64g | 32 T | N | 20.9233744 | 134.701 |
>> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
>> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
>> | 64g | 96 T | N | 26.7374116 | 126.08 |
>> | 64g | ...
>
> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   whitespace

OK, nearly done!

We always log "parallelism true" which looks odd.  It just means the dump path 
has space for up to 9,999 log parts.  
In VM_HeapDumper::doit() if we make it log whether it will try a parallel dump, 
based on all the checks, it simplifies things I think.

Also the variable num_requested_dump_thread is not needed? It's just a copy of 
_num_dumper_threads which we don't change.

Here is a suggestion for VM_HeapDumper::doit()
...
  WorkerThreads* workers = ch->safepoint_workers();
  uint num_active_workers = workers != nullptr ? workers->active_workers() : 0;
  bool is_parallel = num_active_workers > 0 && can_parallel_dump() && 
is_parallel_dump();
  log_info(heapdump)("Requested dump threads %u, active workers %u, parallelism 
%s",
                      _num_dumper_threads, num_active_workers, is_parallel ? 
"true" : "false");

  if (!is_parallel) {
    _num_dumper_threads = 1;
    work(VMDumperWorkerId);
  } else {
    _num_dumper_threads = clamp(_num_dumper_threads, 2U, num_active_workers);
    uint heap_only_dumper_threads = _num_dumper_threads - 1 /* VMDumper thread 
*/;
    _dumper_controller = new (std::nothrow) 
DumperController(heap_only_dumper_threads);
    ParallelObjectIterator poi(_num_dumper_threads);
    _poi = &poi;
    workers->run_task(this, _num_dumper_threads);
    _poi = nullptr;
  }
  
  
-----------------

HeapDumpParallelTest.java
We need to change all of the:
  Asserts.assertTrue(out.getStdout().contains(...
to:
  out.shouldContain(...

...then we get the output included in the log if it fails, so we should see 
what went wrong.

----------------------

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1663979913

Reply via email to