Dear All,

   I would like to revive the discussion about the implementation of parallel 
and incremental “jmap -histo”.
   The goal of these changes is to solve the problem that “jmap -histo” may 
time out or be killed by a timer when the heap is large. Today the result of 
“jmap -histo” is “all or nothing”: if it gets killed before it exits, the user 
gets no information about the heap.
   “Incremental” means that jmap -histo dumps intermediate results while it is 
iterating the heap, so if it is interrupted, the user still gets some 
meaningful information.
   “Parallel” means speeding up the heap iteration by using multiple threads.

   Originally I implemented the “incremental dump” so that it writes the 
intermediate data to a separate file such as <IncrementalHisto.dump>, while the 
final result is saved to another file, <HistoResult.dump>. So when jmap -histo 
is interrupted, the user can get information from <IncrementalHisto.dump>, and 
if jmap -histo completes normally, the final result is in <HistoResult.dump>.

   The parallel dump has multiple threads working on the heap iteration, and 
each thread periodically generates intermediate data.

   The main reason for using a separate file for the incremental dump is the 
parallel incremental dump implementation: every heap-iteration thread can dump 
its own data to a separate file, which avoids the need for a file lock.

   However, it seems that the original design might confuse users by producing 
two or more result files (intermediate results and the final result). So I 
would like to ask for your help in discussing the following options:


  1.  For incremental dump without parallel, the intermediate results and the 
final result are dumped to the same file:

In this case, the intermediate data are generated in the middle of the heap 
iteration and are written to the file <HistoResult.dump> as they are produced. 
If jmap -histo exits normally, the final result is also dumped to 
<HistoResult.dump>, at which point all intermediate data are flushed.



  2.  For parallel dump without incremental:
Every thread fills its own thread-local dump buffer, and all thread-local 
buffers are merged and written to the <HistoResult.dump> file at the end.
There is no incremental support, so the result is still “all or nothing”.


  3.  For parallel + incremental dump, I think it is a little more complicated 
because of the intermediate data processing. I see three options:

     *   Every thread has its own thread-local intermediate data buffer, and 
all the thread-local buffers are written to the <HistoResult.dump> file while 
holding a file lock. So only one data file is generated, and if jmap -histo is 
interrupted, the intermediate data are saved in that same file.

The problem is that the file write lock can be heavy, which may slow down the 
parallel heap dump.



     *   Every thread has its own thread-local intermediate data buffer, and 
every thread saves its result in a temp file named 
<IntermediatedResult_[tid].dump>.

So there is no file lock, and the parallel dump can be fast. The problem is 
that multiple files are generated to hold the thread-local intermediate 
results, and this might confuse the user.



     *   Every thread has its own thread-local intermediate data buffer, and 
an additional “data-merging-thread” is created.

Each parallel thread writes data to its thread-local buffer and enqueues the 
buffer when the data reach some threshold. The “data-merging-thread” consumes 
the queue, merges the data from the different threads, and saves the merged 
data to the result file.

In this case, only one <HistoResult.dump> file is generated. No file lock is 
needed, but there is a queue lock and a separate “merging thread” 
implementation. Do you think this is a reasonable solution?
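
To make the last option easier to discuss, here is a minimal Java sketch of 
the queue plus merging-thread idea. It is only an illustration, not the 
proposed HotSpot change: the class name HistoMergeSketch, the method run, the 
DONE marker and the threshold parameter are all hypothetical, the "heap" is 
simulated by a list of class names, and the file write is left as a comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HistoMergeSketch {
    // Hypothetical marker buffer: tells the merging thread one worker is done.
    private static final Map<String, Long> DONE = new HashMap<>();

    // objects: simulated heap, one class name per object (an assumption).
    // threshold: number of objects counted before a buffer is enqueued.
    public static Map<String, Long> run(List<String> objects, int workers, int threshold)
            throws InterruptedException {
        BlockingQueue<Map<String, Long>> queue = new LinkedBlockingQueue<>();
        Map<String, Long> merged = new TreeMap<>();

        // Only this thread touches the merged histogram, so no file lock is needed.
        Thread merger = new Thread(() -> {
            int finished = 0;
            try {
                while (finished < workers) {
                    Map<String, Long> buf = queue.take();
                    if (buf == DONE) {
                        finished++;
                        continue;
                    }
                    buf.forEach((name, count) -> merged.merge(name, count, Long::sum));
                    // A real tool would rewrite <HistoResult.dump> here.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        merger.start();

        List<Thread> scanners = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                Map<String, Long> local = new HashMap<>(); // thread-local buffer
                int pending = 0;
                // Worker "id" visits every workers-th object (stand-in for a heap region).
                for (int i = id; i < objects.size(); i += workers) {
                    local.merge(objects.get(i), 1L, Long::sum);
                    if (++pending >= threshold) {
                        queue.add(local);          // hand the full buffer to the merger
                        local = new HashMap<>();
                        pending = 0;
                    }
                }
                if (!local.isEmpty()) {
                    queue.add(local);              // flush the remainder
                }
                queue.add(DONE);
            });
            scanners.add(t);
            t.start();
        }
        for (Thread t : scanners) {
            t.join();
        }
        merger.join();
        return merged;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("A", "B", "A", "C", "A", "B"), 2, 2));
    }
}
```

In this sketch the only shared structure is the queue; the workers never touch 
the merged result. If the process were interrupted, whatever the merging 
thread had written so far would be the partial histogram.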

So may I ask for your suggestions?

Details of previous discussion can be found at 
https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-June/028276.html

Thanks!

BRs,
Lin
