On Tue, 23 Feb 2021 08:06:14 GMT, Ralf Schmelter <rschmel...@openjdk.org> wrote:
>> Hi @schmelter-sap,
>> Thanks a lot for reviewing and benchmarking.
>>
>>> I've benchmarked the code on my machine (128GB memory, 56 logical CPUs) with an example creating a 32 GB heap dump. I only saw a 10 percent reduction in time, both using uncompressed and compressed dumps. Have you seen better numbers in your benchmarks?
>>>
>>> And it seems to potentially use a lot more temporary memory. In my example I had a 4 GB array in the heap and the new code allocated 4 GB of additional memory to write this array. This could happen in more threads in parallel, increasing the memory consumption even more.
>>
>> I have done some preliminary tests on my machine (16GB, 8 cores); the data are as follows:
>> `$ jmap -dump:file=dump4.bin,parallel=4 127420`
>> `Dumping heap to /home/lzang1/Source/jdk/dump4.bin ...`
>> `Heap dump file created [932950649 bytes in 0.591 secs]`
>> `$ jmap -dump:file=dump1.bin,parallel=1 127420`
>> `Dumping heap to /home/lzang1/Source/jdk/dump1.bin ...`
>> `Heap dump file created [932950739 bytes in 2.957 secs]`
>>
>> But I have observed unstable numbers on a machine with more cores and more RAM, running a workload with higher heap usage. I suspect that is related to the memory consumption you mentioned, and I am investigating ways to optimize it.
>>
>>> If the above problems could be fixed, I would suggest to just use the parallel code in all cases.
>>
>> Thanks a lot! I will let you know when I make some progress on the optimization.
>>
>> BRs,
>> Lin

> Hi @linzang,
>
> I've done more benchmarking using different numbers of threads for parallel heap iteration and have found values which give at least a factor of 2 speedup (for gzipped dumps) or 1.6 (for unzipped dumps). For my scenario using gzip compression, about 10 percent of the available CPUs for parallel iteration gave the best speedup; for the uncompressed one it was about 7 percent.
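Ralf's numbers above suggest a simple rule of thumb for picking the `parallel` value: roughly 10 percent of the online CPUs. A minimal sketch of that heuristic (the variable names, the round-up, and the fallback CPU count are my own assumptions, not part of the PR):

```shell
# Hypothetical helper: choose a parallel value of roughly 10% of the
# online CPUs, the sweet spot Ralf reported for gzipped dumps.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 8)
threads=$(( (cpus + 9) / 10 ))   # round up, so any cpus >= 1 gives at least 1
echo "jmap -dump:file=heap.bin,parallel=$threads <pid>"
```

On an 8-core box this prints `parallel=1`; on Ralf's 56-CPU machine it would pick `parallel=6`.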
> Note that the baseline I compared against was not the parallel=1 case, but the old code. The parallel=1 case was always 10 to 20 percent slower than the old code.
>
> Best regards,
> Ralf

Dear @ralf,

Really, thanks for benchmarking it! It is a little surprising to me that "parallel=1" is 10 to 20 percent slower than before; I believe this can be avoided with some revisions to the code. I also found a potential memory leak in the implementation and am working on a fix.

> I've done more benchmarking using different numbers of threads for parallel heap iteration and have found values which give at least a factor of 2 speedup (for gzipped dumps) or 1.6 (for unzipped dumps). For my scenario using gzip compression about 10 percent of the available CPUs for parallel iteration gave the best speedup, for the uncompressed one it was about 7 percent.

These data are really interesting to me: it seems a gzipped dump is faster than an unzipped one. Is that because of disk writing, or something else? I will investigate it further. Thanks a lot!

BRs,
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/2261
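On Lin's question of why a gzipped dump can beat an uncompressed one: when heap data compresses well, a low gzip level shrinks the bytes the disk must absorb by far more than the CPU cost of compressing them, so an I/O-bound dump finishes sooner. A rough, self-contained illustration of the size effect, using a highly compressible zero-filled buffer as a stand-in for heap data (file names and sizes are my own choices, not from the PR):

```shell
# Write 1 MiB of zeros (stand-in for compressible heap contents), then
# gzip it at the fastest level; compare how many bytes each file would
# put on disk. The compressed copy is orders of magnitude smaller.
head -c 1048576 /dev/zero > /tmp/raw.bin
gzip -1 -c /tmp/raw.bin > /tmp/raw.bin.gz
raw_size=$(wc -c < /tmp/raw.bin)
gz_size=$(wc -c < /tmp/raw.bin.gz)
echo "raw=$raw_size gz=$gz_size"
```

Real heap data compresses less dramatically than zeros, but the same trade-off applies: fewer bytes written can outweigh the compression CPU time on a slow disk.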