Re: Porting request for JEP Reduce Latency of G1 Write Barriers

2024-11-08 Thread Thomas Schatzl

Hi all,

  fwiw, here's the current branch with these changes so you can have a 
look:


https://github.com/tschatzl/jdk/tree/submit/8342382-card-table-instead-of-dcq 



I do not think there will be any significant changes in the design 
(assuming you and other reviewers also like it :P). Aarch64 and x86(-64) 
are fine imo (for months now...), all but S390 need more testing.


As for the S390 port, I admit I gave up getting a cross-compilation and 
emulator set up for S390, and its (macro-)assembly seemed so much 
different than the other ports, and pushing it through GHA repeatedly 
until it at least compiles takes too much time too.


There is a big FIXME in its generate_post_barrier_fast_path() in the 
g1BarrierSetAssembler_s390.cpp file; other than that method 
implementation, only cleanup/removal of unnecessary code like for the 
other ports should be required.


All other (very few) FIXMEs are about removing debug log messages or 
code that can be removed when all ports are done.


The reason for the delay in posting the PR is that the JEP text needs 
some more refactoring until we can submit and post the official PR (and 
there is some time until the JDK 24 fork anyway).


The implementation CR also already contains some more information about 
the implementation (https://bugs.openjdk.org/browse/JDK-8342382), which 
will also be updated a little before asking for reviews.


Hth,
  Thomas

On 22.10.24 10:26, Thomas Schatzl wrote:

Hi all,

   we in the GC team have been experimenting with new post-write 
barriers for G1 to reduce the throughput difference between G1 and 
Parallel GC.


After quite a few attempts we think we found a balanced solution between 
overall complexity and impact on throughput and latency.


The JEP draft in [1] summarizes the new write barrier and other relevant 
changes; feel free to also chime in in the public discussion thread 
[2] if there are any questions.


Summarizing, throughput is always better or the same as before, 
sometimes getting very close to Parallel GC. Latency does not get 
worse, and pause times are a bit better, at the cost of some native 
memory.


This email is mostly an advance notice that unfortunately the new write 
barrier is incompatible with the old one, so effort is required to make 
G1 work again on all platforms. Hence we would like to ask you to help 
us with them.


We intend to post a PR containing the implementation in the next (one or 
two) weeks (for this CR [3]); our goal would be to merge sometime for 
(early?) JDK 25 together with all barrier changes completed.


At Oracle we will as usual fully support the x86_64 and aarch64 platforms.

There are (seemingly) working implementations for riscv, aarch32/arm, 
and x86, which would need further testing/checking, but completely 
missing are barriers for PPC and S390 (I tried my luck with those, but 
gave up after a bit ...).


Thanks in advance,
   Thomas

[1] https://bugs.openjdk.org/browse/JDK-8342382
[2] https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-October/049944.html
[3] https://bugs.openjdk.org/browse/JDK-8342382



