On Thu, 17 Apr 2025 12:59:56 GMT, Radim Vansa <rva...@openjdk.org> wrote:
>> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290
>> trying to reduce the performance regression in some scenarios introduced in
>> https://bugs.openjdk.org/browse/JDK-8292818
>>
>> Iteration through the field stream of several variable-length-encoded items
>> is limited by the inherent dependency of fully decoding a field before
>> knowing the position of the next field. On the critical codepath addressed
>> by this PR, only the name and signature indices are used, though.
>> Inspired by the Varint-GB and Varint-G8IU encodings, the idea of this change
>> is to add a control stream to the stream of field infos; each field uses
>> one byte of this control stream, and currently the lower 6 bits encode the
>> length of the encoded field, allowing more efficient skipping through the
>> stream. The most significant bit duplicates the injected field flag
>> (name/signature lookup requires that); one bit is currently unused.
>>
>> My measurements on the attached reproducer
>>
>>     hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC'
>>     Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC
>>       Time (mean ± σ):     51.3 ms ±  2.8 ms    [User: 44.7 ms, System: 13.7 ms]
>>       Range (min … max):   45.1 ms … 53.9 ms    100 runs
>>
>>     hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC'
>>     Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC
>>       Time (mean ± σ):     78.2 ms ±  1.0 ms    [User: 74.6 ms, System: 17.3 ms]
>>       Range (min … max):   73.8 ms … 79.7 ms    100 runs
>>
>> (the `jdk25-master` above already contains JDK-8353175)
>>
>>     hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC'
>>     Benchmark 1: /path/to/jdk25-this-pr/bin/java -cp /tmp CCC
>>       Time (mean ± σ):     51.8 ms ±  2.1 ms    [User: 48.9 ms, System: 16.8 ms]
>>       Range (min … max):   47.4 ms … 55.1 ms    100 runs
>>
>> So in case of the synthetic reproducer we're already on the level of JDK 17.
>> However, the undisclosed production-grade reproducer still shows a
>> regression, so there is still room for optimization.
>>
>>     JDK 17:               1.6 s
>>     JDK 21 (no patches):  22 s
>>     JDK25-master:         12.3 s
>>     JDK25-this-pr:        3.1 s
>>
>> About the downsides: This PR increases memory consumption by 1 byte for each
>> field in loaded classes. I've executed some tests on a simple Spring Boot
>> application with 6800 instance classes loaded -> about 16k fields in total,
>> with NMT on; the memory usage (Class.Metadata.used) seems to have grown by
>> 32kB; this discrepancy could be investigated later on....
>
> Radim Vansa has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Make skip_bytes private

Withdrawing in favor of https://github.com/openjdk/jdk/pull/24847

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24713#issuecomment-2827160936
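
For readers unfamiliar with the control-stream idea in the quoted description, here is a minimal Python sketch of the byte layout it describes: lower 6 bits for the encoded field length, most significant bit for the injected flag, one bit unused. This is not the HotSpot code. The names `make_control_byte` and `field_offsets` are hypothetical, and both the placement of the control stream and the handling of lengths above 63 are assumptions made for illustration.

```python
LENGTH_MASK = 0x3F    # lower 6 bits: length of the encoded field
INJECTED_BIT = 0x80   # most significant bit: copy of the injected-field flag
# Bit 6 (0x40) is left unused, matching the description.


def make_control_byte(encoded_length, injected):
    """Pack one field's control byte (hypothetical helper)."""
    # Assumption: lengths that do not fit in 6 bits are simply rejected here.
    if not 0 <= encoded_length <= LENGTH_MASK:
        raise ValueError("length does not fit in 6 bits")
    return encoded_length | (INJECTED_BIT if injected else 0)


def field_offsets(control_stream):
    """Yield (offset, length, injected) per field using only control bytes."""
    offset = 0
    for control in control_stream:
        length = control & LENGTH_MASK
        yield offset, length, bool(control & INJECTED_BIT)
        # The next field's position is known without decoding this field.
        offset += length


controls = bytes([
    make_control_byte(5, False),
    make_control_byte(3, True),
    make_control_byte(7, False),
])
print(list(field_offsets(controls)))
# → [(0, 5, False), (5, 3, True), (8, 7, False)]
```

The point of the layout is that skipping a field only needs its control byte, so the position of the next field no longer depends on decoding the variable-length items in the current one.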