On Thu, 17 Apr 2025 12:59:56 GMT, Radim Vansa <rva...@openjdk.org> wrote:

>> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 
>> trying to reduce the performance regression in some scenarios introduced in 
>> https://bugs.openjdk.org/browse/JDK-8292818
>> 
>> Iteration through the field stream of several variable-length-encoded items 
>> is limited by an inherent dependency: each field must be fully decoded before 
>> the position of the next field is known. On the critical codepath addressed 
>> by this PR, though, only the name and signature indices are used.
>> Inspired by the Varint-GB and Varint-G81U encodings, the idea of this change 
>> is in adding a control stream to the stream of field infos; each field uses 
>> one byte of this control stream, and currently the lower 6 bits encode the 
>> length of the encoded field, allowing more efficient skipping through the 
>> stream. The most significant bit duplicates the injected field flag 
>> (name/signature lookup requires that); one bit is currently unused.
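
To make the control-stream idea above concrete, here is a minimal Python sketch. It is not the HotSpot implementation: the LEB128-style varint is a stand-in for the real field encoding, all function names are hypothetical, and rejecting fields longer than 63 bytes is an assumption of this sketch (the description does not say how such fields are handled). Only the bit layout (lower 6 bits for the length, most significant bit for the injected flag, one bit unused) comes from the description.

```python
LENGTH_MASK = 0x3F    # lower 6 bits: length of the encoded field in bytes
INJECTED_BIT = 0x80   # most significant bit: copy of the injected-field flag
                      # bit 6 (0x40) is left unused, as in the description


def encode_varint(value):
    """LEB128-style varint; a stand-in, not HotSpot's actual encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode one varint at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def build_streams(fields):
    """fields: iterable of (name_index, signature_index, other_items, injected).

    Returns (control, data): one control byte per field plus the field bytes.
    """
    control = bytearray()
    data = bytearray()
    for name_index, signature_index, other_items, injected in fields:
        items = (name_index, signature_index, *other_items)
        encoded = b"".join(encode_varint(v) for v in items)
        if len(encoded) > LENGTH_MASK:
            # Assumption of this sketch: overly long fields are rejected.
            raise ValueError("encoded field does not fit a 6-bit length")
        control.append(len(encoded) | (INJECTED_BIT if injected else 0))
        data += encoded
    return bytes(control), bytes(data)


def find_field(control, data, name_index, signature_index):
    """Return (ordinal, injected) of the first matching field, or None."""
    pos = 0
    for ordinal, ctrl in enumerate(control):
        name, after_name = decode_varint(data, pos)
        if name == name_index:
            signature, _ = decode_varint(data, after_name)
            if signature == signature_index:
                return ordinal, bool(ctrl & INJECTED_BIT)
        # Skip the whole field using the control byte instead of decoding
        # the remaining variable-length items one by one.
        pos += ctrl & LENGTH_MASK
    return None
```

The lookup decodes at most two varints per field and then jumps by the length stored in the control byte, so the start of the next field no longer depends on decoding every item of the current one.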
>> 
>> My measurements on the attached reproducer:
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      51.3 ms ±   2.8 ms    [User: 44.7 ms, System: 13.7 ms]
>>   Range (min … max):    45.1 ms …  53.9 ms    100 runs
>> 
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      78.2 ms ±   1.0 ms    [User: 74.6 ms, System: 17.3 ms]
>>   Range (min … max):    73.8 ms …  79.7 ms    100 runs
>> 
>> (the `jdk25-master` above already contains JDK-8353175)
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk25-this-pr/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      51.8 ms ±   2.1 ms    [User: 48.9 ms, System: 16.8 ms]
>>   Range (min … max):    47.4 ms …  55.1 ms    100 runs
>> 
>> So for the synthetic reproducer we are already back at the level of JDK 17. 
>> However, the undisclosed production-grade reproducer still shows a 
>> regression, so there is still room for optimization.
>> 
>> JDK 17: 1.6 s
>> JDK 21 (no patches): 22 s
>> JDK25-master: 12.3 s
>> JDK25-this-pr: 3.1 s
>> 
>> 
>> About the downsides: This PR increases memory consumption by 1 byte for each 
>> field in loaded classes. I've executed some tests on a simple Spring Boot 
>> application with 6800 instance classes loaded (about 16k fields in total) 
>> with NMT on; the memory usage (Class.Metadata.used) seems to have grown by 
>> 32 kB, roughly twice the 16 kB that one byte per field would account for; 
>> this discrepancy could be investigated later on....
>
> Radim Vansa has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Make skip_bytes private

Withdrawing in favor of https://github.com/openjdk/jdk/pull/24847

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24713#issuecomment-2827160936
