On Tue, 3 Jun 2025 07:16:47 GMT, Radim Vansa <rva...@openjdk.org> wrote:
>> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 >> trying to reduce the performance regression in some scenarios introduced in >> https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and >> memory consumption it is a (better) alternative to >> https://github.com/openjdk/jdk/pull/24713 . >> >> This PR optimizes local field lookup in classes with more than 16 fields; >> rather than sequentially iterating through all fields during lookup we sort >> the fields based on the field name. The stream includes extra table after >> the field information: for field at position 16, 32 ... we record the >> (variable-length-encoded) offset of the field info in this stream. On field >> lookup, rather than iterating through all fields, we iterate through this >> table, resolve names for given fields and continue field-by-field iteration >> only after the last record (hence at most 16 fields). >> >> In classes with <= 16 fields this PR reduces the memory consumption by 1 >> byte that was left with value 0 at the end of stream. In classes with > 16 >> fields we add extra 4 bytes with offset of the table, and the table contains >> one varint for each 16 fields. The terminal byte is not used either. >> >> My measurements on the attached reproducer >> >> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC >> Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 >> ms] >> Range (min … max): 45.1 ms … 53.9 ms 100 runs >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC >> Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 >> ms] >> Range (min … max): 73.8 ms … 79.7 ms 100 runs >> >> (the jdk25-master above already contains JDK-8353175) >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC >> Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 >> ms] >> Range (min … max): 37.7 ms … 42.1 ms 100 runs >> >> While https://github.com/openjdk/jdk/pull/24713 returned the performance to >> previous levels, this PR improves it by 25% compared to JDK 17 (which does >> not contain the regression)! This time, the undisclosed production-grade >> reproducer shows even higher improvement: >> >> JDK 17: 1.6 s >> JDK 21 (no patches): 22 s >> JDK25-master: 12.3 s >> JDK25-this-pr: 0.5 s > > Radim Vansa has updated the pull request incrementally with three additional > commits since the last revision: > > - Moved jtreg test > - Improved documentation > - Fix coding style (asterisk placement) > Reading further, I see what this mapping is and intentionally generalized. I > guess a comment like, the key and value are sized to represent the maximum > value for each and then compacted, or something like that. But maybe I > haven't figured out the packing. Are they increments of u1, u2 or u4 or > something in between? >From the constructor that accepts the maximum number for each we figure out >the number of bits required to store those numbers. Imagine that only as bits >(not aligned to byte boundary... yet). Then we concatenate the bits for key >and value, and then 'add' 1-7 padding zeroes (high-order bits) to align on >bytes. So in the end we have each element in the table consuming 1-8 bytes >(case with 0 bits for both key and value is ruled out). In case of the fields >search table, the key will be at most the stream length (which in turn has at >most 65536 fields and each occupies up to 8 varints which is <= 40 bytes, >though normally far less) and value is at most 65535 (# fields in class) so >this is 1-5 bytes. I thought that the presence of `max_key` and `max_value` in constructors along with the comment on `PackedTableBase` saying > Each element consists of **up to** 32-bit key, and **up to** 32-bit value; > these are **packed into a bit-record** with 1-byte alignment. pretty much says that - shall I place extra comments somewhere else? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2946329561