On Thu, 2 Feb 2023 12:54:06 GMT, Boris Ulasevich <[email protected]> wrote:
> This is another pull request to replace
> https://github.com/openjdk/jdk/pull/10025 change which was blocked as not
> acceptable (see
> https://github.com/openjdk/jdk/pull/10025#pullrequestreview-1228216330)
>
> The objections to change #10025 were:
> - specialized algorithm for given data complicates things, makes it hard to
> learn, test and support
> - algorithm is changed for DebugInfo, and the benefit is only for one type of
> data
> - statistics of the debug info data can (will) change, breaking the
> optimization
>
> The suggestion was:
> - don't change the core algorithm, but add one on top or underneath the
> existing one, or reuse off-the-shelf zero-reduction schemes such as Cap'n
> Proto
>
> With this change I propose a different approach. Instead of bit coding, the
> sequence of zeros in a data stream is encoded with a special character that
> normally never appears in Unsigned5 encoding, followed by a byte containing a
> number of zeros. In this way the updated algorithm is a pure extension of the
> existing encoding algorithm: data encoded without the zero-reduction trick is
> unpacked in the same way as before.
>
> Currently there are several datasets affected by this change: Dependencies
> info, OopMap info, LineNumber info, Debug info. Only Debug info has a large
> number of zeros and gets a significant benefit. I experimented with the Cap'n
> Proto and lz4 algorithms on DebugInfo. The Unsigned5 algorithm has a better
> compression rate than these.
>
> DebugInfo data size is reduced by ~20% (actually, 10-30%, depending on the
> application). Total nmethod size reduction is ~3%.
>
> Performance impact: Renaissance and DaCapo benchmarks do not show any
> difference.

src/hotspot/share/code/compressedStream.hpp line 118:

> 116:   bool handle_zero(juint value) {
> 117:     if (value == 0) {
> 118:       _zero_count = (_zero_count == 0xFF) ? 0 : _zero_count;

The handling of `_zero_count` overflow is not clear to me; apparently I'm missing something here.
When the counter reaches 0xFF, the current code just clears the previously counted `_zero_count`. I'd expect some action at that point, such as storing the current number of zeros or advancing the `_position`. Do you have a test for this?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12387#discussion_r1254988216
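
To make the overflow concern concrete, here is a minimal sketch of the behaviour I would expect when the counter saturates. It is not the code from the PR: `ZeroRunWriter`, `kZeroMarker`, `flush_zeros` and `decode_zero_runs` are hypothetical names, the marker is assumed to be a byte (0x00 here) that the underlying encoding never emits by itself, and nonzero values are written as raw single bytes instead of going through UNSIGNED5.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified zero-run writer; NOT the code from the PR.
// Assumption: kZeroMarker is a byte the underlying encoding never emits itself.
constexpr uint8_t kZeroMarker = 0x00;
constexpr uint8_t kMaxRun     = 0xFF;  // largest run length that fits in one byte

class ZeroRunWriter {
 public:
  // Nonzero values are stored as raw bytes to keep the sketch short.
  void write(uint8_t value) {
    if (value == 0) {
      if (_zero_count == kMaxRun) {
        flush_zeros();  // counter is full: store this run before counting on
      }
      _zero_count++;
      return;
    }
    flush_zeros();      // a nonzero value ends any pending run of zeros
    _out.push_back(value);
  }

  // Must be called once at the end so a trailing run is not lost.
  void finish() { flush_zeros(); }

  const std::vector<uint8_t>& bytes() const { return _out; }

 private:
  // Emits "marker, count" for the pending run, if there is one.
  void flush_zeros() {
    if (_zero_count == 0) return;
    _out.push_back(kZeroMarker);
    _out.push_back(_zero_count);
    _zero_count = 0;
  }

  std::vector<uint8_t> _out;
  uint8_t _zero_count = 0;
};

// Expands every "marker, count" pair back into `count` zero bytes.
std::vector<uint8_t> decode_zero_runs(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  for (std::size_t i = 0; i < in.size(); i++) {
    if (in[i] == kZeroMarker && i + 1 < in.size()) {
      out.insert(out.end(), static_cast<std::size_t>(in[++i]), uint8_t{0});
    } else {
      out.push_back(in[i]);
    }
  }
  return out;
}
```

In this sketch a run longer than 255 zeros is split into several marker/count pairs, so no zeros are dropped when the counter wraps. A test along these lines (for example, 300 zeros followed by one nonzero value, then a round trip through the decoder) is the kind of coverage I am asking about.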
