The balance between the hottest locations in the decompressor code
varies depending on the input file. Linux kernel source compresses very
well (ratio is about 0.10). This reduces the benefit of branchless
code. On my main computer I still get about 2 % time reduction with =3.

On another x86-64 computer I don't see any difference between =0 and =3
with the Linux kernel source. On the same machine, decompression time
of warzone2100-data[1] from Debian is reduced by 10.5 % with =3 compared
to =0. It's a package that doesn't compress so well (ratio is about
0.75). On my main computer the time reduction from =0 to =3 is 8.5 %.
All numbers are with GCC.
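
For clarity, the two kinds of figures quoted above (compression ratio
and percentage time reduction) can be computed as in the minimal sketch
below. The function names and the sample sizes and timings are
hypothetical, chosen only for illustration; they are not the measured
values.

```python
def compression_ratio(compressed_size: int, uncompressed_size: int) -> float:
    """Return compressed size divided by uncompressed size.

    A smaller value means the data compresses better, e.g. about 0.10
    for highly compressible input and about 0.75 for poorly
    compressible input.
    """
    if uncompressed_size <= 0:
        raise ValueError("uncompressed_size must be positive")
    return compressed_size / uncompressed_size


def time_reduction_percent(baseline_seconds: float, new_seconds: float) -> float:
    """Return how much faster the new run is, as a percentage of the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the measurements in this message.
    print(compression_ratio(75, 100))
    print(time_reduction_percent(2.0, 1.79))
```

Both values are relative, so they can be compared across machines, but
the time reduction only means something when both runs use the same
input file.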

Of course, on x86-64 the =0 vs. =3 test isn't that interesting since the
asm is so much better. But this highlights how much difference the
choice of test file can make.

[1] https://packages.debian.org/bookworm/all/warzone2100-data/download

-- 
Lasse Collin
