The balance between the hottest locations in the decompressor code varies depending on the input file. Linux kernel source compresses very well (ratio is about 0.10), which reduces the benefit of branchless code. On my main computer I still get about a 2 % time reduction with =3 compared to =0.
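
For anyone reproducing these comparisons, here is a minimal sketch of the arithmetic behind the two figures quoted in this message. It assumes "ratio" means compressed size divided by uncompressed size (so a smaller value means better compression) and that "time reduction" is measured relative to the =0 timing. The function names and the sample numbers are hypothetical and are not taken from any benchmark script.

```python
def compression_ratio(compressed_size: int, uncompressed_size: int) -> float:
    """Compressed size divided by uncompressed size; smaller is better."""
    if uncompressed_size <= 0:
        raise ValueError("uncompressed_size must be positive")
    return compressed_size / uncompressed_size


def time_reduction_percent(baseline_seconds: float, new_seconds: float) -> float:
    """Percentage by which new_seconds is shorter than baseline_seconds."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - new_seconds) / baseline_seconds * 100.0


# Hypothetical sizes and timings, chosen only to illustrate the arithmetic.
print(round(compression_ratio(100, 1000), 2))
print(round(time_reduction_percent(2.0, 1.79), 1))
```

Reporting the reduction relative to the baseline keeps numbers from files of very different sizes comparable.
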
On another x86-64 computer I don't see any difference between =0 and =3 with the Linux kernel source. On the same machine, the decompression time of warzone2100-data[1] from Debian is reduced by 10.5 % with =3 compared to =0. It's a package that doesn't compress so well (ratio is about 0.75). On my main computer the time reduction from =0 to =3 is 8.5 %.

All numbers are with GCC. Of course, on x86-64 the =0 vs. =3 test isn't that interesting since the asm is so much better. But this highlights how much difference the choice of test file can make.

[1] https://packages.debian.org/bookworm/all/warzone2100-data/download

-- 
Lasse Collin