Re: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C

Linus Torvalds Tue, 29 Apr 2025 15:35:06 -0700

On Tue, 29 Apr 2025 at 15:22, Andrew Cooper <andrew.coop...@citrix.com> wrote:
>
> Oh, I didn't realise there was also a perf difference too, but Agner Fog
> agrees.


The perf difference is exactly because of the issue where the non-rep
one acts as a cmov, and has basically two inputs (the bits to test in
the source, and the old value of the result register).

I guess it's not "fundamental", but lzcnt is basically a bit simpler
for hardware to implement, and the non-rep legacy bsf instruction
basically has a dependency on the previous value of the result
register.

So even when it's a single uop for both cases, that single uop can be
slower for the bsf because of the (typically false) dependency and
extra pressure on the rename registers.

       Linus
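
For readers following along, the behaviour described above can be modelled
in plain C. This is only a sketch under stated assumptions, not the patch
under discussion: the helper names (model_bsf, model_rep_bsf, sketch_ffs) are
hypothetical, __builtin_ctz is the GCC/Clang builtin, and the "destination is
left unchanged on zero input" behaviour is taken from the message above
rather than from any architectural guarantee. The rep-prefixed form is
modelled as a trailing-zero count (tzcnt) that returns the operand width for
a zero input.

```c
#include <stdint.h>

/*
 * Model of the legacy (non-rep) bsf as described above: it behaves like a
 * conditional move, so the result depends on BOTH the source bits and the
 * previous value of the destination register.
 */
static uint32_t model_bsf(uint32_t src, uint32_t old_dest)
{
    if (src == 0)
        return old_dest;              /* destination keeps its old value */
    return (uint32_t)__builtin_ctz(src);
}

/*
 * Model of the rep-prefixed form (tzcnt): the result depends only on the
 * source, so there is no input dependency on the old destination value.
 */
static uint32_t model_rep_bsf(uint32_t src)
{
    return src ? (uint32_t)__builtin_ctz(src) : 32u;
}

/*
 * ffs() with the zero case handled in C instead of in the instruction:
 * returns the 1-based index of the lowest set bit, or 0 if no bit is set.
 */
static int sketch_ffs(uint32_t x)
{
    return x ? __builtin_ctz(x) + 1 : 0;
}
```

The extra old_dest parameter of model_bsf is the point: it is the second
input that creates the (typically false) dependency, while model_rep_bsf and
sketch_ffs take only the value being scanned.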
