On 26/03/2025 9:20 am, Jan Beulich wrote: > On 25.03.2025 19:52, Andrew Cooper wrote: >> It turns out that LZCNT/TZCNT have the same input dependent bug, prior to >> Skylake. > These two do, but BSF/BSR don't? Pretty odd.
BSF/BSR have true input dependencies. They both have cases where the destination register is left unmodified. > >> --- a/xen/arch/x86/include/asm/bitops.h >> +++ b/xen/arch/x86/include/asm/bitops.h >> @@ -488,10 +488,16 @@ static always_inline unsigned int >> arch_hweightl(unsigned long x) >> * >> * This limits the POPCNT instruction to using the same ABI as a >> function >> * call (input in %rdi, output in %eax) but that's fine. >> + * >> + * On Intel CPUs prior to Cannon Lake, the POPCNT instruction has a >> false >> + * input dependency on it's destination register (errata HSD146, SKL029 >> + * amongst others), impacting loops such as bitmap_weight(). Insert an >> + * XOR to manually break the dependency. >> */ > With this being an Intel-only issue, wouldn't we better ... > >> alternative_io("call arch_generic_hweightl", >> + "xor %k[res], %k[res]\n\t" > ... put this line in #ifdef CONFIG_INTEL then? The extra overhead is small, > but > I see no reason not to avoid it if we can. (I realize that's not quite as > straightforward as it reads, for alternative_io() being a macro.) For an XOR, no; not worth the complexity. Besides, this gets used a whole 5 locations in the hypervisor, after I cleaned up the paths which shouldn't have been using hweight() in the first place. ~Andrew