On 26/03/2025 9:20 am, Jan Beulich wrote:
> On 25.03.2025 19:52, Andrew Cooper wrote:
>> It turns out that LZCNT/TZCNT have the same input dependent bug, prior to
>> Skylake.
> These two do, but BSF/BSR don't? Pretty odd.

BSF/BSR have true input dependencies.  They both have cases where the
destination register is left unmodified.

>
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -488,10 +488,16 @@ static always_inline unsigned int 
>> arch_hweightl(unsigned long x)
>>       *
>>       * This limits the POPCNT instruction to using the same ABI as a 
>> function
>>       * call (input in %rdi, output in %eax) but that's fine.
>> +     *
>> +     * On Intel CPUs prior to Cannon Lake, the POPCNT instruction has a 
>> false
>> +     * input dependency on it's destination register (errata HSD146, SKL029
>> +     * amongst others), impacting loops such as bitmap_weight().  Insert an
>> +     * XOR to manually break the dependency.
>>       */
> With this being an Intel-only issue, wouldn't we better ...
>
>>      alternative_io("call arch_generic_hweightl",
>> +                   "xor %k[res], %k[res]\n\t"
> ... put this line in #ifdef CONFIG_INTEL then? The extra overhead is small, 
> but
> I see no reason not to avoid it if we can. (I realize that's not quite as
> straightforward as it reads, for alternative_io() being a macro.)

For an XOR, no; not worth the complexity.

Besides, this gets used a whole 5 locations in the hypervisor, after I
cleaned up the paths which shouldn't have been using hweight() in the
first place.

~Andrew

Reply via email to