On April 28, 2025 7:25:17 PM PDT, Andrew Cooper <andrew.coop...@citrix.com> 
wrote:
>On 29/04/2025 3:00 am, H. Peter Anvin wrote:
>> On April 28, 2025 5:12:13 PM PDT, Andrew Cooper <andrew.coop...@citrix.com> 
>> wrote:
>>> On 28/04/2025 10:38 pm, H. Peter Anvin wrote:
>>>> On April 28, 2025 9:14:45 AM PDT, Linus Torvalds 
>>>> <torva...@linux-foundation.org> wrote:
>>>>> On Mon, 28 Apr 2025 at 00:05, Ingo Molnar <mi...@kernel.org> wrote:
>>>>>> And once we remove 486, I think we can do the optimization below to
>>>>>> just assume the output doesn't get clobbered by BS*L in the zero-case,
>>>>>> right?
>>>>> We probably can't, because who knows what "Pentium" CPUs are out there.
>>>>>
>>>>> Or even if Pentium really does get it right. I doubt we have any
>>>>> developers with an original Pentium around.
>>>>>
>>>>> So just leave the "we don't know what the CPU result is for zero"
>>>>> unless we get some kind of official confirmation.
>>>>>
>>>>>          Linus
>>>> If anyone knows for sure, it is probably Christian Ludloff. However, there 
>>>> was a *huge* tightening of the formal ISA when the i686 was introduced 
>>>> (family=6) and I really believe this was part of it.
>>>>
>>>> I also really don't trust that family=5 really means conforms to 
>>>> undocumented P5 behavior, e.g. for Quark.
>>> https://www.sandpile.org/x86/flags.htm
>>>
>>> That's a lot of "can't even characterise the result" in the P5.
>>>
>>> Looking at the P4 column, that is clearly what the latest SDM has
>>> retroactively declared to be architectural.
>>>
>>> ~Andrew
>> Yes, but it wasn't about flags here. 
>>
>> Now, question: can we just use __builtin_*() for these? I think gcc should 
>> always generate inline code for these on x86.
>
>Yes it does generate inline code.  https://godbolt.org/z/M45oo5rqT
>
>GCC does it branchlessly, but cannot optimise based on context.
>
>Clang can optimise based on context, except the 0 case it seems.
>
>Moving to -march=i686 causes both GCC and Clang to switch to CMOV and
>create branchless code, but GCC still can't optimise out the CMOV
>based on context.
>
>~Andrew

Maybe a gcc bug report would be better than trying to hack around this in the 
kernel?
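
For illustration, a minimal sketch of what a builtin-based version could
look like. The helper names sketch_ffs() and sketch_fls() are hypothetical
and are not the kernel's actual helpers. The explicit zero checks are an
assumption made here because GCC documents __builtin_ctz() and
__builtin_clz() as undefined for a zero argument; __builtin_ffs() is
documented to return 0 for zero on its own.

```c
#include <limits.h>

/* Hypothetical helpers for illustration only; not kernel code. */

/* 1-based index of the least significant set bit, 0 if x == 0. */
static inline int sketch_ffs(unsigned int x)
{
	/* __builtin_ctz(0) is undefined, so the zero case is handled here. */
	return x ? __builtin_ctz(x) + 1 : 0;
}

/* 1-based index of the most significant set bit, 0 if x == 0. */
static inline int sketch_fls(unsigned int x)
{
	const int bits = (int)(sizeof(unsigned int) * CHAR_BIT);

	/* __builtin_clz(0) is undefined, so the zero case is handled here. */
	return x ? bits - __builtin_clz(x) : 0;
}
```

Whether the compiler can drop the zero check when the caller has already
established that x is nonzero is exactly the context-based optimisation
discussed above, so the generated code is worth checking per compiler and
per -march setting.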
