On 29/04/2025 4:13 am, H. Peter Anvin wrote:
> On April 28, 2025 7:25:17 PM PDT, Andrew Cooper <andrew.coop...@citrix.com>
> wrote:
>> On 29/04/2025 3:00 am, H. Peter Anvin wrote:
>>> On April 28, 2025 5:12:13 PM PDT, Andrew Cooper <andrew.coop...@citrix.com>
>>> wrote:
>>>> On 28/04/2025 10:38 pm, H. Peter Anvin wrote:
>>>>> On April 28, 2025 9:14:45 AM PDT, Linus Torvalds
>>>>> <torva...@linux-foundation.org> wrote:
>>>>>> On Mon, 28 Apr 2025 at 00:05, Ingo Molnar <mi...@kernel.org> wrote:
>>>>>>> And once we remove 486, I think we can do the optimization below to
>>>>>>> just assume the output doesn't get clobbered by BS*L in the zero-case,
>>>>>>> right?
>>>>>> We probably can't, because who knows what "Pentium" CPU's are out there.
>>>>>>
>>>>>> Or even if Pentium really does get it right. I doubt we have any
>>>>>> developers with an original Pentium around.
>>>>>>
>>>>>> So just leave the "we don't know what the CPU result is for zero"
>>>>>> unless we get some kind of official confirmation.
>>>>>>
>>>>>> Linus
>>>>> If anyone knows for sure, it is probably Christian Ludloff. However,
>>>>> there was a *huge* tightening of the formal ISA when the i686 was
>>>>> introduced (family=6) and I really believe this was part of it.
>>>>>
>>>>> I also really don't trust that family=5 really means conforms to
>>>>> undocumented P5 behavior, e.g. for Quark.
>>>> https://www.sandpile.org/x86/flags.htm
>>>>
>>>> That's a lot of "can't even characterise the result" in the P5.
>>>>
>>>> Looking at P4 column, that is clearly what the latest SDM has
>>>> retroactively declared to be architectural.
>>>>
>>>> ~Andrew
>>> Yes, but it wasn't about flags here.
>>>
>>> Now, question: can we just use __builtin_*() for these? I think gcc should
>>> always generate inline code for these on x86.
>> Yes it does generate inline code. https://godbolt.org/z/M45oo5rqT
>>
>> GCC does it branchlessly, but cannot optimise based on context.
>>
>> Clang can optimise based on context, except the 0 case it seems.
>>
>> Moving to -march=i686 causes both GCC and Clang to switch to CMOV and
>> create branchless code, but GCC still can't optimise out the
>> CMOV based on context.
>>
>> ~Andrew
> Maybe a gcc bug report would be better than trying to hack around this in the
> kernel?
I tried that. (The thread started as a question around
__builtin_constant_p() but did grow to cover __builtin_ffs().)

https://gcc.gnu.org/pipermail/gcc/2024-March/243465.html

~Andrew
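
For readers following the thread, here is a minimal sketch of the zero-case
pattern being discussed. It is not the kernel's implementation and not the
code behind the godbolt link; the helper names (ffs_checked, ffs_builtin,
lowest_bit_index) are hypothetical and exist only for illustration. It
assumes GCC or Clang, since it relies on __builtin_ctz() and __builtin_ffs().

```c
/*
 * Hypothetical helpers, for illustration only (GCC/Clang builtins assumed).
 */

/*
 * 1-based index of the lowest set bit, or 0 when x == 0.
 * The explicit check means the bit-scan result is never consumed in the
 * zero case, which is the case the thread says cannot be relied upon.
 * __builtin_ctz() itself is undefined for an input of 0.
 */
static inline int ffs_checked(unsigned int x)
{
	if (x == 0)
		return 0;
	return __builtin_ctz(x) + 1;
}

/*
 * Same contract, but leaving the zero handling to the compiler:
 * __builtin_ffs() returns 0 for an input of 0.
 */
static inline int ffs_builtin(unsigned int x)
{
	return __builtin_ffs((int)x);
}

/*
 * A caller with "context": x has already been tested against zero, so the
 * zero handling inside ffs_builtin() is redundant here. Whether a given
 * compiler actually removes it is the question raised in the thread.
 * Returns the 0-based bit index, or -1 when no bit is set.
 */
static inline int lowest_bit_index(unsigned int x)
{
	if (x == 0)
		return -1;
	return ffs_builtin(x) - 1;
}
```

Both ffs helpers return the same values for every input; the difference is
only in who is responsible for the zero case, the caller or the compiler.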