On Tue, 29 Apr 2025 at 14:24, H. Peter Anvin <h...@zytor.com> wrote:
>
> Could you file a gcc bug? Gcc shouldn't generate worse code than inline asm 
> ...

Honestly, I've given up on that idea.

That's basically what the bug report I reported about clts was - gcc
generating worse code than inline asm.

But why would we use gcc builtins at all if we have inline asm that is
better than what som eversions of gcc generates? The inline asm works
for all versions.

Anyway, attached is a test file that seems to generate basically
"optimal" code. It's not a kernel patch, but a standalone C file for
testing with a couple of stupid test-cases to make sure that it gets
the constant argument case right, and that it optimizes the "I know
it's not zero" case.

That is why it also uses that "#if __SIZEOF_LONG__ == 4" instead of
something like CONFIG_64BIT.

So it's a proof-of-concept thing, and it does seem to be fairly simple.

The "important" parts are just the

  #define variable_ffs(x) \
        (statically_true((x)!=0) ? __ffs(x)+1 : do_variable_ffs(x))
  #define constant_ffs(x) __builtin_ffs(x)
  #define ffs(x) \
        (__builtin_constant_p(x) ? constant_ffs(x) : variable_ffs(x))

lines which end up picking the right choice. The rest is either the
"reimplement the kernel infrastructure for testing" or the trivial
do_variable_ffs() thing.

I just did

    gcc -m32 -O2 -S -fomit-frame-pointer t.c

(with, and without that -m32) and looked at the result to see that it
looks sane. No *actual* testing.


                Linus
static inline unsigned int __ffs(unsigned int x)
{
	unsigned int ret;
	asm("rep bsfl %1,%0"
		:"=r" (ret)
		:"rm" (x));
	return ret;
}
#define statically_true(x) (__builtin_constant_p(x) && (x))

static inline unsigned int do_variable_ffs(unsigned int x)
{
#if __SIZEOF_LONG__ == 4
	return __builtin_ffs(x);
#else
	unsigned int ret;
	asm("rep bsfl %1,%0"
		:"=r" (ret)
		:"rm" (x), "0" (-1));
	return ret+1;
#endif
}

#define variable_ffs(x) (statically_true((x)!=0) ? __ffs(x)+1 : do_variable_ffs(x))
#define constant_ffs(x) __builtin_ffs(x)

#define ffs(x) (__builtin_constant_p(x) ? constant_ffs(x) : variable_ffs(x))

unsigned int myffs(unsigned int x) { return ffs(x); }
unsigned int ffs5(void) { return ffs(5); }
unsigned int addffs(int x)
{
	unsigned int sum = 0;
	while (x) {
		sum += ffs(x);
		x--;
	}
	return sum;
}

Reply via email to