On Wed, May 20, 2020 at 12:17:15PM +0200, Jan Beulich wrote:
> On 20.05.2020 11:31, Roger Pau Monné wrote:
> > On Wed, May 20, 2020 at 10:31:38AM +0200, Jan Beulich wrote:
> >> On 14.05.2020 16:05, Roger Pau Monné wrote:
> >>> On Mon, Jul 15, 2019 at 02:39:04PM +0000, Jan Beulich wrote:
> >>>> @@ -251,6 +255,10 @@ boot/mkelf32: boot/mkelf32.c
> >>>>   efi/mkreloc: efi/mkreloc.c
> >>>>          $(HOSTCC) $(HOSTCFLAGS) -g -o $@ $<
> >>>>   
> >>>> +nocov-y += hweight.o
> >>>> +noubsan-y += hweight.o
> >>>> +hweight.o: CFLAGS += $(foreach reg,cx dx si 8 9 10 11,-ffixed-r$(reg))
> >>>
> >>> Why not use clobbers in the asm to list the scratch registers? Is it
> >>> that much expensive?
> >>
> >> The goal is to disturb the call sites as little as possible. There's
> >> no point avoiding the scratch registers when no call is made (i.e.
> >> when the POPCNT insn can be used). Taking away from the compiler 7
> >> out of 15 registers (that it can hold live data in) seems quite a
> >> lot to me.
> > 
> > IMO using -ffixed-reg for all those registers is even worse, as that
> > prevents the generated code in hweight from using any of those, thus
> > greatly limiting the amount of registers and likely making the
> > generated code rely heavily on pushing an popping from the stack?
> 
> Okay, that's the other side of the idea behind all this: Virtually no
> hardware we run on will lack POPCNT support, hence the quality of
> these fallback routines matters only the very old hardware, where we
> likely don't perform optimally already anyway.
> 
> > This also has the side effect to limiting the usage of popcnt to gcc,
> > which IMO is also not ideal.
> 
> Agreed. I don't know enough about clang to be able to think of
> possible alternatives. In any event there's no change to current
> behavior for hypervisors built with clang.
> 
> > I also wondered, since the in-place asm before patching is a call
> > instruction, wouldn't inline asm at build time already assume that the
> > scratch registers are clobbered?
> 
> That would imply the compiler peeks into the string literal of the
> asm(). At least gcc doesn't, and even if it did it couldn't infer an
> ABI from seeing a CALL insn.

Please bear with me, but then I don't understand what Linux is doing
in arch/x86/include/asm/arch_hweight.h. I see no clobbers there,
neither it seems like the __sw_hweight{32/64} functions are built
without the usage of the scratch registers.

Roger.

Reply via email to