On 18/01/2024 11:09 am, Jan Beulich wrote:
> On 18.01.2024 12:04, Andrew Cooper wrote:
>> On 17/01/2024 9:37 am, Jan Beulich wrote:
>>> --- a/xen/arch/x86/ioport_emulate.c
>>> +++ b/xen/arch/x86/ioport_emulate.c
>>> @@ -8,11 +8,10 @@
>>>  #include <xen/sched.h>
>>>  #include <xen/dmi.h>
>>>  
>>> -unsigned int (*__read_mostly ioemul_handle_quirk)(
>>> -    uint8_t opcode, char *io_emul_stub, struct cpu_user_regs *regs);
>>> +bool __ro_after_init ioemul_handle_quirk;
>>>  
>>> -static unsigned int cf_check ioemul_handle_proliant_quirk(
>>> -    u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs)
>>> +unsigned int ioemul_handle_proliant_quirk(
>>> +    uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
>>>  {
>>>      static const char stub[] = {
>>>          0x9c,       /*    pushf           */
>> Something occurred to me.
>>
>> diff --git a/xen/arch/x86/ioport_emulate.c b/xen/arch/x86/ioport_emulate.c
>> index 23cba842b22e..70f94febe255 100644
>> --- a/xen/arch/x86/ioport_emulate.c
>> +++ b/xen/arch/x86/ioport_emulate.c
>> @@ -13,7 +13,7 @@ bool __ro_after_init ioemul_handle_quirk;
>>  unsigned int ioemul_handle_proliant_quirk(
>>      uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
>>  {
>> -    static const char stub[] = {
>> +    const char stub[] = {
>>          0x9c,       /*    pushf           */
>>          0xfa,       /*    cli             */
>>          0xee,       /*    out %al, %dx    */
>>
>> is an improvement, confirmed by bloat-o-meter:
>>
>> add/remove: 0/1 grow/shrink: 1/0 up/down: 1/-9 (-8)
>> Function                                     old     new   delta
>> ioemul_handle_proliant_quirk                  58      59      +1
>> stub                                           9       -      -9
>>
>> The reason is that we've got a 9 byte object that's decomposed into two
>> rip-relative accesses.  i.e. we've got more pointer than data in this case.
> I wouldn't mind this as a separate change, but I don't see how it would
> fit right here.

I'm not suggesting changing this patch.  I just linked here because I
noticed it because of this patch.

We've got similar patterns elsewhere, so I was intending to do a patch
covering all of them.

>
>> But this adjustment seems to tickle a GCC bug.  With that change in
>> place, GCC emits:
>>
>> <ioemul_handle_proliant_quirk>:
>>        48 83 ec 10             sub    $0x10,%rsp
>>        ...
>>        48 83 c4 10             add    $0x10,%rsp
>>        c3                      retq
>>
>> i.e. we get a stack frame (space at least, no initialisation) despite
>> the object having been converted entirely to instruction immediates.
>>
>> Or in other words, there's a further 12 byte saving available when GCC
>> can be persuaded to not even emit the stack frame.
>>
>> What is even more weird is that I see this GCC-10, and a build of gcc
>> master from last week, but not when I try to reproduce in
>> https://godbolt.org/z/PnachbznW so there's probably some other setting
>> used by Xen which tickles this bug.
> Yeah, I've observed such pointless frame allocation elsewhere as well,
> so far without being able what exactly triggers it.

Ok - more experimentation required, I guess.

~Andrew

Reply via email to