On 02/03/18 07:10, Jan Beulich wrote:
>>>> On 01.03.18 at 17:58, <andrew.coop...@citrix.com> wrote:
>> On 01/03/18 10:54, Jan Beulich wrote:
>>>>>> On 01.03.18 at 11:36, <roger....@citrix.com> wrote:
>>>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>>>> Andrew Cooper <andrew.coop...@citrix.com> 02/28/18 7:20 PM >>>
>>>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>>>> On 26.02.18 at 12:35, <andrew.coop...@citrix.com> wrote:
>>>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>>>> @@ -1,6 +1,8 @@
>>>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>> +#include <asm/nops.h>
>>>>>>>> +
>>>>>>>>  #ifdef __ASSEMBLY__
>>>>>>>>  /*
>>>>>>>> @@ -18,6 +20,14 @@
>>>>>>>>      .byte \pad_len
>>>>>>>>  .endm
>>>>>>>> +.macro mknops nr_bytes
>>>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>>>> And do you really need to specify ASM_NOP_MAX here? What's
>>>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>>>> NOP?
>>>>>> I don't want a toolchain change to cause us to go beyond 11 byte nops,
>>>>>> because of the associated decode stall on almost all hardware.  Using
>>>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>>>> consistent, irrespective of toolchain support.
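[A minimal sketch of the length-capping behaviour Andrew describes for ".nop \nr_bytes, ASM_NOP_MAX": cover the padding with NOPs no longer than ASM_NOP_MAX bytes each, so a toolchain change cannot silently introduce longer, decode-stalling encodings.  The value 11 comes from the mail above; the chunk-counting function itself is illustrative, not Xen code.]

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: cap each emitted NOP at ASM_NOP_MAX bytes, as the
 * second argument to the .nop directive requests of the assembler. */
#define ASM_NOP_MAX 11

/* Number of NOP instructions needed to cover nr_bytes of padding when
 * no single NOP may exceed ASM_NOP_MAX bytes. */
static size_t nop_count(size_t nr_bytes)
{
    return (nr_bytes + ASM_NOP_MAX - 1) / ASM_NOP_MAX;
}
```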
>>>>> I don't understand - an earlier patch takes care of runtime replacing them
>>>>> anyway. What stalls can then result?
>>>> The runtime replacement won't happen when using the .nops directive
>>>> AFAICT, because the original padding section will likely be filled
>>>> with opcodes different from 0x90, and thus the runtime nop
>>>> optimization won't be performed.
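[A minimal sketch of the check Roger is alluding to, under the assumption that the runtime optimiser only rewrites padding consisting entirely of single-byte NOPs (0x90).  If the assembler's .nops directive emits multi-byte NOP encodings instead, this check fails and the padding is left as the toolchain wrote it.  The function name is hypothetical, not Xen's actual code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: true iff the padding is all single-byte NOPs (0x90),
 * i.e. something the runtime nop optimisation would feel safe rewriting. */
static bool is_single_byte_nop_padding(const unsigned char *pad, size_t len)
{
    for ( size_t i = 0; i < len; i++ )
        if ( pad[i] != 0x90 )
            return false;
    return true;
}
```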
>>> Oh, indeed. That puts under question the whole idea of using
>>> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
>>> to withdraw my ack.
>>>> I also agree that using the default (not providing a second argument)
>>>> seems like a better solution. Why would the toolchain switch to
>>>> something that leads to worse performance? That would certainly be
>>>> considered a bug.
>>> Why? They may change it based on data available for newer /
>>> older / whatever hardware. Any build-time choice is going to be
>>> suboptimal somewhere, so I think we absolutely should not
>>> bypass runtime replacement of these NOPs, especially now that
>>> we may have quite large sequences of them.
>> The point of having the toolchain emit optimised nops is to avoid the
>> need for us to patch the site at all.  I.e. calling optimise_nops() on a
>> set of toolchain nops defeats the purpose in the overwhelmingly common
>> case of running on a system which prefers P6 nops.
>> The problem with working out when to optimise is that, when we come to
>> apply an individual alternative, we don't know whether we've already
>> patched this site before.  Even the unoptimised algorithm has a corner
>> case which explodes if there is a stream of 0x90's on the end of a
>> replacement, e.g. in an imm or disp field.
>> Put simply, we cannot determine, by peeking at the patch site, whether
>> it has been patched or not (other than by keeping a full copy of the
>> origin site as a reference).  As soon as we choose to optimise the nops
>> of the origin site, we cannot determine anything at all.
>> Thinking out loud, we could perhaps have a section containing one byte
>> per origin site, which we use to track whether we've already optimised
>> the padding bytes and whether the contents have been replaced.  This
>> would also add an extra long to struct alt_instr, but it's all cold
>> data after boot.
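[A minimal sketch of the "one status byte per origin site" idea floated above.  The flag names and layout are hypothetical, not Xen's actual implementation; the point is that with explicit state, a patching pass need not guess by peeking at the code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-site status bits, one byte per origin site. */
#define SITE_NOPS_OPTIMISED  (1u << 0)  /* padding already rewritten */
#define SITE_LIVE_PATCHED    (1u << 1)  /* contents already replaced */

/* Returns true (and records the fact) only the first time a site's
 * padding may be optimised; refuses once the site has been touched. */
static bool may_optimise_nops(uint8_t *site_flags)
{
    if ( *site_flags & (SITE_NOPS_OPTIMISED | SITE_LIVE_PATCHED) )
        return false;
    *site_flags |= SITE_NOPS_OPTIMISED;
    return true;
}
```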
> What about alternatively simply updating the struct alt_instr
> instances to describe the code _after_ a patch has been applied?
> That would allow us to always know how much padding there is.

There can be multiple alt_instr entries pointing to the same origin
site when using ALTERNATIVE_2, meaning you keep all of the safety
problems of the current setup.
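[An illustrative sketch of the objection: with ALTERNATIVE_2 two alt_instr entries refer to the same origin site, so any state attached per-entry describes the site ambiguously.  The field names below follow the flavour of Xen's struct alt_instr but are not its exact layout.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout only, not Xen's actual struct alt_instr. */
struct alt_instr {
    int32_t orig_offset;   /* origin site (relative) */
    int32_t repl_offset;   /* replacement code (relative) */
    uint8_t orig_len, repl_len;
};

/* Count pairs of entries that share one origin site; any nonzero
 * result means per-entry bookkeeping cannot uniquely describe a site. */
static size_t shared_origin_pairs(const struct alt_instr *a, size_t n)
{
    size_t pairs = 0;
    for ( size_t i = 0; i < n; i++ )
        for ( size_t j = i + 1; j < n; j++ )
            if ( a[i].orig_offset == a[j].orig_offset )
                pairs++;
    return pairs;
}
```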


Xen-devel mailing list