On 10/08/18 17:30, George Dunlap wrote:
> On 08/10/2018 05:00 PM, Jan Beulich wrote:
>>>>> On 10.08.18 at 17:35, <paul.durr...@citrix.com> wrote:
>>>>  -----Original Message-----
>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>> Sent: 10 August 2018 16:31
>>>> To: Paul Durrant <paul.durr...@citrix.com>
>>>> Cc: Andrew Cooper <andrew.coop...@citrix.com>; xen-devel <xen-
>>>> de...@lists.xenproject.org>
>>>> Subject: RE: [Xen-devel] [PATCH 0/2] MMIO emulation fixes
>>>>
>>>>>>> On 10.08.18 at 17:08, <paul.durr...@citrix.com> wrote:
>>>>>>  -----Original Message-----
>>>>>> From: Andrew Cooper
>>>>>> Sent: 10 August 2018 13:56
>>>>>> To: Paul Durrant <paul.durr...@citrix.com>; 'Jan Beulich'
>>>>>> <jbeul...@suse.com>
>>>>>> Cc: xen-devel <xen-devel@lists.xenproject.org>
>>>>>> Subject: Re: [Xen-devel] [PATCH 0/2] MMIO emulation fixes
>>>>>>
>>>>>> On 10/08/18 13:43, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>>>>>> Sent: 10 August 2018 13:37
>>>>>>>> To: Paul Durrant <paul.durr...@citrix.com>
>>>>>>>> Cc: xen-devel <xen-devel@lists.xenproject.org>
>>>>>>>> Subject: RE: [Xen-devel] [PATCH 0/2] MMIO emulation fixes
>>>>>>>>
>>>>>>>>>>> On 10.08.18 at 14:22, <paul.durr...@citrix.com> wrote:
>>>>>>>>>>  -----Original Message-----
>>>>>>>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>>>>>>>> Sent: 10 August 2018 13:13
>>>>>>>>>> To: Paul Durrant <paul.durr...@citrix.com>
>>>>>>>>>> Cc: xen-devel <xen-devel@lists.xenproject.org>
>>>>>>>>>> Subject: RE: [Xen-devel] [PATCH 0/2] MMIO emulation fixes
>>>>>>>>>>
>>>>>>>>>>>>> On 10.08.18 at 14:08, <paul.durr...@citrix.com> wrote:
>>>>>>>>>>>>  -----Original Message-----
>>>>>>>>>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>>>>>>>>>> Sent: 10 August 2018 13:02
>>>>>>>>>>>> To: Paul Durrant <paul.durr...@citrix.com>
>>>>>>>>>>>> Cc: xen-devel <xen-devel@lists.xenproject.org>
>>>>>>>>>>>> Subject: Re: [Xen-devel] [PATCH 0/2] MMIO emulation fixes
>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10.08.18 at 12:37, <paul.durr...@citrix.com> wrote:
>>>>>>>>>>>>> These are probably both candidates for back-port.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Paul Durrant (2):
>>>>>>>>>>>>>   x86/hvm/ioreq: MMIO range checking completely ignores direction flag
>>>>>>>>>>>>>   x86/hvm/emulate: make sure rep I/O emulation does not cross GFN
>>>>>>>>>>>>>     boundaries
>>>>>>>>>>>>>
>>>>>>>>>>>>>  xen/arch/x86/hvm/emulate.c | 17 ++++++++++++++++-
>>>>>>>>>>>>>  xen/arch/x86/hvm/ioreq.c   | 15 ++++++++++-----
>>>>>>>>>>>>>  2 files changed, 26 insertions(+), 6 deletions(-)
>>>>>>>>>>>> I take it this isn't yet what we've talked about yesterday on irc?
>>>>>>>>>>>>
>>>>>>>>>>> This is the band-aid fix. I can now show correct handling of a rep mov
>>>>>>>>>>> walking off MMIO into RAM.
>>>>>>>>>> But that's not the problem we're having. In our case the bad behavior
>>>>>>>>>> is with a single MOV. That's why I had assumed that your plan to fiddle
>>>>>>>>>> with null_handler would help in our case as well, while this series
>>>>>>>>>> clearly won't (afaict).
>>>>>>>>>>
>>>>>>>>> Oh, I see. A single MOV spanning MMIO and RAM has undefined behaviour
>>>>>>>>> though, as I understand it. Am I incorrect?
>>>>>>>> I'm not aware of SDM or PM saying anything like this. Anyway, the
>>>>>>>> specific case where this is being observed as an issue is when
>>>>>>>> accessing the last few bytes of a normal RAM page followed by a
>>>>>>>> ballooned out one. The balloon driver doesn't remove the virtual
>>>>>>>> mapping of such pages (presumably in order to not shatter super
>>>>>>>> pages); observation is with the old XenoLinux one, but from code
>>>>>>>> inspection the upstream one behaves the same.
>>>>>>>>
>>>>>>>> Unless we want to change the balloon driver's behavior, at least
>>>>>>>> this specific case needs to be considered having defined behavior,
>>>>>>>> I think.
>>>>>>>>
>>>>>>> Ok. I'll see what I can do.
>>>>>> It is a software error to try and cross boundaries.  Modern processors
>>>>>> do their best to try and cause the correct behaviour to occur, albeit
>>>>>> with a massive disclaimer about the performance hit.  Older processors
>>>>>> didn't cope.
>>>>>>
>>>>>> As far as I'm concerned, it's fine to terminate an emulation which
>>>>>> crosses a boundary with the null ops.
>>>>> Alas we never even get as far as the I/O handlers in some circumstances...
>>>>>
>>>>> I just set up a variant of an XTF test doing a backwards rep movsd into a
>>>>> well-aligned stack buffer where the source buffer starts 1 byte before a
>>>>> boundary between RAM and MMIO. The code in hvmemul_rep_movs() (rightly)
>>>>> detects that both the source and dest of the initial rep are RAM, skips
>>>>> over the I/O emulation calls, and then fails when hvm_copy_from_guest_phys()
>>>>> (unsurprisingly) fails to grab the 8 bytes for the initial rep.
>>>>> So, any logic we add to deal with handling page-spanning ops is going to
>>>>> have to go in at the top level of instruction emulation... which I fear is
>>>>> going to be quite a major change and not something that's going to be easy
>>>>> to back-port.
>>>> Well, wasn't it clear from the beginning that a proper fix would be too
>>>> invasive to backport? And wasn't it for that reason that you intended
>>>> to add a small hack first, to deal with just the case(s) that we currently
>>>> have issues with?
>>> Well, given that I mistakenly understood you were hitting the same rep
>>> issue that I was, I thought I could deal with it in a reasonably
>>> straightforward way. Maybe I can still do a point fix for what you are
>>> hitting though. What precisely are you hitting? Always a single MOV? And
>>> always from a page-spanning source to a well-aligned dest? Or more
>>> combinations than that?
>> Always a single, misaligned MOV spanning the boundary from a valid
>> RAM page to a ballooned out one (Linux's load_unaligned_zeropad()).
>> I meanwhile wonder though whether it might not be better to address
>> this at the p2m level, by inserting a r/o mapping of an all-zeros page.
>> George, do you have any opinion one way or the other?
> Sorry, what exactly is the issue here?  Linux has a function called
> load_unaligned_zeropad() which is reading into a ballooned region?
>
> Fundamentally, a ballooned page is one which has been allocated to a
> device driver.  I'm having a hard time coming up with a justification
> for having code which reads memory owned by B in the process of reading
> memory owned by A.  Or is there some weird architectural reason that I'm
> not aware of?

The underlying issue is that the emulator can't cope with a single
misaligned access which crosses RAM and MMIO.  It gives up and
presumably throws #UD back.

One longstanding Xen bug is that simply ballooning a page out shouldn't
be able to trigger MMIO emulation to begin with.  It is a side effect of
mixed p2m types, and the fix for this is to have Xen understand the guest
physmap layout.

However, the real bug is Linux making such a misaligned access into a
ballooned out page in the first place.  This is a Linux kernel bug which
(presumably) manifests in a very obvious way due to shortcomings in
Xen's emulation handling.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel