Re: [Xen-devel] [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

2017-07-04 Thread Jan Beulich
>>> On 03.07.17 at 19:24,  wrote:
> On 03/07/17 17:06, Jan Beulich wrote:
> On 03.07.17 at 17:07,  wrote:
>>> On 22/06/17 10:06, Jan Beulich wrote:
> +*mfn++ = _mfn(page_to_mfn(page));
> +frame++;
> +
> +if ( p2m_is_discard_write(p2mt) )
> +{
> +err = ERR_PTR(~(long)X86EMUL_OKAY);
> +goto out;
 If one page is discard-write and the other isn't, this will end up
 being wrong.
>>> Straddled accesses are always a grey area, and discard-write is an extra
>>> special case which only exists inside Xen.  Discard-write means that the
>>> guest knows that it shouldn't write there at all.
>> Is it the case that the guest knows? Iirc this type had been
>> introduced for introspection tool use.
> 
> This is for Intel GVT-g (c/s 1c02cce0ed).

Oh, I've mixed this up with the hvmemul_write_discard() & Co,
which ...

> Introspection would go very wrong if selective writes had no effect.

... are introspection specific, so I'm afraid you're not entirely right
with this statement.

>> Plus - not having the parameters const means you can't
>> allocate and initialize something in one go, and then store or
>> pass around a pointer to it which is const-qualified to make
>> clear the contents of that memory block aren't supposed to be
>> further modified (until de-allocation).
> 
> I don't see this as a bad thing.  You can't do it with malloc()/free().

And wrongly so, imo.

> +struct hvm_emulate_ctxt *hvmemul_ctxt)
 There are upsides and downsides to requiring the caller to pass in the
 same values as to map(): You can do more correctness checking
 here, but you also risk the caller using the wrong values (perhaps
 because of a meanwhile updated local variable). While I don't
 outright object to this approach, personally I'd prefer minimal
 inputs here, and the code deriving everything from hvmemul_ctxt.
>>> I'm not sure exactly how we might wish to extend this logic.  Are we
>>> ever going to want more than one active mapping at once (perhaps rep
>>> movs emulation across two ram regions)?
>> I could imagine even more than two regions - from an abstract
>> perspective it may be possible / helpful to have a mapping of
>> each piece of memory a single instruction accesses, along the
>> lines of the minimal number of architecturally guaranteed TLB
>> entries on ia64 in order to execute any possible x86 insn.
> 
> A further problem I've just realised is the use of multiple ->write()
> calls for a single instruction in x86_emulate().  The preceding writes
> obviously can't get backed out in the case of the second write taking a
> fault, which will cause the emulator to exhibit non-architectural
> behaviour similar to what this patch is trying to fix.

I'm not convinced hardware guarantees all cases where we use
multiple writes to be carried out in one step. I'm relatively sure
multiple pushes (for far calls) are being carried out one by one.
Just that exception checks likely are being done once up front
(albeit I'm less certain about #PF, whereas for all segment
related things this would seem natural). Same for SGDT/SIDT,
which I think are the only non-push multi-write insns.

And of course the same issue exists for multiple segment register
writes.

>>> The other reason is that in the release builds, even if we stored the
>>> pointer in hvmemul_ctxt, we still can't determine which unmapping
>>> function to use without linear and size.
>> I don't understand - all you pass on is "mapping". And whether to
>> unmap a single or two pages could be told from hvmemul_ctxt->mfn[1]
>> (not) being INVALID_MFN.
> 
> That isn't safe outside of the debug builds.
> 
> You could in principle use mfn[1] != 0 in release builds, if it weren't
> for the fact that hvmemul_ctxt could be used for multiple ->write()
> calls.  In the case of a straddled write followed by a non-straddled
> write, the second unmap() would evaluate incorrectly.

I'd of course expect unmap() to set ->mfn[] to INVALID_MFN.

Jan
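
A ctxt-only unmap along these lines might look like the sketch below.
This is hypothetical -- the posted patch passes linear and bytes
explicitly, and the handling of mapping's sub-page offset is glossed
over -- and it assumes map()/unmap() keep unused mfn[] slots at
INVALID_MFN, as suggested above:

    static void hvmemul_unmap_linear_addr(
        void *mapping, struct hvm_emulate_ctxt *hvmemul_ctxt)
    {
        unsigned int i;

        /* One mapped frame came from map_domain_page(); more used vmap(). */
        if ( mfn_eq(hvmemul_ctxt->mfn[1], INVALID_MFN) )
            unmap_domain_page(mapping);
        else
            vunmap(mapping);

        /* Drop the references taken by map(), resetting slots as we go. */
        for ( i = 0; i < ARRAY_SIZE(hvmemul_ctxt->mfn) &&
                     !mfn_eq(hvmemul_ctxt->mfn[i], INVALID_MFN); i++ )
        {
            put_page(mfn_to_page(mfn_x(hvmemul_ctxt->mfn[i])));
            hvmemul_ctxt->mfn[i] = INVALID_MFN;
        }
    }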



Re: [Xen-devel] [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

2017-07-03 Thread Andrew Cooper
On 03/07/17 17:06, Jan Beulich wrote:
 On 03.07.17 at 17:07,  wrote:
>> On 22/06/17 10:06, Jan Beulich wrote:
 +{
 +ASSERT_UNREACHABLE();
 +goto unhandleable;
 +}
 +
 +do {
 +enum hvm_translation_result res;
 +struct page_info *page;
 +pagefault_info_t pfinfo;
 +p2m_type_t p2mt;
 +
 +res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
 + &pfinfo, &page, NULL, &p2mt);
 +
 +switch ( res )
 +{
 +case HVMTRANS_okay:
 +break;
 +
 +case HVMTRANS_bad_linear_to_gfn:
 +x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
 +err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
 +goto out;
 +
 +case HVMTRANS_bad_gfn_to_mfn:
 +err = NULL;
 +goto out;
 +
 +case HVMTRANS_gfn_paged_out:
 +case HVMTRANS_gfn_shared:
 +err = ERR_PTR(~(long)X86EMUL_RETRY);
 +goto out;
 +
 +default:
 +goto unhandleable;
 +}
 +
 +/* Error checking.  Confirm that the current slot is clean. */
 +ASSERT(mfn_x(*mfn) == 0);
>>> Wouldn't this better be done first thing in the loop?
>> IMO it's clearer to keep it next to the assignment, but yes, could in
>> principle move to the top of the loop.
>>
>>> And wouldn't the value better be INVALID_MFN?
>> The backing array is zeroed by hvm_emulate_init_once(), so relying on 0
>> here is more convenient.
>>
>> Furthermore, INVALID_MFN is used lower down to poison unused slots, so
>> initialising the whole array to INVALID_MFN reduces the effectiveness of
>> the checks.
> Well, further down I had implicitly asked whether some of the
> checks wouldn't better go away (as they can't be carried out
> with some of the redundant unmap inputs dropped).
>
 +*mfn++ = _mfn(page_to_mfn(page));
 +frame++;
 +
 +if ( p2m_is_discard_write(p2mt) )
 +{
 +err = ERR_PTR(~(long)X86EMUL_OKAY);
 +goto out;
>>> If one page is discard-write and the other isn't, this will end up
>>> being wrong.
>> Straddled accesses are always a grey area, and discard-write is an extra
>> special case which only exists inside Xen.  Discard-write means that the
>> guest knows that it shouldn't write there at all.
> Is it the case that the guest knows? Iirc this type had been
> introduced for introspection tool use.

This is for Intel GVT-g (c/s 1c02cce0ed).

Introspection would go very wrong if selective writes had no effect.

>
>> Doing nothing (by logically extending the discard-write restriction over
>> the entire region) is the least bad option here, IMO.
> Okay, that's a reasonable argument. I'd suggest saying so
> explicitly in a comment, though.

I will add a comment.
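
Such a comment might read (hypothetical wording, following the
reasoning above):

    if ( p2m_is_discard_write(p2mt) )
    {
        /*
         * If one frame is discard-write and another isn't, treat the
         * whole access as discard-write: returning ~X86EMUL_OKAY makes
         * the caller drop the entire write, which is the least bad
         * option for a straddled access.
         */
        err = ERR_PTR(~(long)X86EMUL_OKAY);
        goto out;
    }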

>
 +static void hvmemul_unmap_linear_addr(
 +void *mapping, unsigned long linear, unsigned int bytes,
>>> Both vunmap() and unmap_domain_page() take pointers to const, so
>>> please use const on the pointer here too.
>> The meaning of const void *p in C is "this function does not modify the
>> content pointed to by p".
>>
>> Both vunmap() and unmap_domain_page() mutate the content being pointed
>> to, so should not take const pointers.
> We've had this discussion before, and I continue to take a
> different position: When you free a memory block, you implicitly
> declare its contents undefined.

In C's view, the object explicitly becomes undefined, as its lifetime
has come to an end, but this is a well defined operation.

> An undefined modification to undefined contents still yields undefined.

The precondition to this statement is false.

Before the unmap() call, the object and its contents are well defined.

By using a function with a const pointer, the contract of the function
says the object doesn't change, and therefore, its content doesn't change.

Therefore, the object and its contents are still well defined after the
call.


This final statement is obviously false, because we blew away the
pagetables under the mapping.  The contradiction exists because of the
use of const pointer in the first place.
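
The standard-C analogue of this argument (malloc()/free(), which comes
up just below) makes the point concrete; a standalone sketch, not code
from the patch:

    #include <stdlib.h>

    int main(void)
    {
        int *x = malloc(sizeof(*x));

        if ( !x )
            return 1;

        *x = 42;     /* object is live and well defined here */
        free(x);     /* lifetime ends; the storage is now indeterminate */

        /*
         * free(3) is declared as free(void *), not free(const void *):
         * ending an object's lifetime is observably not "no
         * modification", which is the same point being made here about
         * vunmap() and unmap_domain_page().
         */
        return 0;
    }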

>  So no actual change.
> Furthermore it is not a given that a freeing routine actually
> touches the handed memory block at all - that's an internal
> implementation detail of the allocator.
>
> Plus - not having the parameters const means you can't
> allocate and initialize something in one go, and then store or
> pass around a pointer to it which is const-qualified to make
> clear the contents of that memory block aren't supposed to be
> further modified (until de-allocation).

I don't see this as a bad thing.  You can't do it with malloc()/free().

>
>  +struct hvm_emulate_ctxt *hvmemul_ctxt)

Re: [Xen-devel] [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

2017-07-03 Thread Jan Beulich
>>> On 03.07.17 at 17:07,  wrote:
> On 22/06/17 10:06, Jan Beulich wrote:
>>> +{
>>> +ASSERT_UNREACHABLE();
>>> +goto unhandleable;
>>> +}
>>> +
>>> +do {
>>> +enum hvm_translation_result res;
>>> +struct page_info *page;
>>> +pagefault_info_t pfinfo;
>>> +p2m_type_t p2mt;
>>> +
>>> +res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
>>> + &pfinfo, &page, NULL, &p2mt);
>>> +
>>> +switch ( res )
>>> +{
>>> +case HVMTRANS_okay:
>>> +break;
>>> +
>>> +case HVMTRANS_bad_linear_to_gfn:
>>> +x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>>> +err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
>>> +goto out;
>>> +
>>> +case HVMTRANS_bad_gfn_to_mfn:
>>> +err = NULL;
>>> +goto out;
>>> +
>>> +case HVMTRANS_gfn_paged_out:
>>> +case HVMTRANS_gfn_shared:
>>> +err = ERR_PTR(~(long)X86EMUL_RETRY);
>>> +goto out;
>>> +
>>> +default:
>>> +goto unhandleable;
>>> +}
>>> +
>>> +/* Error checking.  Confirm that the current slot is clean. */
>>> +ASSERT(mfn_x(*mfn) == 0);
>> Wouldn't this better be done first thing in the loop?
> 
> IMO it's clearer to keep it next to the assignment, but yes, could in
> principle move to the top of the loop.
> 
>> And wouldn't the value better be INVALID_MFN?
> 
> The backing array is zeroed by hvm_emulate_init_once(), so relying on 0
> here is more convenient.
> 
> Furthermore, INVALID_MFN is used lower down to poison unused slots, so
> initialising the whole array to INVALID_MFN reduces the effectiveness of
> the checks.

Well, further down I had implicitly asked whether some of the
checks wouldn't better go away (as they can't be carried out
with some of the redundant unmap inputs dropped).

>>> +*mfn++ = _mfn(page_to_mfn(page));
>>> +frame++;
>>> +
>>> +if ( p2m_is_discard_write(p2mt) )
>>> +{
>>> +err = ERR_PTR(~(long)X86EMUL_OKAY);
>>> +goto out;
>> If one page is discard-write and the other isn't, this will end up
>> being wrong.
> 
> Straddled accesses are always a grey area, and discard-write is an extra
> special case which only exists inside Xen.  Discard-write means that the
> guest knows that it shouldn't write there at all.

Is it the case that the guest knows? Iirc this type had been
introduced for introspection tool use.

> Doing nothing (by logically extending the discard-write restriction over
> the entire region) is the least bad option here, IMO.

Okay, that's a reasonable argument. I'd suggest saying so
explicitly in a comment, though.

>>> +static void hvmemul_unmap_linear_addr(
>>> +void *mapping, unsigned long linear, unsigned int bytes,
>> Both vunmap() and unmap_domain_page() take pointers to const, so
>> please use const on the pointer here too.
> 
> The meaning of const void *p in C is "this function does not modify the
> content pointed to by p".
> 
> Both vunmap() and unmap_domain_page() mutate the content being pointed
> to, so should not take const pointers.

We've had this discussion before, and I continue to take a
different position: When you free a memory block, you implicitly
declare its contents undefined. An undefined modification to
undefined contents still yields undefined. So no actual change.
Furthermore it is not a given that a freeing routine actually
touches the handed memory block at all - that's an internal
implementation detail of the allocator.

Plus - not having the parameters const means you can't
allocate and initialize something in one go, and then store or
pass around a pointer to it which is const-qualified to make
clear the contents of that memory block aren't supposed to be
further modified (until de-allocation).

>>> +struct hvm_emulate_ctxt *hvmemul_ctxt)
>> There are upsides and downsides to requiring the caller to pass in the
>> same values as to map(): You can do more correctness checking
>> here, but you also risk the caller using the wrong values (perhaps
>> because of a meanwhile updated local variable). While I don't
>> outright object to this approach, personally I'd prefer minimal
>> inputs here, and the code deriving everything from hvmemul_ctxt.
> 
> I'm not sure exactly how we might wish to extend this logic.  Are we
> ever going to want more than one active mapping at once (perhaps rep
> movs emulation across two ram regions)?

I could imagine even more than two regions - from an abstract
perspective it may be possible / helpful to have a mapping of
each piece of memory a single instruction accesses, along the
> lines of the minimal number of architecturally guaranteed TLB
entries on ia64 in order to execute any possible x86 insn.

> The other reason is that in the release builds, even if we stored the
> pointer in hvmemul_ctxt, we still can't determine which unmapping
> function to use without linear and size.

Re: [Xen-devel] [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

2017-07-03 Thread Andrew Cooper
On 22/06/17 10:06, Jan Beulich wrote:
>
>> +/*
>> + * mfn points to the next free slot.  All used slots have a page reference
>> + * held on them.
>> + */
>> +mfn_t *mfn = &hvmemul_ctxt->mfn[0];
>> +
>> +/*
>> + * The caller has no legitimate reason for trying a zero-byte write, but
>> + * final is calculated to fail safe in release builds.
>> + *
>> + * The maximum write size depends on the number of adjacent mfns[] which
>> + * can be vmap()'d, accounting for possible misalignment within the region.
>> + * The higher level emulation callers are responsible for ensuring that
>> + * mfns[] is large enough for the requested write size.
>> + */
>> +if ( bytes == 0 ||
>> + final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )
> Any reason not to use ">= ARRAY_SIZE(hvmemul_ctxt->mfn)"?

It more obviously identifies the accounting for misalignment.

>
>> +{
>> +ASSERT_UNREACHABLE();
>> +goto unhandleable;
>> +}
>> +
>> +do {
>> +enum hvm_translation_result res;
>> +struct page_info *page;
>> +pagefault_info_t pfinfo;
>> +p2m_type_t p2mt;
>> +
>> +res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
>> + &pfinfo, &page, NULL, &p2mt);
>> +
>> +switch ( res )
>> +{
>> +case HVMTRANS_okay:
>> +break;
>> +
>> +case HVMTRANS_bad_linear_to_gfn:
>> +x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>> +err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
>> +goto out;
>> +
>> +case HVMTRANS_bad_gfn_to_mfn:
>> +err = NULL;
>> +goto out;
>> +
>> +case HVMTRANS_gfn_paged_out:
>> +case HVMTRANS_gfn_shared:
>> +err = ERR_PTR(~(long)X86EMUL_RETRY);
>> +goto out;
>> +
>> +default:
>> +goto unhandleable;
>> +}
>> +
>> +/* Error checking.  Confirm that the current slot is clean. */
>> +ASSERT(mfn_x(*mfn) == 0);
> Wouldn't this better be done first thing in the loop?

IMO it's clearer to keep it next to the assignment, but yes, could in
principle move to the top of the loop.

> And wouldn't the value better be INVALID_MFN?

The backing array is zeroed by hvm_emulate_init_once(), so relying on 0
here is more convenient.

Furthermore, INVALID_MFN is used lower down to poison unused slots, so
initialising the whole array to INVALID_MFN reduces the effectiveness of
the checks.

>
>> +*mfn++ = _mfn(page_to_mfn(page));
>> +frame++;
>> +
>> +if ( p2m_is_discard_write(p2mt) )
>> +{
>> +err = ERR_PTR(~(long)X86EMUL_OKAY);
>> +goto out;
> If one page is discard-write and the other isn't, this will end up
> being wrong.

Straddled accesses are always a grey area, and discard-write is an extra
special case which only exists inside Xen.  Discard-write means that the
guest knows that it shouldn't write there at all.

Doing nothing (by logically extending the discard-write restriction over
the entire region) is the least bad option here, IMO.

>> +goto unhandleable;
>> +
>> +#ifndef NDEBUG /* Poison unused mfn[]s with INVALID_MFN. */
>> +while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
>> +{
>> +ASSERT(mfn_x(*mfn) == 0);
>> +*mfn++ = INVALID_MFN;
>> +}
>> +#endif
>> +
>> +return mapping;
>> +
>> + unhandleable:
>> +err = ERR_PTR(~(long)X86EMUL_UNHANDLEABLE);
>> +
>> + out:
>> +/* Drop all held references. */
>> +while ( mfn > hvmemul_ctxt->mfn )
>> +put_page(mfn_to_page(mfn_x(*mfn--)));
> ITYM
>
> while ( mfn-- > hvmemul_ctxt->mfn )
> put_page(mfn_to_page(mfn_x(*mfn)));
>
> or
>
> while ( mfn > hvmemul_ctxt->mfn )
> put_page(mfn_to_page(mfn_x(*--mfn)));

Yes, I do.
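
The off-by-one is easy to demonstrate in isolation (a standalone sketch
of the cleanup idiom, not the patch code):

    #include <stdio.h>

    int main(void)
    {
        int slots[4] = { 10, 11, 12, 0 };      /* slots[3] is unused */
        int *base = slots, *p = slots + 3;     /* p = next free slot */

        /*
         * Buggy post-decrement form: while ( p > base ) use(*p--);
         * visits the unused slots[3] and never reaches slots[0].
         */

        /* Correct: pre-decrement so only used slots are visited. */
        while ( p > base )
            printf("%d\n", *--p);              /* prints 12, 11, 10 */

        return 0;
    }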

>
>> +static void hvmemul_unmap_linear_addr(
>> +void *mapping, unsigned long linear, unsigned int bytes,
> Both vunmap() and unmap_domain_page() take pointers to const, so
> please use const on the pointer here too.

The meaning of const void *p in C is "this function does not modify the
content pointed to by p".

Both vunmap() and unmap_domain_page() mutate the content being pointed
to, so should not take const pointers.

>
>> +struct hvm_emulate_ctxt *hvmemul_ctxt)
> There are upsides and downsides to requiring the caller to pass in the
> same values as to map(): You can do more correctness checking
> here, but you also risk the caller using the wrong values (perhaps
> because of a meanwhile updated local variable). While I don't
> outright object to this approach, personally I'd prefer minimal
> inputs here, and the code deriving everything from hvmemul_ctxt.

I'm not sure exactly how we might wish to extend this logic.  Are we
ever going to want more than one active mapping at once (perhaps rep
movs emulation across two ram regions)?

The other reason is that in the release builds, even if we stored the
pointer in hvmemul_ctxt, we still can't determine which unmapping
function to use without linear and size.

Re: [Xen-devel] [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

2017-06-22 Thread Jan Beulich
>>> On 21.06.17 at 17:12,  wrote:
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -498,6 +498,159 @@ static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
>  }
>  
>  /*
> + * Map the frame(s) covering an individual linear access, for writeable
> + * access.  May return NULL for MMIO, or ERR_PTR(~X86EMUL_*) for other errors
> + * including ERR_PTR(~X86EMUL_OKAY) for write-discard mappings.
> + *
> + * In debug builds, map() checks that each slot in hvmemul_ctxt->mfn[] is
> + * clean before use, and poisons unused slots with INVALID_MFN.
> + */
> +static void *hvmemul_map_linear_addr(
> +unsigned long linear, unsigned int bytes, uint32_t pfec,
> +struct hvm_emulate_ctxt *hvmemul_ctxt)
> +{
> +struct vcpu *curr = current;
> +void *err, *mapping;
> +
> +/* First and final gfns which need mapping. */
> +unsigned long frame = linear >> PAGE_SHIFT, first = frame;
> +unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;

I second Paul's desire for the zero bytes case to be rejected up
the call stack.

> +/*
> + * mfn points to the next free slot.  All used slots have a page reference
> + * held on them.
> + */
> +mfn_t *mfn = &hvmemul_ctxt->mfn[0];
> +
> +/*
> + * The caller has no legitimate reason for trying a zero-byte write, but
> + * final is calculated to fail safe in release builds.
> + *
> + * The maximum write size depends on the number of adjacent mfns[] which
> + * can be vmap()'d, accounting for possible misalignment within the region.
> + * The higher level emulation callers are responsible for ensuring that
> + * mfns[] is large enough for the requested write size.
> + */
> +if ( bytes == 0 ||
> + final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )

Any reason not to use ">= ARRAY_SIZE(hvmemul_ctxt->mfn)"?

> +{
> +ASSERT_UNREACHABLE();
> +goto unhandleable;
> +}
> +
> +do {
> +enum hvm_translation_result res;
> +struct page_info *page;
> +pagefault_info_t pfinfo;
> +p2m_type_t p2mt;
> +
> +res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
> + &pfinfo, &page, NULL, &p2mt);
> +
> +switch ( res )
> +{
> +case HVMTRANS_okay:
> +break;
> +
> +case HVMTRANS_bad_linear_to_gfn:
> +x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
> +err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
> +goto out;
> +
> +case HVMTRANS_bad_gfn_to_mfn:
> +err = NULL;
> +goto out;
> +
> +case HVMTRANS_gfn_paged_out:
> +case HVMTRANS_gfn_shared:
> +err = ERR_PTR(~(long)X86EMUL_RETRY);
> +goto out;
> +
> +default:
> +goto unhandleable;
> +}
> +
> +/* Error checking.  Confirm that the current slot is clean. */
> +ASSERT(mfn_x(*mfn) == 0);

Wouldn't this better be done first thing in the loop? And wouldn't
the value better be INVALID_MFN?

> +*mfn++ = _mfn(page_to_mfn(page));
> +frame++;
> +
> +if ( p2m_is_discard_write(p2mt) )
> +{
> +err = ERR_PTR(~(long)X86EMUL_OKAY);
> +goto out;

If one page is discard-write and the other isn't, this will end up
being wrong.

> +}
> +
> +} while ( frame < final );
> +
> +/* Entire access within a single frame? */
> +if ( first == final )
> +mapping = map_domain_page(hvmemul_ctxt->mfn[0]) + (linear & ~PAGE_MASK);
> +/* Multiple frames? Need to vmap(). */
> +else if ( (mapping = vmap(hvmemul_ctxt->mfn,
> +  mfn - hvmemul_ctxt->mfn)) == NULL )

final - first + 1 would likely yield better code.

> +goto unhandleable;
> +
> +#ifndef NDEBUG /* Poison unused mfn[]s with INVALID_MFN. */
> +while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
> +{
> +ASSERT(mfn_x(*mfn) == 0);
> +*mfn++ = INVALID_MFN;
> +}
> +#endif
> +
> +return mapping;
> +
> + unhandleable:
> +err = ERR_PTR(~(long)X86EMUL_UNHANDLEABLE);
> +
> + out:
> +/* Drop all held references. */
> +while ( mfn > hvmemul_ctxt->mfn )
> +put_page(mfn_to_page(mfn_x(*mfn--)));

ITYM

while ( mfn-- > hvmemul_ctxt->mfn )
put_page(mfn_to_page(mfn_x(*mfn)));

or

while ( mfn > hvmemul_ctxt->mfn )
put_page(mfn_to_page(mfn_x(*--mfn)));

> +static void hvmemul_unmap_linear_addr(
> +void *mapping, unsigned long linear, unsigned int bytes,

Both vunmap() and unmap_domain_page() take pointers to const, so
please use const on the pointer here too.

> +struct hvm_emulate_ctxt *hvmemul_ctxt)

There are upsides and downsides to requiring the caller to pass in the
same values as to map(): You can do more correctness checking
here, but you also risk the caller using the wrong values (perhaps
because of a meanwhile updated local variable). While I don't
outright object to this approach, personally I'd prefer minimal
inputs here, and the code deriving everything from hvmemul_ctxt.

Re: [Xen-devel] [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

2017-06-21 Thread Paul Durrant
> -Original Message-
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: 21 June 2017 16:13
> To: Xen-devel 
> Cc: Andrew Cooper ; Jan Beulich
> ; Paul Durrant ; Razvan
> Cojocaru ; Mihai Donțu
> 
> Subject: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real
> mappings
> 
> An access which crosses a page boundary is performed atomically by x86
> hardware, albeit with a severe performance penalty.  An important corner case
> is when a straddled access hits two pages which differ in whether a
> translation exists, or in net access rights.
> 
> The use of hvm_copy*() in hvmemul_write() is problematic, because it performs
> a translation then completes the partial write, before moving onto the next
> translation.
> 
> If an individual emulated write straddles two pages, the first of which is
> writable, and the second of which is not, the first half of the write will
> complete before #PF is raised from the second half.
> 
> This results in guest state corruption as a side effect of emulation, which
> has been observed to cause windows to crash while under introspection.
> 
> Introduce the hvmemul_{,un}map_linear_addr() helpers, which translate an
> entire contents of a linear access, and vmap() the underlying frames to
> provide a contiguous virtual mapping for the emulator to use.  This is the
> same mechanism as used by the shadow emulation code.
> 
> This will catch any translation issues and abort the emulation before any
> modifications occur.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Paul Durrant 
> CC: Razvan Cojocaru 
> CC: Mihai Donțu 
> 
> While the maximum size of linear mapping is capped at 1 page, the logic in the
> helpers is written to work properly as hvmemul_ctxt->mfn[] gets longer,
> specifically with XSAVE instruction emulation in mind.
> 
> This has only had light testing so far.
> ---
>  xen/arch/x86/hvm/emulate.c| 179 ++
>  xen/include/asm-x86/hvm/emulate.h |   7 ++
>  2 files changed, 169 insertions(+), 17 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 384ad0b..804bea5 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -498,6 +498,159 @@ static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
>  }
> 
>  /*
> + * Map the frame(s) covering an individual linear access, for writeable
> + * access.  May return NULL for MMIO, or ERR_PTR(~X86EMUL_*) for other errors
> + * including ERR_PTR(~X86EMUL_OKAY) for write-discard mappings.
> + *
> + * In debug builds, map() checks that each slot in hvmemul_ctxt->mfn[] is
> + * clean before use, and poisons unused slots with INVALID_MFN.
> + */
> +static void *hvmemul_map_linear_addr(
> +unsigned long linear, unsigned int bytes, uint32_t pfec,
> +struct hvm_emulate_ctxt *hvmemul_ctxt)
> +{
> +struct vcpu *curr = current;
> +void *err, *mapping;
> +
> +/* First and final gfns which need mapping. */
> +unsigned long frame = linear >> PAGE_SHIFT, first = frame;
> +unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;

Do we need to worry about linear + bytes overflowing here?

Also, is this ever legitimately called with bytes == 0?

> +
> +/*
> + * mfn points to the next free slot.  All used slots have a page reference
> + * held on them.
> + */
> +mfn_t *mfn = &hvmemul_ctxt->mfn[0];
> +
> +/*
> + * The caller has no legitimate reason for trying a zero-byte write, but
> + * final is calculated to fail safe in release builds.
> + *
> + * The maximum write size depends on the number of adjacent mfns[] which
> + * can be vmap()'d, accounting for possible misalignment within the region.
> + * The higher level emulation callers are responsible for ensuring that
> + * mfns[] is large enough for the requested write size.
> + */
> +if ( bytes == 0 ||
> + final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )
> +{

I guess not, so why the weird-looking calculation for final? Its value will
not be used when bytes == 0.
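
For concreteness, with PAGE_SHIFT == 12 the calculation behaves as
follows (worked examples, not from the patch):

    /* bytes == 2 at linear == 0x0fff: the access straddles a boundary. */
    first = 0x0fff >> 12;              /* 0 */
    final = (0x0fff + 2 - 1) >> 12;    /* 0x1000 >> 12 == 1: two frames */

    /* bytes == 1 at linear == 0x0fff: a single frame. */
    final = (0x0fff + 1 - 1) >> 12;    /* 0: final == first */

    /*
     * bytes == 0: '- !!bytes' degrades to '- 0', so the sum never
     * underflows (unlike 'linear + 0 - 1' at linear == 0) and final ==
     * first, failing safe even before the explicit bytes == 0 check.
     */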

> +ASSERT_UNREACHABLE();
> +goto unhandleable;
> +}
> +
> +do {
> +enum hvm_translation_result res;
> +struct page_info *page;
> +pagefault_info_t pfinfo;
> +p2m_type_t p2mt;
> +
> +res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
> + &pfinfo, &page, NULL, &p2mt);
> +
> +switch ( res )
> +{
> +case HVMTRANS_okay:
> +break;
> +
> +case HVMTRANS_bad_linear_to_gfn:
> +x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
> +err = ERR_PTR(~(long)X86EMUL_EXCEPTION);

[Xen-devel] [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

2017-06-21 Thread Andrew Cooper
An access which crosses a page boundary is performed atomically by x86
hardware, albeit with a severe performance penalty.  An important corner case
is when a straddled access hits two pages which differ in whether a
translation exists, or in net access rights.

The use of hvm_copy*() in hvmemul_write() is problematic, because it performs
a translation then completes the partial write, before moving onto the next
translation.

If an individual emulated write straddles two pages, the first of which is
writable, and the second of which is not, the first half of the write will
complete before #PF is raised from the second half.

This results in guest state corruption as a side effect of emulation, which
has been observed to cause windows to crash while under introspection.

Introduce the hvmemul_{,un}map_linear_addr() helpers, which translate an
entire contents of a linear access, and vmap() the underlying frames to
provide a contiguous virtual mapping for the emulator to use.  This is the
same mechanism as used by the shadow emulation code.

This will catch any translation issues and abort the emulation before any
modifications occur.
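
A sketch of how a caller such as hvmemul_write() might consume these
helpers (illustrative only; addr, p_data and bytes are placeholder
names, and the actual call sites are in the part of the patch not
quoted here):

    int rc = X86EMUL_OKAY;
    void *mapping = hvmemul_map_linear_addr(addr, bytes, pfec,
                                            hvmemul_ctxt);

    if ( IS_ERR(mapping) )
        return ~PTR_ERR(mapping);   /* recovers the X86EMUL_* code,
                                     * including OKAY for discard-write */

    if ( mapping )
    {
        memcpy(mapping, p_data, bytes);   /* all frames are already
                                           * mapped, so no partial write
                                           * is possible past this point */
        hvmemul_unmap_linear_addr(mapping, addr, bytes, hvmemul_ctxt);
    }
    else
        rc = X86EMUL_UNHANDLEABLE;  /* NULL means MMIO; the real caller
                                     * would take the MMIO path instead */

    return rc;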

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Paul Durrant 
CC: Razvan Cojocaru 
CC: Mihai Donțu 

While the maximum size of linear mapping is capped at 1 page, the logic in the
helpers is written to work properly as hvmemul_ctxt->mfn[] gets longer,
specifically with XSAVE instruction emulation in mind.

This has only had light testing so far.
---
 xen/arch/x86/hvm/emulate.c| 179 ++
 xen/include/asm-x86/hvm/emulate.h |   7 ++
 2 files changed, 169 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 384ad0b..804bea5 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -498,6 +498,159 @@ static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
 }
 
 /*
+ * Map the frame(s) covering an individual linear access, for writeable
+ * access.  May return NULL for MMIO, or ERR_PTR(~X86EMUL_*) for other errors
+ * including ERR_PTR(~X86EMUL_OKAY) for write-discard mappings.
+ *
+ * In debug builds, map() checks that each slot in hvmemul_ctxt->mfn[] is
+ * clean before use, and poisons unused slots with INVALID_MFN.
+ */
+static void *hvmemul_map_linear_addr(
+unsigned long linear, unsigned int bytes, uint32_t pfec,
+struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+struct vcpu *curr = current;
+void *err, *mapping;
+
+/* First and final gfns which need mapping. */
+unsigned long frame = linear >> PAGE_SHIFT, first = frame;
+unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;
+
+/*
+ * mfn points to the next free slot.  All used slots have a page reference
+ * held on them.
+ */
+mfn_t *mfn = &hvmemul_ctxt->mfn[0];
+
+/*
+ * The caller has no legitimate reason for trying a zero-byte write, but
+ * final is calculated to fail safe in release builds.
+ *
+ * The maximum write size depends on the number of adjacent mfns[] which
+ * can be vmap()'d, accounting for possible misalignment within the region.
+ * The higher level emulation callers are responsible for ensuring that
+ * mfns[] is large enough for the requested write size.
+ */
+if ( bytes == 0 ||
+ final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )
+{
+ASSERT_UNREACHABLE();
+goto unhandleable;
+}
+
+do {
+enum hvm_translation_result res;
+struct page_info *page;
+pagefault_info_t pfinfo;
+p2m_type_t p2mt;
+
+res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
+ &pfinfo, &page, NULL, &p2mt);
+
+switch ( res )
+{
+case HVMTRANS_okay:
+break;
+
+case HVMTRANS_bad_linear_to_gfn:
+x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
+err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
+goto out;
+
+case HVMTRANS_bad_gfn_to_mfn:
+err = NULL;
+goto out;
+
+case HVMTRANS_gfn_paged_out:
+case HVMTRANS_gfn_shared:
+err = ERR_PTR(~(long)X86EMUL_RETRY);
+goto out;
+
+default:
+goto unhandleable;
+}
+
+/* Error checking.  Confirm that the current slot is clean. */
+ASSERT(mfn_x(*mfn) == 0);
+
+*mfn++ = _mfn(page_to_mfn(page));
+frame++;
+
+if ( p2m_is_discard_write(p2mt) )
+{
+err = ERR_PTR(~(long)X86EMUL_OKAY);
+goto out;
+}
+
+} while ( frame < final );
+
+/* Entire access within a single frame? */
+if ( first == final )
+mapping = map_domain_page(hvmemul_ctxt->mfn[0]) + (linear & ~PAGE_MASK);
+/* Multiple frames? Need to vmap(). */
+else if ( (mapping = vmap(hvmemul_ctxt->mfn,
+  mfn - hvmemul_ctxt->mfn)) == NULL )
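
For reference, these helpers lean on the Linux-style error-pointer
convention; a minimal sketch, assuming definitions along the lines of
xen/include/xen/err.h:

    #define MAX_ERRNO 4095

    static inline void *ERR_PTR(long error)     { return (void *)error; }
    static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
    static inline bool IS_ERR(const void *ptr)
    {
        return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
    }

    /*
     * X86EMUL_* codes are small non-negative integers, so ~(long)code is
     * a small negative value: ~X86EMUL_OKAY == -1, for instance.  All of
     * them land in the IS_ERR() range, which is how even "success" (the
     * discard-write case) can be signalled through an error pointer,
     * distinct from NULL (MMIO).  Callers recover the code with
     * ~PTR_ERR(mapping).
     */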