Re: [Xen-devel] [PATCH v3 for-next 4/4] xen: Convert __page_to_mfn and __mfn_to_page to use typesafe MFN

2017-11-02 Thread Tim Deegan
At 14:03 + on 01 Nov (1509544996), Julien Grall wrote:
> Most of the users of page_to_mfn and mfn_to_page are either overriding
> the macros to make them work with mfn_t or use mfn_x/_mfn because the
> rest of the function uses mfn_t.
> 
> So make __page_to_mfn and __mfn_to_page return mfn_t by default.
> 
> Only reasonable clean-ups are done in this patch because it is
> already quite big. So some of the files now override page_to_mfn and
> mfn_to_page to avoid using mfn_t.
> 
> Lastly, domain_page_to_mfn is also converted to use mfn_t given that
> most of the callers are now switched to _mfn(domain_page_to_mfn(...)).
> 
> Signed-off-by: Julien Grall 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH] mm/shadow: fix declaration of fetch_type_names

2017-10-17 Thread Tim Deegan
At 11:23 +0100 on 17 Oct (1508239433), Roger Pau Monne wrote:
> fetch_type_names usage is guarded by SHADOW_DEBUG_PROPAGATE in
> SHADOW_DEBUG, fix the declaration so it's also guarded by
> SHADOW_DEBUG_PROPAGATE instead of DEBUG_TRACE_DUMP.
> 
> Signed-off-by: Roger Pau Monné 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH for 4.10 0/3] XSA-243 followup

2017-10-14 Thread Tim Deegan
At 14:54 +0100 on 12 Oct (1507820059), Andrew Cooper wrote:
> The important change here is in patch 3, which is intended to remove the
> correct-but-dangerous-looking construction of linear pagetables slots for
> shadowed guests.

> Andrew Cooper (3):
>   Revert "x86/mm: move PV l4 table setup code" and "x86/mm: factor out
> pv_arch_init_memory"
>   x86/mm: Consolidate all Xen L2 slot writing into
> init_xen_pae_l2_slots()
>   x86/mm: Consolidate all Xen L4 slot writing into init_xen_l4_slots()

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH 0/7] xen: Convert __page_to_mfn and __mfn_to_page to use typesafe MFN

2017-10-06 Thread Tim Deegan
At 19:15 +0100 on 04 Oct (1507144519), Julien Grall wrote:
> Hi all,
> 
> Most of the users of page_to_mfn and mfn_to_page are either overriding
> the macros to make them work with mfn_t or use mfn_x/_mfn because the rest
> of the function uses mfn_t.
> 
> So I think it is time to make __page_to_mfn and __mfn_to_page use typesafe
> MFN.
> 
> The first 6 patches will convert part of the code to use typesafe MFN, easing
> the tree-wide conversion in patch 7.

x86 shadow code changes Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH] x86: Make use of pagetable_get_mfn() where appropriate

2017-10-06 Thread Tim Deegan
At 19:36 +0100 on 28 Sep (1506627400), Andrew Cooper wrote:
> ... instead of the opencoded _mfn(pagetable_get_pfn(...)) construct.
> 
> Fix two overly long lines; no functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h

2017-09-23 Thread Tim Deegan
At 07:52 -0600 on 22 Sep (1506066733), Jan Beulich wrote:
> >>> On 14.09.17 at 14:58,  wrote:
> > The l1 mask needs to stay in x86/mm.c while l{2,3,4} masks are only
> > needed by PV code. Both x86 common mm code and PV mm code use
> > base_disallow_mask and l1 mask.
> > 
> > Export base_disallow_mask and l1 mask in asm-x86/mm.h.
> 
> So that's because in patch 20 you need to keep
> get_page_from_l1e() in x86/mm.c, due to being used by shadow
> code. But is shadow using the same disallow mask for HVM guests
> actually correct? Perhaps it would be better for callers of
> get_page_from_l1e() to pass in their disallow masks, even if it
> would just so happen that PV and shadow use the same ones?
> Tim, do you have any thoughts or insights here?

IIRC the shadow code isn't relying on the disallow mask for anything
in HVM guests; it builds the shadows according to its own rules.

I don't think that anything's improved by making the disallow mask
explicit but I have no objection to it if it makes other refactoring
simpler.

Tim.



Re: [Xen-devel] [PATCH v2 0/4] Various XSA followups

2017-09-08 Thread Tim Deegan
At 19:05 +0300 on 08 Sep (1504897532), Alexandru Isaila wrote:
> XSA-219 was discovered while trying to implement the bugfix in patch 4.
> 
> Andrew Cooper (4):
>  x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
>  [RFC] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result
>  x86/hvm: Break out __hvm_copy()'s translation logic
>  x86/hvm: Implement hvmemul_write() using real mappings
> 
> Alexandru Isaila (2):
>  x86/hvm: Break out __hvm_copy()'s translation logic
>  x86/hvm: Implement hvmemul_write() using real mappings


For the shadow pagetable changes in patches 1 and 2:
Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH] x86/mm: Consolidate all Xen L4 slot writing into init_xen_l4_slots()

2017-09-03 Thread Tim Deegan
At 18:07 +0100 on 01 Sep (1504289261), Andrew Cooper wrote:
>  if ( unlikely(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS) )
>  {
> -l4_pgentry_t *next = &l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT +
> -root_pgt_pv_xen_slots];
> +/*
> + * If using highmem-start=, artificially shorten the directmap to
> + * simulate very large machines.
> + */
> +l4_pgentry_t *next;
> +
> +memcpy(&l4t[l4_table_offset(XEN_VIRT_START)],
> +   &idle_pg_table[l4_table_offset(XEN_VIRT_START)],
> +   (ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots -
> +l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
> +
> +next = &l4t[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots];
>  
>  if ( l4e_get_intpte(split_l4e) )
>  *next++ = split_l4e;
>  
>  memset(next, 0,
> -   _p(&l4tab[ROOT_PAGETABLE_LAST_XEN_SLOT + 1]) - _p(next));
> +   _p(&l4t[ROOT_PAGETABLE_LAST_XEN_SLOT + 1]) - _p(next));
>  }
> -#else
> -BUILD_BUG_ON(root_pgt_pv_xen_slots != ROOT_PAGETABLE_PV_XEN_SLOTS);
> +else
>  #endif
> -l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
> -l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR_RW);
> -l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
> -l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
> -if ( zap_ro_mpt || is_pv_32bit_domain(d) )
> -l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
> +{
> +/*
> + * For PV guests, provide the shortened directmap, up to PV_XEN_SLOTS.
> + * For HVM guests and the idle vcpus, provide the extended directmap.
> + */
> +unsigned int slots = ((d && !paging_mode_external(d))
> +  ? ROOT_PAGETABLE_PV_XEN_SLOTS
> +  : ROOT_PAGETABLE_XEN_SLOTS);
> +
> +memcpy(&l4t[l4_table_offset(XEN_VIRT_START)],
> +   &idle_pg_table[l4_table_offset(XEN_VIRT_START)],
> +   (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
> +    l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));

Does this branch need a memset(0) for the PV case, to zap the higher
directmap slots?  

The shadow changes all look fine, so:
Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH] x86/shadow: Clarify the safety of guest-linear mapping construction

2017-09-03 Thread Tim Deegan
At 16:29 +0100 on 01 Sep (1504283350), Andrew Cooper wrote:
> sh_install_xen_entries_in_l4() looks unsafe, as it creates a guest-linear
> mapping with gl4mfn.  However, it is correct because of the way monitor tables
> are constructed for translated domains.
> 
> Leave a comment and some clarifying assertions.
> 
> Also, there is no longer support for translate != external, so drop the clause
> as it is dead.
> 
> Finally, correct the comment for sh_install_xen_entries_in_l2h().  We need to
> add Xen mappings into l2h for 3-on-any PV guests.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH 2/2] x86/mm: Use mfn_t for make_cr3()

2017-08-30 Thread Tim Deegan
At 13:19 +0100 on 30 Aug (1504099173), Andrew Cooper wrote:
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH 1/5] x86/pv: Switch {fill, zap}_ro_mpt() to using mfn_t

2017-08-29 Thread Tim Deegan
At 12:19 +0100 on 29 Aug (1504009152), Andrew Cooper wrote:
> And update all affected callers.  Fix the fill_ro_mpt() prototype to be bool
> like its implementation.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH 3/3] x86/mm: Introduce and use l?e_{get, from}_mfn()

2017-08-24 Thread Tim Deegan
At 14:14 +0100 on 24 Aug (1503584097), Andrew Cooper wrote:
> This avoids the explicit boxing/unboxing of mfn_t in relevant codepaths.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH RFC 2/2] x86/mm: PG_translate implies PG_refcounts

2017-08-24 Thread Tim Deegan
At 11:05 +0100 on 24 Aug (1503572736), Wei Liu wrote:
> On Thu, Aug 24, 2017 at 11:02:33AM +0100, Tim Deegan wrote:
> > At 16:58 +0100 on 23 Aug (1503507504), Wei Liu wrote:
> > > After 404595352 ("x86/paging: Enforce PG_external == PG_translate ==
> > > PG_refcounts"), PG_refcounts and PG_translate always need to be set
> > > together.
> > > 
> > > Squash PG_refcounts to simplify code. All calls to paging_mode_refcounts
> > > are replaced by calls to paging_mode_translate.
> > 
> > Would it be a good idea to merge all three and have only PG_external?
> > 
> 
> My reverse-engineering is that when PV guest is migrated it has
> PG_external and (the new) PG_translate.
> 
> I would be happy to squash all three into one if you tell me I'm wrong.
> :-)

I _think_ you're wrong :) but can't check right now as I'm
semi-offline.  Migrating PV guests should have paging enabled but
not any of external, translate or refcounts.

There was once code that used PG_translate with PV guests but never
AFAIK checked in to upstream Xen.  There was some talk of using that
combination for PVH at one point but IIRC PVH now uses
translate|refcounts|external like HVM.

Cheers,

Tim.



Re: [Xen-devel] [PATCH RFC 1/2] xen: remove CONFIG_PAGING_ASSISTANCE

2017-08-24 Thread Tim Deegan
At 10:04 -0600 on 23 Aug (1503482672), Jan Beulich wrote:
> >>> On 23.08.17 at 17:58,  wrote:
> > Arm should always set it, while on x86 xen can't build with it set to
> > 0, which means people haven't used it for years.
> > 
> > Remove it and simplify xen/paging.h.
> > 
> > Signed-off-by: Wei Liu 
> 
> This is something I certainly would want Tim to see (and perhaps
> approve).

Thanks.  I see and approve. :)

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH RFC 2/2] x86/mm: PG_translate implies PG_refcounts

2017-08-24 Thread Tim Deegan
At 16:58 +0100 on 23 Aug (1503507504), Wei Liu wrote:
> After 404595352 ("x86/paging: Enforce PG_external == PG_translate ==
> PG_refcounts"), PG_refcounts and PG_translate always need to be set
> together.
> 
> Squash PG_refcounts to simplify code. All calls to paging_mode_refcounts
> are replaced by calls to paging_mode_translate.

Would it be a good idea to merge all three and have only PG_external?

> Signed-off-by: Wei Liu 

Acked-by: Tim Deegan 

with one adjustment:

>  /* common paging mode bits */
>  #define PG_mode_shift  10 
> -/* Refcounts based on shadow tables instead of guest tables */
> -#define PG_refcounts   (XEN_DOMCTL_SHADOW_ENABLE_REFCOUNT << PG_mode_shift)
>  /* Enable log dirty mode */
>  #define PG_log_dirty   (XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY << PG_mode_shift)
>  /* Xen does p2m translation, not guest */

Please add "and handles refcounts" to the comment describing PG_translate.

Cheers,

Tim.



Re: [Xen-devel] [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit}

2017-08-19 Thread Tim Deegan
At 20:04 +0200 on 18 Aug (1503086640), Dario Faggioli wrote:
> Hey,
> 
> I know, v2 of this series just went out. But since I leave for 2 weeks, I
> wanted the version with as much as possible comments addressed to be on the
> list, and since it was rather quick to do that (and test it), for latest Tim's
> comment, here I am.
> 
> So, basically, this is v2, as here:
> 
>  https://lists.xen.org/archives/html/xen-devel/2017-08/msg01881.html
> 
> But with, as Tim suggested, the idle_cpumask initialized to "all clear". The
> various CPUs, therefore, are considered busy when they come up, and clear their
> own bit in the mask, as soon as they enter the idle loop for the first time
> (which is pretty soon). Doing like this, we close another window for a
> potential (although, rather unlikely) race/unnecessary extension of the grace
> period, and we simplify the code (a.k.a. 'win-win' :-D).

Thanks!

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers

2017-08-18 Thread Tim Deegan
At 09:04 -0600 on 18 Aug (1503047077), Jan Beulich wrote:
> >>> On 18.08.17 at 16:47,  wrote:
> > At 01:48 -0600 on 17 Aug (1502934495), Jan Beulich wrote:
> >> >>> On 16.08.17 at 18:47,  wrote:
> >> > On 16/08/17 16:23, Jan Beulich wrote:
> >> > On 16.08.17 at 13:22,  wrote:
> >> >>> --- a/xen/arch/x86/mm/shadow/multi.c
> >> >>> +++ b/xen/arch/x86/mm/shadow/multi.c
> >> >>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
> >> >>>   * will make sure no inconsistent mapping being translated into
> >> >>>   * shadow page table. */
> >> >>> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> >> >>> -rmb();
> >> >>>  walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
> >> >> Isn't this supposed to make sure version is being read first? I.e.
> >> >> doesn't this at least need to be barrier()?
> >> > 
> >> > atomic_read() is not free to be reordered by the compiler.  It is an asm
> >> > volatile with a volatile memory reference.
> >> 
> >> Oh, right - I did forget about the volatiles there (since generally,
> >> like in Linux, we appear to try to avoid volatile).
> > 
> > FWIW, I don't think that's quite right.  The GCC docs I have say that
> > "volatile" will stop the compiler from omitting an asm altogether, or
> > hoisting it out of a loop (on the assumption that it will always
> > produce the same output for the same inputs).  And that "the compiler
> > can move even volatile 'asm' instructions relative to other code,
> > including across jump instructions."
> 
> Oh, I had talked about the volatile qualifiers, not the volatile asm-s.

I'm not sure what other volatile you mean here, but accesses to
volatile objects are only ordered WRT other _volatile_ accesses.
So, e.g.: https://godbolt.org/g/L2qa8h
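
As a sketch of the distinction (mine; plain standard C, nothing
Xen-specific): the two volatile accesses below must stay in order with
respect to each other, but the plain store to 'data' may legally move
across either of them unless a compiler barrier intervenes.

volatile int flag;
int data;

void publish(int v)
{
    data = v;   /* plain access: the compiler may reorder this... */
    flag = 1;   /* ...around this volatile access */
}

void publish_ordered(int v)
{
    data = v;
    asm volatile ( "" ::: "memory" );  /* barrier(): orders plain accesses too */
    flag = 1;
}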

Tim.



Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers

2017-08-18 Thread Tim Deegan
At 15:47 +0100 on 18 Aug (1503071247), Tim Deegan wrote:
> At 01:48 -0600 on 17 Aug (1502934495), Jan Beulich wrote:
> > >>> On 16.08.17 at 18:47,  wrote:
> > > atomic_read() is not free to be reordered by the compiler.  It is an asm
> > > volatile with a volatile memory reference.
> > 
> > Oh, right - I did forget about the volatiles there (since generally,
> > like in Linux, we appear to try to avoid volatile).
> 
> FWIW, I don't think that's quite right.  The GCC docs I have say that
> "volatile" will stop the compiler from omitting an asm altogether, or
> hoisting it out of a loop (on the assumption that it will always
> produce the same output for the same inputs).  And that "the compiler
> can move even volatile 'asm' instructions relative to other code,
> including across jump instructions."

...and indeed: https://godbolt.org/g/KW19QR

Tim.



Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers

2017-08-18 Thread Tim Deegan
At 01:48 -0600 on 17 Aug (1502934495), Jan Beulich wrote:
> >>> On 16.08.17 at 18:47,  wrote:
> > On 16/08/17 16:23, Jan Beulich wrote:
> > On 16.08.17 at 13:22,  wrote:
> >>> --- a/xen/arch/x86/mm/shadow/multi.c
> >>> +++ b/xen/arch/x86/mm/shadow/multi.c
> >>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
> >>>   * will make sure no inconsistent mapping being translated into
> >>>   * shadow page table. */
> >>>  version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> >>> -rmb();
> >>>  walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
> >> Isn't this supposed to make sure version is being read first? I.e.
> >> doesn't this at least need to be barrier()?
> > 
> > atomic_read() is not free to be reordered by the compiler.  It is an asm
> > volatile with a volatile memory reference.
> 
> Oh, right - I did forget about the volatiles there (since generally,
> like in Linux, we appear to try to avoid volatile).

FWIW, I don't think that's quite right.  The GCC docs I have say that
"volatile" will stop the compiler from omitting an asm altogether, or
hoisting it out of a loop (on the assumption that it will always
produce the same output for the same inputs).  And that "the compiler
can move even volatile 'asm' instructions relative to other code,
including across jump instructions."

Cheers,

Tim.



[Xen-devel] [PATCH] xen/x86/shadow: adjust barriers around gtable_dirty_version.

2017-08-18 Thread Tim Deegan
Use the smp_ variants, as we're only synchronizing against other CPUs.

Add a write barrier before incrementing the version.

x86's memory ordering rules and the presence of various out-of-unit
function calls mean that this code worked OK before, and the barriers
are mostly decorative.

Signed-off-by: Tim Deegan 
---
 xen/arch/x86/mm/shadow/multi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index c9c2252..f8a8928 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -206,6 +206,7 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, walk_t *gw, int version)
 
 ASSERT(paging_locked_by_me(d));
 
+/* No need for smp_rmb() here; taking the paging lock was enough. */
 if ( version == atomic_read(&d->arch.paging.shadow.gtable_dirty_version) )
  return 1;
 
@@ -3112,7 +3113,7 @@ static int sh_page_fault(struct vcpu *v,
  * will make sure no inconsistent mapping being translated into
  * shadow page table. */
 version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
-rmb();
+smp_rmb();
 walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -3188,6 +3189,7 @@ static int sh_page_fault(struct vcpu *v,
  * overlapping with this one may be inconsistent
  */
 perfc_incr(shadow_rm_write_flush_tlb);
+smp_wmb();
 atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);
 flush_tlb_mask(d->domain_dirty_cpumask);
 }
-- 
2.7.4
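
The pairing this establishes, in outline (a simplified sketch of the two
paths touched above, not part of the patch itself):

/* Invalidation side: */
/* ... unshadow / update guest table mappings ... */
smp_wmb();           /* order those writes before the version bump */
atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);

/* Walker side (sh_page_fault): */
version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
smp_rmb();           /* read the version before walking the tables */
walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
/* Later, under the paging lock, shadow_check_gwalk() re-reads the
 * version; if it changed, the walk is stale and is retried. */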




Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers

2017-08-18 Thread Tim Deegan
At 14:55 +0100 on 18 Aug (1503068128), Tim Deegan wrote:
> At 12:22 +0100 on 16 Aug (1502886128), Andrew Cooper wrote:
> > diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> > index c9c2252..1e3dfaf 100644
> > --- a/xen/arch/x86/mm/shadow/multi.c
> > +++ b/xen/arch/x86/mm/shadow/multi.c
> > @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
> >   * will make sure no inconsistent mapping being translated into
> >   * shadow page table. */
> >  version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> > -rmb();
> >  walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
> 
> Nack.  We must read the version before reading the tables, or we might
> accidentally use out-of-date tables.
> 
> If anything, this needs more barriers!  There ought to be a read
> barrier before we re-read the version in shadow_check_gwalk(), but
> taking the paging lock DTRT.  And there ought to be a wmb() before we
> increment the version later on, which I guess I'll make a patch for.

These can be smp_*mb(), though, to align with the rest of the series.

Tim.



Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers

2017-08-18 Thread Tim Deegan
At 12:22 +0100 on 16 Aug (1502886128), Andrew Cooper wrote:
> diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> index c9c2252..1e3dfaf 100644
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>   * will make sure no inconsistent mapping being translated into
>   * shadow page table. */
>  version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> -rmb();
>  walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);

Nack.  We must read the version before reading the tables, or we might
accidentally use out-of-date tables.

If anything, this needs more barriers!  There ought to be a read
barrier before we re-read the version in shadow_check_gwalk(), but
taking the paging lock DTRT.  And there ought to be a wmb() before we
increment the version later on, which I guess I'll make a patch for.

Cheers,

Tim.



Re: [Xen-devel] [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal

2017-08-18 Thread Tim Deegan
At 12:22 +0100 on 16 Aug (1502886127), Andrew Cooper wrote:
>   * Drop trailing whitespace.
>   * Move amd_nonfatal_mcheck_init() into .init.text and drop a trailing return.
>   * Drop unnecessary wmb()'s.  Because of Xen's implementation, they are only
> compiler barriers anyway, and each wrmsr() is already fully serialising.

But wrmsr() is not a compiler barrier!  So if the write-barriers are
needed (e.g. for the update to the global 'adjust') then you can't
remove them just because WRMSR is a CPU barrier.

If they're not needed (which is plausible) then the commit message
should explain that instead.

Nit: I think tinkering with memory barriers deserves its own commit,
not to be the third item in a list of 'minor cleanup's.

Cheers,

Tim.



Re: [Xen-devel] [PATCH v2 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.

2017-08-17 Thread Tim Deegan
Hi,

This looks good to me.  I have one question:

At 18:45 +0200 on 16 Aug (1502909149), Dario Faggioli wrote:
> @@ -474,7 +484,41 @@ static struct notifier_block cpu_nfb = {
>  void __init rcu_init(void)
>  {
>  void *cpu = (void *)(long)smp_processor_id();
> +
> +cpumask_setall(&rcu_ctrlblk.idle_cpumask);
> +/* The CPU we're running on is certainly not idle */
> +cpumask_clear_cpu(smp_processor_id(), &rcu_ctrlblk.idle_cpumask);
>  cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
>  register_cpu_notifier(&cpu_nfb);
>  open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
>  }
> +
> +/*
> + * The CPU is becoming idle, so no more read side critical
> + * sections, and one more step toward grace period.
> + */
> +void rcu_idle_enter(unsigned int cpu)
> +{
> +/*
> + * During non-boot CPU bringup and resume, until this function is
> + * called for the first time, it's fine to find our bit already set.
> + */
> +ASSERT(!cpumask_test_cpu(cpu, &rcu_ctrlblk.idle_cpumask) ||
> +   (system_state < SYS_STATE_active || system_state >= SYS_STATE_resume));

Does every newly started CPU immediately idle?  If not, then it might
run in an RCU read section but excluded from the grace period
mechanism.

It seems like it would be better to start with the idle_cpumask empty,
and rely on online_cpumask to exclude CPUs that aren't running.
Or if that doesn't work, to call rcu_idle_exit/enter on the CPU
bringup/shutdown paths and simplify this assertion.

Cheers,

Tim.



Re: [Xen-devel] [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.

2017-08-14 Thread Tim Deegan
At 18:21 +0200 on 14 Aug (1502734919), Dario Faggioli wrote:
> On Mon, 2017-08-14 at 14:54 +0100, Tim Deegan wrote:
> > At 15:24 +0200 on 14 Aug (1502724279), Dario Faggioli wrote:
> > > About the former... I'm not sure which check of rcp->cur you're
> > > referring to. I think it's the one in rcu_check_quiescent_state(),
> > > but
> > > then, I'm not sure where to actually put the barrier...
> > 
> > I mean whatever one causes the CPU to DTRT about the new grace
> > period.
> > AFAICT that's the one in __rcu_pending().  The important thing is
> > that
> > that read mustn't be allowed to happen before the write to the
> > idle_cpumask.
> >
> Interestingly enough, someone seems to have had a very similar
> discussion before:
> 
> https://lkml.org/lkml/2005/12/8/155
> 
> >   I'd be inclined to put the barrier right after the
> > cpumask_set_cpu() in rcu_idle_enter().
> > 
> And with conclusions similar to these: :-)
> 
> https://lkml.org/lkml/2005/12/8/308
> 
> I've no idea why they then didn't put an actual barrier in place. This
> thread mentions s390's ordering, as at the time, this mechanism was only
> in use there, but they've not added one when making things general.
> 
> IAC, I'm fine putting down one. :-)

Sounds good to me. :)  I think it's required (even on s390!) but of
course there may be other barriers on those paths that happen to DTRT.
E.g. cpumask_set_cpu() itself is implicitly a mb() on x86, (but not,
AFAICT, on ARM).

Cheers,

Tim.



Re: [Xen-devel] [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.

2017-08-14 Thread Tim Deegan
Hi,

At 15:24 +0200 on 14 Aug (1502724279), Dario Faggioli wrote:
> On Mon, 2017-08-14 at 11:39 +0100, Tim Deegan wrote:
> > At 11:19 +0200 on 14 Aug (1502709563), Dario Faggioli wrote:
> > > 3) CPU2 is not in idle_cpumask, and so it will not be in the
> > > sampled
> > > mask, but the first check of rcu_pending() will lead acknowledge
> > > quiescence, and calling of cpu_quiet();
> > 
> > Yep.  And because it's not yet in idle_cpumask, it _will_ check
> > rcu_pending() before idling.  I think that needs an smp_mb() between
> > setting the idle_cpumask and checking rcp->cur, and likewise between
> > rcp->cur++ and cpumask_andnot() in rcu_start_batch().
> > 
> So, the latter, I'm putting it there already, by importing Linux's
> c3f59023, "Fix RCU race in access of nohz_cpu_mask" (in this very patch).

Yep, looks correct to me.

> About the former... I'm not sure which check of rcp->cur you're
> referring to. I think it's the one in rcu_check_quiescent_state(), but
> then, I'm not sure where to actually put the barrier...

I mean whatever one causes the CPU to DTRT about the new grace period.
AFAICT that's the one in __rcu_pending().  The important thing is that
that read mustn't be allowed to happen before the write to the
idle_cpumask.  I'd be inclined to put the barrier right after the
cpumask_set_cpu() in rcu_idle_enter().
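
Concretely, something like this (a sketch of the suggestion, not a
tested patch):

void rcu_idle_enter(unsigned int cpu)
{
    cpumask_set_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
    /*
     * Make the idle bit visible before the subsequent read of rcp->cur
     * in __rcu_pending(); pairs with the barrier between rcp->cur++ and
     * the cpumask_andnot() in rcu_start_batch().
     */
    smp_mb();
}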

> I'll keep looking, but any advice is welcome. Even after all these
> years, barriers still gives me headache. :-P

Me too! :)

Tim.



Re: [Xen-devel] [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.

2017-08-14 Thread Tim Deegan
Hi,

At 11:19 +0200 on 14 Aug (1502709563), Dario Faggioli wrote:
> Basically, for the race to occur (in [3]), it is necessary that [2]
> (CPU1 doing rcp->cur++) happens after [1] (last call to rcu_pending()
> of CPU2, before clearing the mask and going idle). In fact, it that is
> not the case, rcu_pending() in CPU2 will say 'no', because of this
> check:
> 
>  if (rdp->quiescbatch != rcp->cur || rdp->qs_pending)
> 
> which would prevent the CPU from going idle (and will, at least, lead
> to it going through check_quiescent_state() twice, clearing itself from
> the new grace period too).
> 
> Therefore, it looks to me that the race can be avoided by making sure
> that there is at least one check of rcu_pending(), after a CPU has
> called rcu_idle_enter(). This basically means moving rcu_idle_enter()
> up.

Sounds plausible.

> I was actually thinking to move it inside the tick-stopping logic too.
> This way, either, when CPU1 samples the cpumask for the new grace
> period:
> 1) CPU2 will be in idle_cpumask already, and it actually goes idle;

The system works!

> 2) CPU2 will be in idle_cpumask, but then it does not go idle
> (cpu_is_haltable() returning false) for various reasons. This may look
> unideal, but if it is here, trying to go idle, it is not holding
> references to read-side critical sections, or at least we can consider
> it to be quiescing, so it's ok to ignore it for the grace period;

Yes.  This is equivalent to CPU2 actually going idle and then waking
up quickly.  As you say, since at this point the RCU logic thinks the
CPU can idle, whatever stopped it actually idling can't have been
relevant to RCU.

> 3) CPU2 is not in idle_cpumask, and so it will not be in the sampled
> mask, but the first check of rcu_pending() will lead acknowledge
> quiescence, and calling of cpu_quiet();

Yep.  And because it's not yet in idle_cpumask, it _will_ check
rcu_pending() before idling.  I think that needs an smp_mb() between
setting the idle_cpumask and checking rcp->cur, and likewise between
rcp->cur++ and cpumask_andnot() in rcu_start_batch().

Sounds good!

Cheers,

Tim.



Re: [Xen-devel] [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.

2017-08-09 Thread Tim Deegan
Hi,

At 10:01 +0200 on 27 Jul (1501149684), Dario Faggioli wrote:
> In Xen, that is impossible, and that's particularly problematic
> when system is idle (or lightly loaded) systems, as CPUs that
> are idle may never have the chance to tell RCU about their
> quiescence, and grace periods could extend indefinitely!
[...]
> The first step for fixing this situation is for RCU to record,
> at the beginning of a grace period, which CPUs are already idle.
> In fact, being idle, they can't be in the middle of any read-side
> critical section, and we don't have to wait for them to declare
> a grace period finished.

AIUI this patch fixes a bug where:
 - a CPU is idle/asleep;
 - it is added to the cpumask of a new RCU grace period; and
 - because the CPU is asleep, the grace period never ends. 
Have I understood?

I think we might be left with a race condition:
 - CPU A is about to idle.  It runs the RCU softirq and
   clears itself from the current grace period.
 - CPU B ends the grace period and starts a new one.
 - CPU A calls rcu_idle_enter() and sleeps.
 - The new grace period never ends.

Is that fixed by your later rcu_idle_timer patch?  AIUI that's only
invoked when the calling CPU has pending RCU callbacks.

Or can it be fixed here by something like this in rcu_idle_enter
(a C sketch follows the list)?
 - lock rcp->lock
 - add ourselves to the idle mask
 - if we are in rcp->cpumask:
 - cpu_quiet()
 - unlock rcp->lock
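
In C, roughly (a sketch only; the lock and the cpu_quiet() signature
are assumed to match the existing RCU code):

void rcu_idle_enter(unsigned int cpu)
{
    struct rcu_ctrlblk *rcp = &rcu_ctrlblk;

    spin_lock(&rcp->lock);
    cpumask_set_cpu(cpu, &rcp->idle_cpumask);
    /* If the current grace period is still waiting on us, quiesce now. */
    if ( cpumask_test_cpu(cpu, &rcp->cpumask) )
        cpu_quiet(cpu, rcp);
    spin_unlock(&rcp->lock);
}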

There's also the code at the top of rcu_check_quiescent_state() that
requires _two_ idle states per batch.  I don't know what race that's
protecting against so I don't know whether we need to worry about it
here as well. :)

Cheers,

Tim.




Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()

2017-07-25 Thread Tim Deegan
At 16:00 +0100 on 25 Jul (1500998413), Andrew Cooper wrote:
> The purpose of guest_walk_tables() is to match the behaviour of real hardware.
> 
> A PV guest can have 2M superpages in its pagetables, via the M2P and the
> initial initrd mapping, even if it isn't permitted to create arbitrary 2M
> superpage mappings.

Can the domain builder (or Xen?) really give a guest superpage
mappings for its initrd?  Wouldn't that cause problems for live
migration?

In any case this patch looks correct: the presence of superpages in PV
pagetables is decided by the PV MM rules, so the walker should accept them.

Reviewed-by: Tim Deegan 

> guest_can_use_l2_superpages() checking opt_allow_superpage is a piece of PV
> guest policy enforcement, rather than its intended purpose of meaning "would
> hardware tolerate finding an L2 superpage with these control settings?"
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Tim Deegan 
> CC: George Dunlap 
> CC: Wei Liu 
> ---
>  xen/include/asm-x86/guest_pt.h | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
> index 72126d5..08031c8 100644
> --- a/xen/include/asm-x86/guest_pt.h
> +++ b/xen/include/asm-x86/guest_pt.h
> @@ -205,15 +205,17 @@ static inline guest_l4e_t guest_l4e_from_gfn(gfn_t gfn, u32 flags)
>  static inline bool guest_can_use_l2_superpages(const struct vcpu *v)
>  {
>  /*
> + * PV guests use Xen's paging settings.  Being 4-level, 2M
> + * superpages are unconditionally supported.
> + *
>   * The L2 _PAGE_PSE bit must be honoured in HVM guests, whenever
>   * CR4.PSE is set or the guest is in PAE or long mode.
>   * It's also used in the dummy PT for vcpus with CR0.PG cleared.
>   */
> -return (is_pv_vcpu(v)
> -? opt_allow_superpage
> -: (GUEST_PAGING_LEVELS != 2
> -   || !hvm_paging_enabled(v)
> -   || (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE)));
> +return (is_pv_vcpu(v) ||
> +GUEST_PAGING_LEVELS != 2 ||
> +!hvm_paging_enabled(v) ||
> +(v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE));
>  }
>  
>  static inline bool guest_can_use_l3_superpages(const struct domain *d)
> -- 
> 2.1.4
> 



Re: [Xen-devel] [PATCH v2 4/4] x86/shadow: Switch to using bool

2017-06-30 Thread Tim Deegan
At 16:40 +0100 on 30 Jun (1498840853), Andrew Cooper wrote:
>  * sh_pin() has boolean properties, so switch its return type.
>  * sh_remove_shadows() uses ints everywhere other than its stub.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH 4/4] x86/shadow: Switch to using bool

2017-06-28 Thread Tim Deegan
Hi,

At 12:16 +0100 on 28 Jun (1498652182), Andrew Cooper wrote:
> sh_pin() has boolean properties, so switch its return type.
> 
> Signed-off-by: Andrew Cooper 

Good idea, thanks.

> -static bool_t
> +static bool
>  sh_write_guest_entry(struct vcpu *v, intpte_t *p, intpte_t new, mfn_t gmfn)

Can you please update paging.h too?  We need matching changes to
write_guest_entry and cmpxchg_guest_entry in struct shadow_paging_mode,
and invlpg in struct paging_mode.

I'm a little surprised that the compiler doesn't complain.  I suppose
the implicit promotion to int makes it all equivalent.

> @@ -3620,7 +3620,7 @@ static int sh_page_fault(struct vcpu *v,
>   * instruction should be issued on the hardware, or 0 if it's safe not
>   * to do so.
>   */
> -static bool_t sh_invlpg(struct vcpu *v, unsigned long va)
> +static bool sh_invlpg(struct vcpu *v, unsigned long va)

This comment needs to be updated too. 

> @@ -102,7 +102,7 @@ int shadow_set_allocation(struct domain *d, unsigned int pages,
>  ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
>  
>  static inline void sh_remove_shadows(struct domain *d, mfn_t gmfn,
> - bool_t fast, bool_t all) {}
> + bool fast, bool all) {}

Actually, please make these ints, to match the main implementation.

Cheers,

Tim.



Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce {G, M}FN_INVALID_INITIALIZER

2017-06-27 Thread Tim Deegan
At 10:33 +0100 on 27 Jun (1498559600), Julien Grall wrote:
> The current implementation of {G,M}FN_INVALID cannot be used to
> initialize a global variable because the initializer element is not a
> constant.
> 
> Due to a bug in GCC 4.9 and older ([1]), it is not easy to find a common
> value to initialize a variable and pass directly as an argument.
> 
> Introduce 2 new defines, {G,M}FN_INVALID_INITIALIZER, to be used for
> initializing a variable.
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64856
> 
> Signed-off-by: Julien Grall 

Acked-by: Tim Deegan  (and Ack for the revert too)
but please choose either { ~0UL } or {~0UL} and use it for both.

> ---
> Build tested it with:
> * ARM: GCC 4.9.4, 5.1, 4.3, 6.1.1, 7.1.0
> * x86: Clang 3.5.0, 3.6.0, 3.6.2, 3.8.0, 3.9.0, 4.0.0
> 
> With introducing a dummy global variable common/mm.c:
> 
> mfn_t foo = INVALID_MFN_INITIALIZER
> ---
>  xen/include/xen/mm.h | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
> index 0050fba498..251db4ffa1 100644
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -57,6 +57,11 @@
>  TYPE_SAFE(unsigned long, mfn);
>  #define PRI_mfn  "05lx"
>  #define INVALID_MFN  _mfn(~0UL)
> +/*
> + * To be used for global variable initialization. This workaround a bug
> + * in GCC < 5.0.
> + */
> +#define INVALID_MFN_INITIALIZER {~0UL}
>  
>  #ifndef mfn_t
>  #define mfn_t /* Grep fodder: mfn_t, _mfn() and mfn_x() are defined above */
> @@ -90,6 +95,11 @@ static inline bool_t mfn_eq(mfn_t x, mfn_t y)
>  TYPE_SAFE(unsigned long, gfn);
>  #define PRI_gfn  "05lx"
>  #define INVALID_GFN  _gfn(~0UL)
> +/*
> + * To be used for global variable initialization. This workaround a bug
> + * in GCC < 5.0 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64856
> + */
> +#define INVALID_GFN_INITIALIZER { ~0UL }
>  
>  #ifndef gfn_t
>  #define gfn_t /* Grep fodder: gfn_t, _gfn() and gfn_x() are defined above */
> -- 
> 2.11.0
> 



Re: [Xen-devel] [PATCH v2 01/16] xen/mm: Don't use _{g, m}fn for defining INVALID_{G, M}FN

2017-06-23 Thread Tim Deegan
At 03:18 -0600 on 23 Jun (1498187924), Jan Beulich wrote:
> >>> On 23.06.17 at 10:55,  wrote:
> 
> > 
> > On 23/06/17 09:30, Jan Beulich wrote:
> > On 22.06.17 at 20:31,  wrote:
> >>> Hi,
> >>>
> >>> On 20/06/17 11:32, Jan Beulich wrote:
> >>> On 20.06.17 at 12:06,  wrote:
> > At 03:36 -0600 on 20 Jun (1497929778), Jan Beulich wrote:
> > On 20.06.17 at 11:14,  wrote:
> >>> At 01:32 -0600 on 20 Jun (1497922345), Jan Beulich wrote:
> >>> On 19.06.17 at 18:57,  wrote:
> > --- a/xen/include/xen/mm.h
> > +++ b/xen/include/xen/mm.h
> > @@ -56,7 +56,7 @@
> >
> >  TYPE_SAFE(unsigned long, mfn);
> >  #define PRI_mfn  "05lx"
> > -#define INVALID_MFN  _mfn(~0UL)
> > +#define INVALID_MFN  (mfn_t){ ~0UL }
> 
>  While I don't expect anyone to wish to use a suffix expression on
>  this constant, for maximum compatibility this should still be fully
>  parenthesized, I think. Of course this should be easy enough to
>  do while committing.
> 
>  Are you able to assure us that clang supports this gcc extension
>  (compound literal for non-compound types)
> >>>
> >>> AIUI this is a C99 feature, not a GCCism.
> >>
> >> Most parts of it yes (it is a gcc extension in C89 mode only), but the
> >> specific use here isn't afaict: Compound literals outside of functions
> >> are static objects, and hence couldn't be used as initializers of other
> >> objects.
> >
> > Ah, I see.  So would it be better to use
> >
> >   #define INVALID_MFN ((const mfn_t) { ~0UL })
> >
> > ?
> 
>  While I think we should indeed consider adding the const, the above
>  still is a static object, and hence still not suitable as an initializer as
>  per C99 or C11. But as long as gcc and clang permit it, we're fine.
> >>>
> >>> Actually this solutions breaks on GCC 4.9 provided by Linaro ([1]
> >>> 4.9-2016-02 and 4.9-2017.01).
> >>>
> >>> This small reproducer does not compile with -std=gnu99 (used by Xen) but
> >>> compile with this option. Jan, have you tried 4.9 with this patch?
> >>
> >> That's sort of an odd question - you've sent the patch, so I would
> >> have expected you to have made sure it doesn't break (and
> >> while it was me to add the const, this was discussed, and you don't
> >> make clear whether that's the issue). In any event, I've tried ...
> >>
> >>> typedef struct
> >>> {
> >>>  unsigned long i;
> >>> } mfn_t;
> >>>
> >>> mfn_t v = (const mfn_t){~0UL};
> >>
> >> ... this now with 7.1.0, 6.3.0, 5.4.0, 5.2.0, and 4.9.3, and all
> >> of them compile this without errors or warnings (at -Wall -W).
> > 
> > Actually did you build with -std=gnu99? I just tried 4.9.3 for x86 and
> > also 4.8 for ARM64 on Ubuntu Trusty. Both are broken.
> 
> Ah, indeed - that fails with 4.9.3 but succeeds with 5.2.0. And
> it's not the const getting in the way here. I notice this difference
> in their documentation (4.9.3 first, then 7.1.0):
> 
> Compound literals for scalar types and union types are also allowed,
> but then the compound literal is equivalent to a cast.
> 
> Compound literals for scalar types and union types are also allowed.
> In the following example the variable i is initialized to the value 2,
> the result of incrementing the unnamed object created by the
> compound literal.
> 
>   int i = ++(int) { 1 };
> 
> It is especially this example clarifying that newer compilers don't
> treat this like a cast anymore (albeit a casted expression alone is
> fine as initializer in 4.9.3, so there must be more to the failure).
> 
> While I still view this as a compiler bug (as it accepts the code in
> default mode), as a workaround  I guess we'll need to accept a
> gcc < 5 conditional in the header, which we would really have
> wanted to avoid.

Since we'll have to make some scheme that works for 4.9, I think we
should just use that for all versions.

How about:
 - keep INVALID_MFN as an inline function call for most uses;
 - #define INVALID_MFN_INITIALIZER { ~0UL } for when we need a
   real constant initializer at file scope (sketched below).
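
That is, something like this sketch:

/* Type-safe constructor, for normal code: */
#define INVALID_MFN              _mfn(~0UL)
/* Plain aggregate initializer, constant even for gcc 4.9: */
#define INVALID_MFN_INITIALIZER  { ~0UL }

static mfn_t foo = INVALID_MFN_INITIALIZER;  /* OK at file scope */

void f(void)
{
    mfn_t m = INVALID_MFN;                   /* OK inside functions */
}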

Tim.



Re: [Xen-devel] [PATCH v2 01/16] xen/mm: Don't use _{g, m}fn for defining INVALID_{G, M}FN

2017-06-23 Thread Tim Deegan
At 09:58 +0100 on 23 Jun (1498211910), Tim Deegan wrote:
> At 09:41 +0100 on 23 Jun (1498210893), Julien Grall wrote:
> > On 23/06/17 09:30, Jan Beulich wrote:
> > >
> > >> typedef struct
> > >> {
> > >>  unsigned long i;
> > >> } mfn_t;
> > >>
> > >> mfn_t v = (const mfn_t){~0UL};
> > >
> > > ... this now with 7.1.0, 6.3.0, 5.4.0, 5.2.0, and 4.9.3, and all
> > > of them compile this without errors or warnings (at -Wall -W).
> > > For 4.9.3 I've also specifically taken care to try not only the
> > > x86 compiler, but also the arm32 and arm64 ones. So I'm afraid
> > > I lack enough detail to understand what the issue is and what a
> > > solution may look like.
> > 
> > I don't have much except the following error:
> > 
> > /tmp/test.c:6:1: error: initializer element is not constant
> >   mfn_t v = (const mfn_t){~0UL};
> >   ^
> > 
> > If it works for you on 4.9, then it might be a bug in the GCC provided 
> > by Linaro and will report it.
> 
> This fails for me on x86 gcc 4.9.4, using -xc -std=gnu99.  The
> complaint is valid, as Jan pointed out: the literal is a static object
> and so not a valid initializer.  GCC also complains about the
> 'debug' version for the same reason. :(
> 
> Using a plain initializer works OK for both debug and non-debug:
> 
>   mfn_t v = {~0UL};
> 
> but I haven't checked whether other compilers like that as well.

And that wouldn't work for things like f(INVALID_MFN) -- sometimes we
actually do want the literal.

Tim.



Re: [Xen-devel] [PATCH v2 01/16] xen/mm: Don't use _{g, m}fn for defining INVALID_{G, M}FN

2017-06-23 Thread Tim Deegan
At 09:41 +0100 on 23 Jun (1498210893), Julien Grall wrote:
> Hi Jan,
> 
> On 23/06/17 09:30, Jan Beulich wrote:
>  On 22.06.17 at 20:31,  wrote:
> >> Hi,
> >>
> >> On 20/06/17 11:32, Jan Beulich wrote:
> >> On 20.06.17 at 12:06,  wrote:
>  At 03:36 -0600 on 20 Jun (1497929778), Jan Beulich wrote:
>  On 20.06.17 at 11:14,  wrote:
> >> At 01:32 -0600 on 20 Jun (1497922345), Jan Beulich wrote:
> >> On 19.06.17 at 18:57,  wrote:
>  --- a/xen/include/xen/mm.h
>  +++ b/xen/include/xen/mm.h
>  @@ -56,7 +56,7 @@
> 
>   TYPE_SAFE(unsigned long, mfn);
>   #define PRI_mfn  "05lx"
>  -#define INVALID_MFN  _mfn(~0UL)
>  +#define INVALID_MFN  (mfn_t){ ~0UL }
> >>>
> >>> While I don't expect anyone to wish to use a suffix expression on
> >>> this constant, for maximum compatibility this should still be fully
> >>> parenthesized, I think. Of course this should be easy enough to
> >>> do while committing.
> >>>
> >>> Are you able to assure us that clang supports this gcc extension
> >>> (compound literal for non-compound types)
> >>
> >> AIUI this is a C99 feature, not a GCCism.
> >
> > Most parts of it yes (it is a gcc extension in C89 mode only), but the
> > specific use here isn't afaict: Compound literals outside of functions
> > are static objects, and hence couldn't be used as initializers of other
> > objects.
> 
>  Ah, I see.  So would it be better to use
> 
>    #define INVALID_MFN ((const mfn_t) { ~0UL })
> 
>  ?
> >>>
> >>> While I think we should indeed consider adding the const, the above
> >>> still is a static object, and hence still not suitable as an initializer as
> >>> per C99 or C11. But as long as gcc and clang permit it, we're fine.
> >>
> >> Actually this solutions breaks on GCC 4.9 provided by Linaro ([1]
> >> 4.9-2016-02 and 4.9-2017.01).
> >>
> >> This small reproducer does not compile with -std=gnu99 (used by Xen) but
> >> compile with this option. Jan, have you tried 4.9 with this patch?
> >
> > That's sort of an odd question - you've sent the patch, so I would
> > have expected you to have made sure it doesn't break (and
> > while it was me to add the const, this was discussed, and you don't
> > make clear whether that's the issue). In any event, I've tried ...
> 
> I don't personally try every single compiler every time I am writing a 
> patch... This is too complex given that different stakeholders (Linaro, 
> Debian, Ubuntu,...) provide various binaries with their own patches on top.
> 
> I asked you because I was wondering what is happening on x86 (I don't 
> have 4.9 x86 in hand) and to rule out a bug in the compiler provided by 
> Linaro.
> 
> >
> >> typedef struct
> >> {
> >>  unsigned long i;
> >> } mfn_t;
> >>
> >> mfn_t v = (const mfn_t){~0UL};
> >
> > ... this now with 7.1.0, 6.3.0, 5.4.0, 5.2.0, and 4.9.3, and all
> > of them compile this without errors or warnings (at -Wall -W).
> > For 4.9.3 I've also specifically taken care to try not only the
> > x86 compiler, but also the arm32 and arm64 ones. So I'm afraid
> > I lack enough detail to understand what the issue is and what a
> > solution may look like.
> 
> I don't have much except the following error:
> 
> /tmp/test.c:6:1: error: initializer element is not constant
>   mfn_t v = (const mfn_t){~0UL};
>   ^
> 
> If it works for you on 4.9, then it might be a bug in the GCC provided 
> by Linaro and will report it.

This fails for me on x86 gcc 4.9.4, using -xc -std=gnu99.  The
complaint is valid, as Jan pointed out: the literal is a static object
and so not a valid initializer.  GCC also complains about the
'debug' version for the same reason. :(

Using a plain initializer works OK for both debug and non-debug:

  mfn_t v = {~0UL};

but I haven't checked whether other compilers like that as well.

Tim.





Re: [Xen-devel] [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()

2017-06-22 Thread Tim Deegan
Hi,

At 16:12 +0100 on 21 Jun (1498061549), Andrew Cooper wrote:
> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
> infrasturcture, but take the opportunity to avoid opencoding it.

s/sturct/struct/.

> @@ -4752,8 +4752,8 @@ sh_x86_emulate_write(struct vcpu *v, unsigned long vaddr, void *src,
>  return X86EMUL_UNHANDLEABLE;
>  
>  addr = sh_emulate_map_dest(v, vaddr, bytes, sh_ctxt);
> -if ( sh_emulate_map_dest_failed(addr) )
> -return (long)addr;
> +if ( IS_ERR(addr) )
> +return ~PTR_ERR(addr);

Using "return ~PTR_ERR(addr)" when the usual idiom is
"return -PTR_ERR(foo)" is a bit subtle.  Still, the code seems to be
correct, so if people prefer it,

Acked-by: Tim Deegan 
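
For the archives, the shape of the encoding (a sketch; the wrapper
macro is an assumption about the patch): X86EMUL_* codes are small
non-negative integers, so complementing one lands in the negative range
that IS_ERR() recognises, and complementing again on the way out
recovers the original code, just as negating twice would.

#define MAPPING_UNHANDLEABLE ERR_PTR(~(long)X86EMUL_UNHANDLEABLE)

void *addr = sh_emulate_map_dest(v, vaddr, bytes, sh_ctxt);
if ( IS_ERR(addr) )
    return ~PTR_ERR(addr);   /* ~(~code) == code: X86EMUL_* round-trips */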

Cheers,

Tim.




Re: [Xen-devel] [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch()

2017-06-22 Thread Tim Deegan
Hi,

At 16:12 +0100 on 21 Jun (1498061548), Andrew Cooper wrote:
> Zero-length reads are jump-target segmentation checks; never serve them from
> the cache.

Why not?  If the target is in the cached range, then it has passed the
segmentation check.  (Or if that's not true then the normal fetch path
needs to be fixed too).

> Force insn_off to a single byte, as offset can wrap around or truncate with
> respect to sh_ctxt->insn_buf_eip under a number of normal circumstances.

Wouldn't it be better to detect that and fall through?  Otherwise we
might return cached bytes by mistake.
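
Something like this, say (a sketch of the fall-through check; insn_buf
is an assumed field name, the rest follows the patch context):

unsigned long insn_off = offset - sh_ctxt->insn_buf_eip;

/*
 * Serve from the cache only if [offset, offset + bytes) sits inside
 * the cached window.  A wrapped or truncated subtraction shows up as a
 * huge insn_off and falls through to the full fetch, as does any read
 * extending beyond the buffered bytes.
 */
if ( insn_off > sh_ctxt->insn_buf_bytes ||
     bytes > sh_ctxt->insn_buf_bytes - insn_off )
    return hvm_read(seg, offset, p_data, bytes,
                    hvm_access_insn_fetch, sh_ctxt);

memcpy(p_data, &sh_ctxt->insn_buf[insn_off], bytes);
return X86EMUL_OKAY;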

Tim.

> Signed-off-by: Andrew Cooper 
> ---
> CC: Tim Deegan 
> CC: Jan Beulich 
> ---
>  xen/arch/x86/mm/shadow/common.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index 2e64a77..deea03a 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -235,12 +235,16 @@ hvm_emulate_insn_fetch(enum x86_segment seg,
>  {
>  struct sh_emulate_ctxt *sh_ctxt =
>  container_of(ctxt, struct sh_emulate_ctxt, ctxt);
> -unsigned int insn_off = offset - sh_ctxt->insn_buf_eip;
> +/* Careful, as offset can wrap or truncate WRT insn_buf_eip. */
> +uint8_t insn_off = offset - sh_ctxt->insn_buf_eip;
>  
>  ASSERT(seg == x86_seg_cs);
>  
> -/* Fall back if requested bytes are not in the prefetch cache. */
> -if ( unlikely((insn_off + bytes) > sh_ctxt->insn_buf_bytes) )
> +/*
> + * Fall back if requested bytes are not in the prefetch cache, but always
> + * perform the zero-length read for segmentation purposes.
> + */
> +if ( !bytes || unlikely((insn_off + bytes) > sh_ctxt->insn_buf_bytes) )
>  return hvm_read(seg, offset, p_data, bytes,
>  hvm_access_insn_fetch, sh_ctxt);
>  
> -- 
> 2.1.4
> 



Re: [Xen-devel] [PATCH v2 01/16] xen/mm: Don't use _{g, m}fn for defining INVALID_{G, M}FN

2017-06-20 Thread Tim Deegan
At 03:36 -0600 on 20 Jun (1497929778), Jan Beulich wrote:
> >>> On 20.06.17 at 11:14,  wrote:
> > At 01:32 -0600 on 20 Jun (1497922345), Jan Beulich wrote:
> >> >>> On 19.06.17 at 18:57,  wrote:
> >> > --- a/xen/include/xen/mm.h
> >> > +++ b/xen/include/xen/mm.h
> >> > @@ -56,7 +56,7 @@
> >> >  
> >> >  TYPE_SAFE(unsigned long, mfn);
> >> >  #define PRI_mfn  "05lx"
> >> > -#define INVALID_MFN  _mfn(~0UL)
> >> > +#define INVALID_MFN  (mfn_t){ ~0UL }
> >> 
> >> While I don't expect anyone to wish to use a suffix expression on
> >> this constant, for maximum compatibility this should still be fully
> >> parenthesized, I think. Of course this should be easy enough to
> >> do while committing.
> >> 
> >> Are you able to assure us that clang supports this gcc extension
> >> (compound literal for non-compound types)
> > 
> > AIUI this is a C99 feature, not a GCCism.
> 
> Most parts of it yes (it is a gcc extension in C89 mode only), but the
> specific use here isn't afaict: Compound literals outside of functions
> are static objects, and hence couldn't be used as initializers of other
> objects.

Ah, I see.  So would it be better to use

  #define INVALID_MFN ((const mfn_t) { ~0UL })

?

Tim.



Re: [Xen-devel] [PATCH v2 01/16] xen/mm: Don't use _{g, m}fn for defining INVALID_{G, M}FN

2017-06-20 Thread Tim Deegan
At 01:32 -0600 on 20 Jun (1497922345), Jan Beulich wrote:
> >>> On 19.06.17 at 18:57,  wrote:
> > --- a/xen/include/xen/mm.h
> > +++ b/xen/include/xen/mm.h
> > @@ -56,7 +56,7 @@
> >  
> >  TYPE_SAFE(unsigned long, mfn);
> >  #define PRI_mfn  "05lx"
> > -#define INVALID_MFN  _mfn(~0UL)
> > +#define INVALID_MFN  (mfn_t){ ~0UL }
> 
> While I don't expect anyone to wish to use a suffix expression on
> this constant, for maximum compatibility this should still be fully
> parenthesized, I think. Of course this should be easy enough to
> do while committing.
> 
> Are you able to assure us that clang supports this gcc extension
> (compound literal for non-compound types)

AIUI this is a C99 feature, not a GCCism.  Clang supports it as far
back as 3.0: https://godbolt.org/g/YY97uj

Tim.




Re: [Xen-devel] [PATCH 01/24] xen/mm: Don't use _{g, m}fn for defining INVALID_{G, M}FN

2017-06-14 Thread Tim Deegan
At 17:13 +0100 on 13 Jun (1497373980), Julien Grall wrote:
> INVALID_{G,M}FN are defined using static inline helpers _{g,m}fn.
> This means they cannot be used to initialize a build-time static variable:
> 
> In file included from mm.c:24:0:
> xen/xen/include/xen/mm.h:59:26: error: initializer element is not constant
>  #define INVALID_MFN  _mfn(~0UL)
> 
> Signed-off-by: Julien Grall 

Acked-by: Tim Deegan 

> I know that this solution will not work for non-debug build. I would
> like input from the community on ways to fix it nicely.

It seems to WFM: https://godbolt.org/g/vEVNY3

Tim.



Re: [Xen-devel] [PATCH v2 for-4.9] x86/pagewalk: Fix determination of Protection Key access rights

2017-05-18 Thread Tim Deegan
Hi,

At 15:02 +0100 on 18 May (1495119734), Andrew Cooper wrote:
>  * When fabricating gl1e's from superpages, propagate the protection key as
>well, so the protection key logic sees the real key as opposed to 0.
> 
>  * Experimentally, the protection key checks are performed ahead of the other
>access rights.  In particular, accesses which fail both protection key and
>regular permission checks yield PFEC_prot_key in the resulting pagefault.
> 
>  * Protection keys apply to all user mode data accesses, including accesses
>from supervisor code.

I think this would be clearer as "all data accesses to user-mode addresses".

>  PKRU WD applies to any data write, not just to
>mappings which are writable.  However, a supervisor access without CR0.WP
>bypasses any protection from protection keys.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 




Re: [Xen-devel] [PATCH for-4.9] x86/pagewalk: Fix determination of Protection Key access rights

2017-05-16 Thread Tim Deegan
At 17:51 +0100 on 16 May (1494957116), Andrew Cooper wrote:
> c/s 4c5d78a10 was accidentally buggy when handling Protection Keys.
> Protection keys apply to all user translations, not just accesses which
> originate from user mode.

Reviewed-by: Tim Deegan 

Does the test for write-protection just below have the opposite bug?
It seems to check whether the page is writable, when AFAICS it should
be checking whether the action is a write (modulo CR0.WP).

Tim.

> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Tim Deegan 
> CC: George Dunlap 
> CC: Julien Grall 
> 
> This regression was introduced during the 4.9 timeframe, so really should be
> fixed before 4.9 ships.
> ---
>  xen/arch/x86/mm/guest_walk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
> index 32d818e..ba72432 100644
> --- a/xen/arch/x86/mm/guest_walk.c
> +++ b/xen/arch/x86/mm/guest_walk.c
> @@ -408,7 +408,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>   * N.B. In the case that the walk ended with a superpage, the fabricated
>   * gw->l1e contains the appropriate leaf pkey.
>   */
> -if ( (walk & PFEC_user_mode) && !(walk & PFEC_insn_fetch) &&
> +if ( (ar & _PAGE_USER) && !(walk & PFEC_insn_fetch) &&
>   guest_pku_enabled(v) )
>  {
>  unsigned int pkey = guest_l1e_get_pkey(gw->l1e);
> -- 
> 2.1.4
> 



Re: [Xen-devel] [PATCH for-next] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()

2017-05-10 Thread Tim Deegan
At 11:42 +0100 on 10 May (1494416566), Andrew Cooper wrote:
> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
> infrasturcture, but take the opportunity to avoid opencoding it.
> 
> The chosen error constants require negating to work with IS_ERR(), but no
> other changes.
> 
> Signed-off-by: Andrew Cooper 

s/sturct/struct/

Reviewed-by: Tim Deegan 



Re: [Xen-devel] Security support scope (apropos of Xen and CNA)

2017-05-05 Thread Tim Deegan
Hi,

At 09:59 +0100 on 05 May (1493978382), Lars Kurth wrote:
> > On 5 May 2017, at 09:43, Tim Deegan  wrote:
> > - Only features listed as Supported in MAINTAINERS get support.
> 
> This seems related to George's proposal of the scope. I am not sure 
> MAINTAINERS is correct though (e.g. live-patching is probably listed as 
> Supported but does not get security support)
>

Ah, so it is.  So there is information on the wiki page that's not in
MAINTAINERS.  Could that be moved into MAINTAINERS?  There are
a few things that don't map well to maintainership of specific
files, e.g. "vMCE" or nested virtualization.  But on the whole I
think that adding clauses for them would be OK.

Tim.



Re: [Xen-devel] Security support scope (apropos of Xen and CNA)

2017-05-05 Thread Tim Deegan
At 13:53 +0100 on 04 May (1493905990), Ian Jackson wrote:
> To become a CNA (CVE Numbering Authority), which we would like to do,
> we need to provide MITRE's CNA programme with a definition of the
> scope of our CNA.  That should be the scope of our general security
> support, clearly.
> 
> At the moment we don't seem to have this written down in a single
> clear document.  I am aware of the following places which can contain
> information about security support (normally, in the form of
> statements saying that certain things are not supported):
> 
>  * https://wiki.xenproject.org/wiki/Xen_Project_Release_Features has a
>table of versions with security support, and information about some
>features.
> 
>  * xen.git:docs/misc/qemu-xen-security, limits security support to
>some configurations.
> 
>  * xen.git:MAINTAINERS might in principle have a status not implying
>security support.
> 
>  * Docs for an individual feature (eg in xl docs) might say that the
>feature is not advised, or not supported, or something.
> 
>  * Previous XSA advisories might withdraw support.
> 
> This diversity of information sources is rather unsatisfactory.
> 
> I think we need to at least reduce the number of different information
> sources.  Also we need an overview document which points to them all.
> 
> Where should this overview document be ?  Which of the above sources
> should be coalesced into which others ?

IMO the overview should on the main xenproject.org site, ideally in
the security process preamble, or beside it if it gets too long.

It should read something like this:

 - Security support is provided for the following versions:
   [List of versions, + an item on the release checklist to update it.]

 - Only features listed as Supported in MAINTAINERS get support.

 - Specific exemptions:
   [ move qemu-xen-security here, and delete it from the tree ]
   [ brief summary of XSA-77 + a link for details. ] 
   [ anything else?  I don't think we need to explicitly call out to
 docs for individual features, but there might be some things
 to mention here, e.g. DMA attacks with IOMMU disabled. ]

Not sure about the Xen_Project_Release_Features wiki page -- it's nice
to have all that info + historical versions in one place; on the
other hand it's not the canonical source for most of it and risks
getting out of date.  Maybe it needs an introduction pointing out
that MAINTAINERS and the new security scope doc are the official sources.

Cheers,

Tim.



Re: [Xen-devel] [PATCH 3/7] x86/mm: Further restrict permissions on some virtual mappings

2017-05-03 Thread Tim Deegan
Hi,

At 19:05 +0100 on 02 May (1493751922), Andrew Cooper wrote:
> As originally reported, the Linear Pagetable slot maps 512GB of ram as RWX,
> where the guest has full read access and a lot of direct or indirect control
> over the written content.  It isn't hard for a PV guest to hide shellcode
> here.
> 
> Therefore, increase defence in depth by auditing our current pagetable
> mappings.
> 
>  * The regular linear, shadow linear, and per-domain slots have no business
>being executable (but need to be written), so are updated to be NX.
>  * The Read Only mappings of the M2P (compat and regular) don't need to be
>writeable or executable.
>  * The PV GDT mappings don't need to be executable.
> 
> Reported-by: Jann Horn 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH V2] xen/hvm: fix hypervisor crash with hvm_save_one()

2017-05-03 Thread Tim Deegan
At 10:15 +0100 on 03 May (1493806508), Tim Deegan wrote:
> At 00:31 -0600 on 03 May (1493771502), Jan Beulich wrote:
> > +else if ( ctxt.cur > sizeof(*desc) )
> >  {
> >  uint32_t off;
> > -const struct hvm_save_descriptor *desc;
> >  
> > -rv = -ENOENT;
> >  for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += 
> > desc->length )

It occurs to me that as well as underflowing, this test is off by one.
It ought to be "off + sizeof(*desc) <= ctxt.cur" to allow for a
zero-length record.  AFAIK we don't actually have any of those, so
it's academic, but we might want to represent the presence of some
feature without having any feature-specific state to save.
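
To make the underflow concrete, a minimal sketch (the descriptor layout
follows hvm_save_descriptor; names are abbreviated):

    #include <stdio.h>
    #include <stdint.h>

    struct desc { uint16_t typecode, instance; uint32_t length; };

    int main(void)
    {
        uint32_t cur = 0;   /* no records: all vCPUs offline */
        uint32_t off = 0;

        /* Old bound: cur - sizeof(*desc) is computed as size_t, so
         * 0 - 8 wraps to a huge value and the loop walks off the end
         * of the buffer. */
        printf("%zu\n", (size_t)cur - sizeof(struct desc));

        /* Safe bound: no subtraction, and a zero-length record
         * (off + sizeof(*desc) == cur) is still accepted. */
        if ( off + sizeof(struct desc) <= cur )
            puts("record header in range");
        else
            puts("no records");
        return 0;
    }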

Tim.



Re: [Xen-devel] [PATCH V2] xen/hvm: fix hypervisor crash with hvm_save_one()

2017-05-03 Thread Tim Deegan
At 00:31 -0600 on 03 May (1493771502), Jan Beulich wrote:
> Hmm, with both of you being of that opinion, I've taken another
> look. I think I see now why you think that way (this being data
> from an internal producer, overflow/underflow are not a primary
> concern), so I'll withdraw my objection to the original patch (i.e.
> I agree taking it with the v2 description). However, an alternative
> would be
> 
> --- unstable.orig/xen/common/hvm/save.c
> +++ unstable/xen/common/hvm/save.c
> @@ -79,14 +79,15 @@ size_t hvm_save_size(struct domain *d)
>  int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance, 
>   XEN_GUEST_HANDLE_64(uint8) handle)
>  {
> -int rv = 0;
> +int rv = -ENOENT;
>  size_t sz = 0;
>  struct vcpu *v;
>  hvm_domain_context_t ctxt = { 0, };
> +const struct hvm_save_descriptor *desc;
>  
>  if ( d->is_dying 
>   || typecode > HVM_SAVE_CODE_MAX 
> - || hvm_sr_handlers[typecode].size < sizeof(struct 
> hvm_save_descriptor)
> + || hvm_sr_handlers[typecode].size < sizeof(*desc)
>   || hvm_sr_handlers[typecode].save == NULL )
>  return -EINVAL;
>  
> @@ -107,12 +108,10 @@ int hvm_save_one(struct domain *d, uint1
> d->domain_id, typecode);
>  rv = -EFAULT;
>  }
> -else
> +else if ( ctxt.cur > sizeof(*desc) )
>  {
>  uint32_t off;
> -const struct hvm_save_descriptor *desc;
>  
> -rv = -ENOENT;
>  for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length 
> )
>  {
>  desc = (void *)(ctxt.data + off);
> @@ -122,7 +121,8 @@ int hvm_save_one(struct domain *d, uint1
>  {
>  uint32_t copy_length = desc->length;
>  
> -if ( off + copy_length > ctxt.cur )
> +if ( ctxt.cur < copy_length ||
> + off > ctxt.cur - copy_length )
>  copy_length = ctxt.cur - off;
>  rv = 0;
>  if ( copy_to_guest(handle, ctxt.data + off, copy_length) )
> 
> taking care of overflow/underflow (now consistently) as well, plus
> avoiding the (imo ugly) goto without making the code harder to
> read. Thoughts?

Any of these three patches is fine by me.

Cheers,

Tim.



Re: [Xen-devel] [PATCH V2] xen/hvm: fix hypervisor crash with hvm_save_one()

2017-05-02 Thread Tim Deegan
At 18:21 +0300 on 02 May (1493749307), Razvan Cojocaru wrote:
> hvm_save_cpu_ctxt() returns success without writing any data into
> hvm_domain_context_t when all VCPUs are offline. This can then crash
> the hypervisor (with FATAL PAGE FAULT) in hvm_save_one() via the
> "off < (ctxt.cur - sizeof(*desc))" for() test, where ctxt.cur remains 0,
> causing an underflow which leads the hypervisor to go off the end of the
> ctxt buffer.
[...]
> Reported-by: Razvan Cojocaru 
> Signed-off-by: Andrew Cooper 
> Signed-off-by: Razvan Cojocaru 
> Tested-by: Razvan Cojocaru 

I actually preferred the first patch :P but this is fine.

Acked-by: Tim Deegan 

> diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c
> index 78706f5..3bdd124 100644
> --- a/xen/common/hvm/save.c
> +++ b/xen/common/hvm/save.c
> @@ -113,6 +113,9 @@ int hvm_save_one(struct domain *d, uint16_t typecode, 
> uint16_t instance,
>  const struct hvm_save_descriptor *desc;
>  
>  rv = -ENOENT;
> +if ( ctxt.cur < sizeof(*desc) )
> +goto out;
> +
>  for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length 
> )
>  {
>  desc = (void *)(ctxt.data + off);
> @@ -132,6 +135,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, 
> uint16_t instance,
>  }
>  }
>  
> + out:
>  xfree(ctxt.data);
>  return rv;
>  }
> -- 
> 1.9.1
> 



Re: [Xen-devel] [PATCH] xen/hvm: fix hypervisor crash with hvm_save_one()

2017-05-02 Thread Tim Deegan
Hi,

At 16:25 +0300 on 02 May (1493742339), Razvan Cojocaru wrote:
> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
> can lead to ctxt.cur being 0. This can then crash the hypervisor
> (with FATAL PAGE FAULT) in hvm_save_one() via the
> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
> in practice with a Linux VM queried around shutdown:

Good fix, thanks!  But I think that memset is innocent -- it's
clearing a local "struct hvm_hw_cpu ctxt", not the caller's
"hvm_domain_context_t ctxt".  AFAICS the underflow happens when the
per-type handler returns no data at all (because all VCPUs are
offline).

With the commit message fixed,

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference

2017-05-02 Thread Tim Deegan
At 02:50 -0600 on 02 May (1493693403), Jan Beulich wrote:
> >>> On 02.05.17 at 10:32,  wrote:
> > At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
> >> >>> On 27.04.17 at 11:51,  wrote:
> >> > At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
> >> >> ... it wouldn't better be the other way around: We use the patch
> >> >> in its current (or even v1) form, and try to do something about
> >> >> performance only if we really find a case where it matters. To be
> >> >> honest, I'm not even sure how I could meaningfully measure the
> >> >> impact here: Simply counting how many extra flushes there would
> >> >> end up being wouldn't seem all that useful, and whether there
> >> >> would be any measurable difference in the overall execution time
> >> >> of e.g. domain creation I would highly doubt (but if it's that what
> >> >> you're after, I could certainly collect a few numbers).
> >> > 
> >> > I think that would be a good idea, just as a sanity-check.
> >> 
> >> As it turns out there is a measurable effect: xc_dom_boot_image()
> >> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
> >> responsible for less than 10% of the overall time libxl__build_dom()
> >> takes, and that in turn is only a pretty small portion of the overall
> >> "xl create".
> > 
> > Do you think that slowdown is OK?  I'm not sure -- I'd be inclined to
> > avoid it, but could be persuaded, and it's not me doing the work. :)
> 
> Well, if there was a way to avoid it in a clean way without too much
> code churn, I'd be all for avoiding it. The avenues we've explored so
> far either didn't work (using pg_owner's dirty mask) or didn't promise
> to actually reduce the flush overhead in a meaningful way (adding a
> separate mask to be merged into the mask used for the flush in
> __get_page_type()), unless - as has been the case before - I didn't
> fully understand your thoughts there.

Quoting your earlier response:

> Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
> and consume/clear them in __get_page_type()? Right now I can't
> see it being necessary for correctness to fiddle with any of the
> other flushes using the domain dirty mask.
> 
> But then again this may not be much of a win, unless the put
> operations come through in meaningful batches, not interleaved
> by any type changes (the latter ought to be guaranteed during
> domain construction and teardown at least, as the guest itself
> can't do anything at that time to effect type changes).

I'm not sure how much batching there needs to be.  I agree that the
domain creation case should work well though.  Let me think about the
scenarios when dom B is live:

1. Dom A drops its foreign map of page X; dom B immediately changes the
type of page X.  This case isn't helped at all, but I don't see any
way to improve it -- dom A's TLBs need to be flushed right away.

2. Dom A drops its foreign map of page X; dom B immediately changes
the type of page Y.  Now dom A's dirty CPUs are in the new map, but B
may not need to flush them right away.  B can filter by page Y's
timestamp, and flush (and clear) only some of the cpus in the map.
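
For concreteness, a sketch of that filtering step, with assumed names:
foreign_dirty is the hypothetical per-domain mask that put_page_from_l1e()
would feed, and tlbflush_filter() is the existing timestamp filter (macro
form, operating on the mask in place):

    static void flush_foreign_for_type_change(struct domain *b,
                                              const struct page_info *page)
    {
        cpumask_t mask;

        cpumask_copy(&mask, b->foreign_dirty);  /* hypothetical field */
        /* Drop CPUs whose TLBs have been flushed since the stamp. */
        tlbflush_filter(mask, page->tlbflush_timestamp);
        if ( !cpumask_empty(&mask) )
            flush_tlb_mask(&mask);
    }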

So that seems good, but then there's a risk that cpus never get
cleared from the map, and __get_page_type() ends up doing a lot of
unnecessary work filtering timestamps.  When is it safe to remove a CPU
from that map?
 - obvs safe if we IPI it to flush the TLB (though may need memory
   barriers -- need to think about a race with CPU C putting A _into_
   the map at the same time...)
 - we could track the timestamp of the most recent addition to the
   map, and drop any CPU whose TLB has been flushed since that,
   but that still lets unrelated unmaps keep CPUs alive in the map...
 - we could double-buffer the map: always add CPUs to the active map;
   from time to time, swap maps and flush everything in the non-active
   map (filtered by the TLB timestamp when we last swapped over).

Bah, this is turning into a tar pit.  Let's stick to the v2 patch as
being (relatively) simple and correct, and revisit this if it causes
trouble. :)

Thanks,

Tim.



Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference

2017-05-02 Thread Tim Deegan
At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
> >>> On 27.04.17 at 11:51,  wrote:
> > At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
> >> ... it wouldn't better be the other way around: We use the patch
> >> in its current (or even v1) form, and try to do something about
> >> performance only if we really find a case where it matters. To be
> >> honest, I'm not even sure how I could meaningfully measure the
> >> impact here: Simply counting how many extra flushes there would
> >> end up being wouldn't seem all that useful, and whether there
> >> would be any measurable difference in the overall execution time
> >> of e.g. domain creation I would highly doubt (but if it's that what
> >> you're after, I could certainly collect a few numbers).
> > 
> > I think that would be a good idea, just as a sanity-check.
> 
> As it turns out there is a measurable effect: xc_dom_boot_image()
> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
> responsible for less than 10% of the overall time libxl__build_dom()
> takes, and that in turn is only a pretty small portion of the overall
> "xl create".

Do you think that slowdown is OK?  I'm not sure -- I'd be inclined to
avoid it, but could be persuaded, and it's not me doing the work. :)
Andrew, what do you think?

Tim.



Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference

2017-04-27 Thread Tim Deegan
At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
> ... it wouldn't better be the other way around: We use the patch
> in its current (or even v1) form, and try to do something about
> performance only if we really find a case where it matters. To be
> honest, I'm not even sure how I could meaningfully measure the
> impact here: Simply counting how many extra flushes there would
> end up being wouldn't seem all that useful, and whether there
> would be any measurable difference in the overall execution time
> of e.g. domain creation I would highly doubt (but if it's that what
> you're after, I could certainly collect a few numbers).

I think that would be a good idea, just as a sanity-check.  But apart
from that the patch looks correct to me, so:

Reviewed-by: Tim Deegan 

for v2 (not v1).

Cheers,

Tim.



Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference

2017-04-26 Thread Tim Deegan
At 08:07 -0600 on 26 Apr (1493194043), Jan Beulich wrote:
> >>> On 25.04.17 at 12:59,  wrote:
> > Hi,
> > 
> > At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
> >> Jann's explanation of the problem:
> >> 
> >> "start situation:
> >>  - domain A and domain B are PV domains
> >>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
> >>are not scheduled away
> >>  - domain A has XSM_TARGET access to domain B
> >>  - page X is owned by domain B and has no mappings
> >>  - page X is zeroed
> >> 
> >>  steps:
> >>  - domain A uses do_mmu_update() to map page X in domain A as writable
> >>  - domain A accesses page X through the new PTE, creating a TLB entry
> >>  - domain A removes its mapping of page X
> >>- type count of page X goes to 0
> >>- tlbflush_timestamp of page X is bumped
> >>  - domain B maps page X as L1 pagetable
> >>- type of page X changes to PGT_l1_page_table
> >>- TLB flush is forced using domain_dirty_cpumask of domain B
> >>- page X is mapped as L1 pagetable in domain B
> >> 
> >>  At this point, domain B's vCPUs are guaranteed to have no
> >>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
> >>  vCPUs can still have stale TLB entries that map page X as writable,
> >>  permitting domain A to control a live pagetable of domain B."
> > 
> > AIUI this patch solves the problem by immediately flushing domain A's
> > TLB entries at the point where domain A removes its mapping of page X.
> > 
> > Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
> > domain B's domain_dirty_cpumask at the same point?
> > 
> > Then when domain B flushes TLBs in the last step (in __get_page_type())
> > it will catch any stale TLB entries from domain A as well.  But in the
> > (hopefully common) case where there's a delay between domain A's
> > __put_page_type() and domain B's __get_page_type(), the usual TLB
> > timestamp filtering will suppress some of the IPIs/flushes.
> 
> So I've given this a try, and failed miserably (including losing an
> XFS volume on the test machine). The problem is the BUG_ON() at
> the top of domain_relinquish_resources() - there will, very likely, be
> bits remaining set if the code added to put_page_from_l1e() set
> some pretty recently (irrespective of avoiding to set any once
> ->is_dying has been set).

Yeah. :(  Would it be correct to just remove that BUG_ON(), or replace it
with an explicit check that there are no running vcpus?

Or is using domain_dirty_cpumask like this too much of a stretch?
E.g. PV TLB flushes use it, and would maybe be more expensive until
the dom0 CPUs fall out of the mask (which isn't guaranteed to happen).

We could add a new mask just for this case, and clear CPUs from it as
they're flushed.  But that sounds like a lot of work...

Maybe worth measuring the impact of the current patch before going too
far with this?

Tim.



Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference

2017-04-26 Thread Tim Deegan
At 05:59 -0600 on 25 Apr (1493099950), Jan Beulich wrote:
> >>> On 25.04.17 at 12:59,  wrote:
> > Hi,
> > 
> > At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
> >> Jann's explanation of the problem:
> >> 
> >> "start situation:
> >>  - domain A and domain B are PV domains
> >>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
> >>are not scheduled away
> >>  - domain A has XSM_TARGET access to domain B
> >>  - page X is owned by domain B and has no mappings
> >>  - page X is zeroed
> >> 
> >>  steps:
> >>  - domain A uses do_mmu_update() to map page X in domain A as writable
> >>  - domain A accesses page X through the new PTE, creating a TLB entry
> >>  - domain A removes its mapping of page X
> >>- type count of page X goes to 0
> >>- tlbflush_timestamp of page X is bumped
> >>  - domain B maps page X as L1 pagetable
> >>- type of page X changes to PGT_l1_page_table
> >>- TLB flush is forced using domain_dirty_cpumask of domain B
> >>- page X is mapped as L1 pagetable in domain B
> >> 
> >>  At this point, domain B's vCPUs are guaranteed to have no
> >>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
> >>  vCPUs can still have stale TLB entries that map page X as writable,
> >>  permitting domain A to control a live pagetable of domain B."
> > 
> > AIUI this patch solves the problem by immediately flushing domain A's
> > TLB entries at the point where domain A removes its mapping of page X.
> > 
> > Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
> > domain B's domain_dirty_cpumask at the same point?
> > 
> > Then when domain B flushes TLBs in the last step (in __get_page_type())
> > it will catch any stale TLB entries from domain A as well.  But in the
> > (hopefully common) case where there's a delay between domain A's
> > __put_page_type() and domain B's __get_page_type(), the usual TLB
> > timestamp filtering will suppress some of the IPIs/flushes.
> 
> Oh, I see. Yes, I think this would be fine. However, we don't have
> a suitable cpumask accessor allowing us to do this ORing atomically,
> so we'd have to open code it.

Probably better to build the accessor than to open code here. :)
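
A sketch of such an accessor -- cpumask_or_atomic() is an assumed name,
and it leans on cpumask_set_cpu() already being atomic underneath:

    static inline void cpumask_or_atomic(cpumask_t *dst, const cpumask_t *src)
    {
        unsigned int cpu;

        /* Bit-by-bit atomic OR: each set_bit() lands safely even
         * against concurrent updaters of dst.  A word-at-a-time
         * atomic OR would be faster but needs a new primitive. */
        for_each_cpu ( cpu, src )
            cpumask_set_cpu(cpu, dst);
    }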

> Do you think such a slightly ugly approach would be worth it here?
> Foreign mappings shouldn't be _that_ performance critical..

I have no real idea, though there are quite a lot of them in domain
building/migration.  I can imagine a busy multi-vcpu dom0 could
generate a lot of IPIs, almost all of which could be merged.

> And then, considering that this will result in time stamp based filtering
> again, I'm no longer sure I was right to agree with Jann on the flush
> here needing to be unconditional. Regardless of page table owner
> matching page owner, the time stamp stored for the page will always
> be applicable (it's a global property). So we wouldn't even need to
> OR in the whole dirty mask here, but could already pre-filter (or if we
> stayed with the flush-on-put approach, then v1 would have been
> correct).

I don't think so.  The page's timestamp is set when its typecount
falls to zero, which hasn't happened yet -- we hold a typecount
ourselves here.

In theory we could filter the bits we're adding against a local
timestamp, but that would have to be tlbflush_current_time()
because the TLB entries we care about are live right now, and
filtering against that is (probably) a noop.

Tim.



Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference

2017-04-25 Thread Tim Deegan
Hi,

At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
> Jann's explanation of the problem:
> 
> "start situation:
>  - domain A and domain B are PV domains
>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>are not scheduled away
>  - domain A has XSM_TARGET access to domain B
>  - page X is owned by domain B and has no mappings
>  - page X is zeroed
> 
>  steps:
>  - domain A uses do_mmu_update() to map page X in domain A as writable
>  - domain A accesses page X through the new PTE, creating a TLB entry
>  - domain A removes its mapping of page X
>- type count of page X goes to 0
>- tlbflush_timestamp of page X is bumped
>  - domain B maps page X as L1 pagetable
>- type of page X changes to PGT_l1_page_table
>- TLB flush is forced using domain_dirty_cpumask of domain B
>- page X is mapped as L1 pagetable in domain B
> 
>  At this point, domain B's vCPUs are guaranteed to have no
>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>  vCPUs can still have stale TLB entries that map page X as writable,
>  permitting domain A to control a live pagetable of domain B."

AIUI this patch solves the problem by immediately flushing domain A's
TLB entries at the point where domain A removes its mapping of page X.

Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
domain B's domain_dirty_cpumask at the same point?

Then when domain B flushes TLBs in the last step (in __get_page_type())
it will catch any stale TLB entries from domain A as well.  But in the
(hopefully common) case where there's a delay between domain A's
__put_page_type() and domain B's __get_page_type(), the usual TLB
timestamp filtering will suppress some of the IPIs/flushes.

Cheers,

Tim.


> Domain A necessarily is Dom0 (DomU-s with XSM_TARGET permission are
> being created only for HVM domains, but domain B needs to be PV here),
> so this is not a security issue, but nevertheless seems desirable to
> correct.
> 
> Reported-by: Jann Horn 
> Signed-off-by: Jan Beulich 
> ---
> v2: Don't consider page's time stamp (relevant only for the owning
> domain).
> 
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1266,6 +1266,18 @@ void put_page_from_l1e(l1_pgentry_t l1e,
>  if ( (l1e_get_flags(l1e) & _PAGE_RW) && 
>   ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
>  {
> +/*
> + * Don't leave stale writable TLB entries in the unmapping domain's
> + * page tables, to prevent them allowing access to pages required to
> + * be read-only (e.g. after pg_owner changed them to page table or
> + * segment descriptor pages).
> + */
> +if ( unlikely(l1e_owner != pg_owner) )
> +{
> +perfc_incr(need_flush_tlb_flush);
> +flush_tlb_mask(l1e_owner->domain_dirty_cpumask);
> +}
> +
>  put_page_and_type(page);
>  }
>  else



Re: [Xen-devel] [PATCH v2 for-4.9 1/6] x86/hvm: Correct some address space terminology

2017-04-06 Thread Tim Deegan
At 18:32 +0100 on 05 Apr (1491417179), Andrew Cooper wrote:
> The function hvm_translate_linear_addr() translates a virtual address to a
> linear address, not a linear address to a physical address.  Correct its name.
> 
> Both hvm_translate_virtual_addr() and hvmemul_virtual_to_linear() return a
> linear address, but a parameter name of paddr is easily confused with paddr_t.
> Rename it to linear, to clearly identify the address space, and for
> consistency with hvm_virtual_to_linear_addr().
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 
> Reviewed-by: Paul Durrant 

For the whole series, Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH v3 5/6] x86/pagewalk: Improve the logic behind setting access and dirty bits

2017-03-27 Thread Tim Deegan
At 11:03 +0100 on 27 Mar (1490612603), Andrew Cooper wrote:
> The boolean pse2M is misnamed, because it might refer to a 4M superpage.
> 
> Switch the logic to be in terms of the level of the leaf entry, and rearrange
> the calls to set_ad_bits() to be a fallthrough switch statement, to make it
> easier to follow.
> 
> Alter set_ad_bits() to take properly typed pointers and booleans rather than
> integers.
> 
> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling

2017-03-23 Thread Tim Deegan
At 17:02 + on 23 Mar (1490288548), Andrew Cooper wrote:
> On 23/03/17 16:55, Tim Deegan wrote:
> > At 16:31 + on 16 Mar (1489681899), Andrew Cooper wrote:
> >> Some bits are unconditionally reserved in pagetable entries, or reserved
> >> because of alignment restrictions.  Other bits are reserved because of 
> >> control
> >> register configuration.
> >>
> >> Introduce helpers which take an individual vcpu and guest pagetable entry, 
> >> and
> >> calculates whether any reserved bits are set.
> >>
> >> While here, add a couple of newlines to aid readability.
> >>
> >> Signed-off-by: Andrew Cooper 
> > Reviewed-by: Tim Deegan , although:
> >
> >> +/* Mask covering the reserved bits from superpage alignment. */
> >> +#define SUPERPAGE_RSVD(bit) \
> >> +(((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
> > I think this will be wrong if we ever get l4 superpages, as the mask
> > is only 32 bits wide.
> 
> What is 32 bits wide?  1ULL should cause everything else to be suitably
> promoted, no?


~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) == 0xffffe000.  Promotion comes
too late.
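
A two-line demonstration, assuming the usual bit-12 definition of
_PAGE_PSE_PAT from the x86 headers:

    #include <stdio.h>

    #define _PAGE_PSE_PAT 0x1000U   /* bit 12 */

    int main(void)
    {
        /* ~ is applied at unsigned int width, and only then is the
         * already-truncated value widened to 64 bits. */
        unsigned long long narrow = ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1));
        unsigned long long wide   = ~(1ULL * (_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)));

        printf("%#llx\n", narrow);  /* 0xffffe000 */
        printf("%#llx\n", wide);    /* 0xffffffffffffe000 */
        return 0;
    }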

Tim.



Re: [Xen-devel] [PATCH v2 9/9] x86/pagewalk: non-functional cleanup

2017-03-23 Thread Tim Deegan
At 16:31 + on 16 Mar (1489681903), Andrew Cooper wrote:
>  * Drop trailing whitespace
>  * Consistently apply Xen style
>  * Introduce a local variable block
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits

2017-03-23 Thread Tim Deegan
At 16:31 + on 16 Mar (1489681902), Andrew Cooper wrote:
> --- a/xen/arch/x86/mm/guest_walk.c
> +++ b/xen/arch/x86/mm/guest_walk.c
> @@ -32,24 +32,28 @@ asm(".file \"" __OBJECT_FILE__ "\"");
>  #include 
>  #include 
>  
> -/* Modify a guest pagetable entry to set the Accessed and Dirty bits.
> - * Returns non-zero if it actually writes to guest memory. */
> -static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
> +/*
> + * Modify a guest pagetable entry to set the Accessed and Dirty bits.
> + * Returns non-zero if it actually writes to guest memory.

s/non-zero/true/.

> + */
> +static bool set_ad_bits(void *_guest_p, void *_walk_p, bool set_dirty)

Would it work to have this take guest_intpte_t * and have the caller
pass e.g. &l4p[guest_l4_table_offset(va)].l4 ?  Not very much prettier, I
suppose. :)
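
i.e. something along these lines (a sketch against the quoted code, not
a tested patch):

    static bool set_ad_bits(guest_intpte_t *guest_p, guest_intpte_t *walk_p,
                            bool set_dirty)
    {
        guest_intpte_t new, old = *walk_p;

        new = old | _PAGE_ACCESSED | (set_dirty ? _PAGE_DIRTY : 0);
        if ( old == new )
            return false;

        /* Write the new entry into the walk, and try to write it back
         * into the guest table as well.  If the guest table has changed
         * under our feet then leave it alone. */
        *walk_p = new;
        return cmpxchg(guest_p, old, new) == old;
    }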

>  {
> -guest_intpte_t old, new;
> +guest_intpte_t *guest_p = _guest_p, *walk_p = _walk_p;
> +guest_intpte_t new, old = ACCESS_ONCE(*walk_p);

Why ACCESS_ONCE?  The walk is unlikely to change underfoot.

>  
> -old = *(guest_intpte_t *)walk_p;
>  new = old | _PAGE_ACCESSED | (set_dirty ? _PAGE_DIRTY : 0);
> -if ( old != new ) 
> +if ( old != new )
>  {
> -/* Write the new entry into the walk, and try to write it back
> +/*
> + * Write the new entry into the walk, and try to write it back
>   * into the guest table as well.  If the guest table has changed
> - * under out feet then leave it alone. */
> -*(guest_intpte_t *)walk_p = new;
> -if ( cmpxchg(((guest_intpte_t *)guest_p), old, new) == old ) 
> -return 1;
> + * under out feet then leave it alone.

s/out/our/ while we're here. 

The rest of this LGTM, thanks.

Cheers,

Tim.



Re: [Xen-devel] [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers

2017-03-23 Thread Tim Deegan
At 16:31 + on 16 Mar (1489681901), Andrew Cooper wrote:
> The shadow logic should not create a valid/present shadow of a guest PTE which
> contains reserved bits from the guests point of view.  It is not guaranteed
> that the hardware pagewalk will come to the same conclusion, and raise a
> pagefault.
> 
> Shadows created on demand from the pagefault handler are fine because the
> pagewalk over the guest tables will have injected the fault into the guest
> rather than creating a shadow.
> 
> However, shadows created by sh_resync_l1() and sh_prefetch() haven't undergone
> a pagewalk and need to account for reserved bits before creating the shadow.
> 
> In practice, this means a 3-level guest could previously cause PTEs with bits
> 63:52 set to be shadowed (and discarded).  This PTE should cause #PF[RSVD]
> when encountered by hardware, but the installed shadow is valid and hardware
> doesn't fault.
> 
> Reuse the pagewalk reserved bits helpers, and assert in
> l?e_propagate_from_guest() that shadows are not attempted to be created with
> reserved bits set.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 
(including your later adjustment to p2mt)



Re: [Xen-devel] [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling

2017-03-23 Thread Tim Deegan
At 16:31 + on 16 Mar (1489681899), Andrew Cooper wrote:
> Some bits are unconditionally reserved in pagetable entries, or reserved
> because of alignment restrictions.  Other bits are reserved because of control
> register configuration.
> 
> Introduce helpers which take an individual vcpu and guest pagetable entry, and
> calculates whether any reserved bits are set.
> 
> While here, add a couple of newlines to aid readability.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan , although:

> +/* Mask covering the reserved bits from superpage alignment. */
> +#define SUPERPAGE_RSVD(bit) \
> +(((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))

I think this will be wrong if we ever get l4 superpages, as the mask
is only 32 bits wide.

Cheers,

Tim.



Re: [Xen-devel] [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates

2017-03-23 Thread Tim Deegan
At 16:31 + on 16 Mar (1489681898), Andrew Cooper wrote:
> Switch them to returning bool, and taking const parameters.
> 
> Rename guest_supports_superpages() to guest_supports_l2_superpages() to
> indicate which level of pagetables it is actually referring to, and rename
> guest_supports_1G_superpages() to guest_supports_l3_superpages() for
> consistency.
> 
> guest_supports_l3_superpages() is a static property of the domain, rather than
> control register settings, so is switched to take a domain pointer.
> hvm_pse1gb_supported() is inlined into its sole user because it isn't strictly
> hvm-specific (it is hap-specific) and really should be beside a comment
> explaining why the cpuid policy is ignored.
> 
> While cleaning up part of the file, clean up all trailing whitespace, and fix
> one comment which accidentally referred to PG living in CR4 rather than CR0.
> 
> Requested-by: Jan Beulich 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 
(with or without the renaming Jan asked for).



Re: [Xen-devel] [PATCH v2 3/9] x86/shadow: Drop VALID_GFN()

2017-03-23 Thread Tim Deegan
At 16:31 + on 16 Mar (1489681897), Andrew Cooper wrote:
> There is only one single user of VALID_GFN().  Inline the macro to remove the
> added layer of indirection in sh_gva_to_gfn()
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter

2017-03-23 Thread Tim Deegan
At 16:31 + on 16 Mar (1489681896), Andrew Cooper wrote:
> It is a pointer, not an array.
> 
> No functional change.
> 
> Requested-by: Jan Beulich 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



[Xen-devel] [PATCH] tools/kdd: don't use a pointer to an unaligned field.

2017-03-10 Thread Tim Deegan
The 'val' field in the packet is byte-aligned (because it is part of a
packed struct), but the pointer argument to kdd_rdmsr() has the normal
alignment constraints for a uint64_t *.  Use a local variable to make sure
the passed pointer has the correct alignment.

Reported-by: Roger Pau Monné 
Signed-off-by: Tim Deegan 
---
 tools/debugger/kdd/kdd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/debugger/kdd/kdd.c b/tools/debugger/kdd/kdd.c
index 70f007e..1bd5dd5 100644
--- a/tools/debugger/kdd/kdd.c
+++ b/tools/debugger/kdd/kdd.c
@@ -710,11 +710,13 @@ static void kdd_handle_read_ctrl(kdd_state *s)
 static void kdd_handle_read_msr(kdd_state *s)
 {
 uint32_t msr = s->rxp.cmd.msr.msr;
+uint64_t val;
 int ok;
 KDD_LOG(s, "Read MSR 0x%"PRIx32"\n", msr);
 
-ok = (kdd_rdmsr(s->guest, s->cpuid, msr, &s->txp.cmd.msr.val) == 0);
+ok = (kdd_rdmsr(s->guest, s->cpuid, msr, &val) == 0);
 s->txp.cmd.msr.msr = msr;
+s->txp.cmd.msr.val = val;
 s->txp.cmd.msr.status = (ok ? KDD_STATUS_SUCCESS : KDD_STATUS_FAILURE);
 kdd_send_cmd(s, KDD_CMD_READ_MSR, 0);
 }
-- 
2.7.4




Re: [Xen-devel] [PATCH] tools/kdd: remove unneeded packed attributes

2017-03-10 Thread Tim Deegan
Hi,

At 12:02 +0900 on 10 Mar (1489147353), Roger Pau Monne wrote:
> The layout of the structures make them already aligned, there's no need for 
> the
> packed attributes.

That's not right -- kdd_pkt gets 8 bytes longer with this patch.

In fact this code has the bug that clang's new warning is supposed to
catch: we use these packed fields to parse byte-aligned packets out
of a stream, and then pass the address of a 64-bit field to an
external function as a uint64_t *.  I'll send a patch.

Cheers,

Tim.



Re: [Xen-devel] [PATCH 1/2] build/clang: remove the address-of-packed-member warning

2017-03-06 Thread Tim Deegan
At 14:36 + on 06 Mar (1488811016), George Dunlap wrote:
> On 06/03/17 13:58, Jan Beulich wrote:
>  On 06.03.17 at 13:31,  wrote:
> >> --- a/Config.mk
> >> +++ b/Config.mk
> >> @@ -216,6 +216,7 @@ $(call 
> >> cc-option-add,HOSTCFLAGS,HOSTCC,-Wdeclaration-after-statement)
> >>  $(call cc-option-add,CFLAGS,CC,-Wdeclaration-after-statement)
> >>  $(call cc-option-add,CFLAGS,CC,-Wno-unused-but-set-variable)
> >>  $(call cc-option-add,CFLAGS,CC,-Wno-unused-local-typedefs)
> >> +$(call cc-option-add,CFLAGS,CC,-Wno-address-of-packed-member)
> > 
> > Actually, having thought some more about this, the warning
> > should be suppressed only for x86 imo. ARM wants aligned
> > accesses after all.
> 
> Looking at Roger's complaint, it appears that the warning is issued even
> if the member actually is aligned, if *on some unknown system*, it might
> someday be un-aligned.

AIUI the complaint is (based on the simplified example from the ticket):

struct __attribute__((__packed__)) bar {
uint16_t x1;
uint16_t x2;
} b;

&b.x2;

Because the struct is packed, it has alignment 1, and so do its
fields.   &b.x2 is a pointer to a uint16_t, but it _isn't_ 16-bit
aligned (because the whole struct is only byte-aligned).

So I guess that one fix would be to declare that the struct has
appropriate alignment?  I don't know whether that would suppress the
warning, but the clang devs might be more receptive to seeing it as
a false positive.
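
A sketch of that idea, reusing the struct above -- whether the check
honours the declared alignment is exactly the open question:

    #include <stdint.h>

    /* Packed layout, but with the alignment the fields actually have
     * declared explicitly, so &b.x2 is known to be 2-byte aligned. */
    struct __attribute__((__packed__, __aligned__(2))) bar {
        uint16_t x1;
        uint16_t x2;
    } b;

    uint16_t *p = &b.x2;   /* offset 2 in a 2-aligned struct */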

Tim.



Re: [Xen-devel] [PATCH 5/7] x86/shadow: Use the pagewalk reserved bits helpers

2017-03-06 Thread Tim Deegan
At 12:26 + on 02 Mar (1488457613), Andrew Cooper wrote:
> On 01/03/17 16:03, Jan Beulich wrote:
>  On 27.02.17 at 15:03,  wrote:
> >> The shadow logic should never create a shadow of a guest PTE which contains
> >> reserved bits from the guests point of view.  Such a shadowed entry might 
> >> not
> >> cause #PF[RSVD] when walked by hardware, thus won't behave architecturally
> >> from the guests point of view.
> > But are we already or-ing in the RSVD bit accordingly in such cases,
> > before handing the #PF to the guest? The patch here certainly
> > doesn't make any change towards that, afaics.
> 
> The purpose of this patch is to ensure we never create a shadow which
> risks causing hardware to generate #PF[RSVD] when running on the
> shadows, other than the one deliberate case (MMIO fastpath).

Confusion! AIUI:

 - Shadows installed on demand in the pagefault handler are already
   correct.  If the guest PTE contained invalid bits we'd have injected
   a fault instead of shadowing it.

 - There is no risk of accidentally installing a shadow with reserved
   bits in it even if the guest pte has reserved bits in it.
   _sh_propagate() sanity-checks the flags, and the address bits
   come from the MFN (IOW we'd need a buggy p2m entry).  If that were
   a risk, I don't think this patch would solve it.

 - The potential bug that this patch tries to fix is:
   1. Guest writes a PTE with reserved bits in it.
   2. That gets shadowed by a write-to-pagetable path or a prefetch.
   3. The shadow is a valid PTE, so the guest gets no #PF, instead
  of #PF(rsvd).

Now by the same logic I used above there's probably no path
where a reserved _address_ bit causes a problem, but I see no harm
in centralising the logic and using the same code for these
paths as for the pt walker.

In answering this, I've spotted that the calls to
l1e_propagate_from_guest() in sh_resync_l1() and sh_prefetch()
aren't updated in this patch and should be. 

Cheers,

Tim.



Re: [Xen-devel] [PATCH 7/7] x86/pagewalk: Re-implement the pagetable walker

2017-03-02 Thread Tim Deegan
At 14:03 + on 27 Feb (1488204198), Andrew Cooper wrote:
> The existing pagetable walker has complicated return semantics, which squeeze
> multiple pieces of information into single integer.  This would be fine if the
> information didn't overlap, but it does.
> 
> Specifically, _PAGE_INVALID_BITS for 3-level guests alias _PAGE_PAGED and
> _PAGE_SHARED.  A guest which constructs a PTE with bits 52 or 53 set (the
> start of the upper software-available range) will create a virtual address
> which, when walked by Xen, tricks Xen into believing the frame is paged or
> shared.  This behaviour was introduced by XSA-173 (c/s 8b17648).
> 
> It is also complicated to turn rc back into a normal pagefault error code.
> Instead, change the calling semantics to return a boolean indicating success,
> and have the function accumulate a real pagefault error code as it goes
> (including synthetic error codes, which do not alias hardware ones).  This
> requires an equivalent adjustment to map_domain_gfn().
> 
> Issues fixed:
>  * 2-level PSE36 superpages now return the correct translation.
>  * 2-level L2 superpages without CR0.PSE now return the correct translation.
>  * SMEP now inhibits a user instruction fetch even if NX isn't active.
>  * Supervisor writes without CR0.WP now set the leaf dirty bit.
>  * L4e._PAGE_GLOBAL is strictly reserved on AMD.
>  * 3-level l3 entries have all reserved bits checked.
>  * 3-level entries can no longer alias Xen's idea of paged or shared.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH 2/7] x86/shadow: Try to correctly identify implicit supervisor accesses

2017-03-02 Thread Tim Deegan
At 14:03 + on 27 Feb (1488204193), Andrew Cooper wrote:
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH 6/7] x86/pagewalk: Consistently use guest_walk_*() helpers for translation

2017-03-02 Thread Tim Deegan
At 14:03 + on 27 Feb (1488204197), Andrew Cooper wrote:
> hap_p2m_ga_to_gfn() and sh_page_fault() currently use guest_l1e_get_gfn() to
> obtain the translation of a pagewalk.  This is conceptually wrong (the
> semantics of gw.l1e is an internal detail), and will actually be wrong when
> PSE36 superpage support is fixed.  Switch them to using guest_walk_to_gfn().
> 
> Take the opportunity to const-correct the walk_t parameter of the
> guest_walk_to_*() helpers, and implement guest_walk_to_gpa() in terms of
> guest_walk_to_gfn() to avoid duplicating the actual translation calculation.
> 
> While editing guest_walk_to_gpa(), fix a latent bug by causing it to return
> INVALID_PADDR rather than 0 for a failed translation, as 0 is also a valid
> successful result.  The sole caller, sh_page_fault(), has already confirmed
> that the translation is valid, so this doesn't cause a behavioural change.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH 3/7] x86/pagewalk: Helpers for reserved bit handling

2017-03-02 Thread Tim Deegan
At 14:03 + on 27 Feb (1488204194), Andrew Cooper wrote:
> Some bits are unconditionally reserved in pagetable entries, or reserved
> because of alignment restrictions.  Other bits are reserved because of control
> register configuration.
> 
> Introduce helpers which take an individual vcpu and guest pagetable entry, and
> calculates whether any reserved bits are set.
> 
> While here, add a couple of newlines to aid readability, drop some trailing
> whitespace and bool/const correct the existing helpers to allow the new
> helpers to take const vcpu pointers.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH 1/7] x86/hvm: Correctly identify implicit supervisor accesses

2017-03-02 Thread Tim Deegan
At 14:03 + on 27 Feb (1488204192), Andrew Cooper wrote:
> All actions which refer to the active ldt/gdt/idt or task register
> (e.g. loading a new segment selector) are known as implicit supervisor
> accesses, even when the access originates from user code.
> 
> The distinction is necessary in the pagewalk when SMAP is enabled.  Refer to
> Intel SDM Vol 3 "Access Rights" for the exact details.
> 
> Introduce a new pagewalk input, and make use of the new system segment
> references in hvmemul_{read,write}().  While modifying those areas, move the
> calculation of the appropriate pagewalk input before its first use.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH 3/7] x86/pagewalk: Helpers for reserved bit handling

2017-03-02 Thread Tim Deegan
At 14:17 + on 02 Mar (1488464221), Andrew Cooper wrote:
> On 02/03/17 14:12, Tim Deegan wrote:
> > At 14:03 + on 27 Feb (1488204194), Andrew Cooper wrote:
> >> +static inline bool guest_has_pse36(const struct vcpu *v)
> >> +{
> >> + /* No support for 2-level PV guests. */
> >> +return is_pv_vcpu(v) ? 0 : paging_mode_hap(v->domain);
> >> +}
> > Should this check the CPUID policy to see whether PSE36 is supported?
> > There's no way to disable it in a HAP guest anyway, so maybe not, for
> > consistency?
> 
> Hmm - perhaps I need to rethink this slightly.
> 
> CR4.PSE controls this part of the pagewalk, which we can control with
> CPUID checks.  However, if CR4.PSE it set, we cannot hide hardware’s
> preference of PSE36 from the guest, and in reality it will always be
> present.

Yeah, I don't think there's anything you can do here.  You can hide
PSE36 from CPUID but a guest that _relies_ on PSE36 _not_ being supported
is going to fail on HAP.

I guess you could force PSE36 to be present in CPUID for HAP guests
and absent for PV and shadowed guests...

In any case I think this patch is correct as far as it goes.

Tim.



Re: [Xen-devel] [PATCH 5/7] x86/shadow: Use the pagewalk reserved bits helpers

2017-03-02 Thread Tim Deegan
At 14:03 + on 27 Feb (1488204196), Andrew Cooper wrote:
> The shadow logic should never create a shadow of a guest PTE which contains

Nit: a _valid/present_ shadow.

> reserved bits from the guests point of view.  Such a shadowed entry might not
> cause #PF[RSVD] when walked by hardware, thus won't behave architecturally
> from the guests point of view.

The shadow doesn't have to cause #PF(rsvd), any pagefault will do, so
long as the shadow pagefault handler turns it into rsvd.  This patch
uses shadow_l1e_empty(), which doesn't cause #PF(rsvd), and is fine.

Tim.



Re: [Xen-devel] [PATCH 3/7] x86/pagewalk: Helpers for reserved bit handling

2017-03-02 Thread Tim Deegan
At 14:03 + on 27 Feb (1488204194), Andrew Cooper wrote:
> +static inline bool guest_has_pse36(const struct vcpu *v)
> +{
> + /* No support for 2-level PV guests. */
> +return is_pv_vcpu(v) ? 0 : paging_mode_hap(v->domain);
> +}

Should this check the CPUID policy to see whether PSE36 is supported?
There's no way to disable it in a HAP guest anyway, so maybe not, for
consistency?

Tim.



Re: [Xen-devel] [PATCH] x86/shadow: Fix build with CONFIG_SHADOW_PAGING=n following c/s 45ac805

2017-02-27 Thread Tim Deegan
At 11:47 + on 27 Feb (1488196049), Andrew Cooper wrote:
> c/s 45ac805 "x86/paging: Package up the log dirty function pointers" neglected
> the case when CONFIG_SHADOW_PAGING is disabled.  Make a similar adjustment to
> the none stubs.
> 
> Spotted by a Travis RANDCONFIG run.
> 
> Signed-off-by: Andrew Cooper 
> ---

Acked-by:  Tim Deegan 



Re: [Xen-devel] [PATCH 2/3] x86: remove has_hvm_container_{domain/vcpu}

2017-02-24 Thread Tim Deegan
At 15:13 + on 24 Feb (1487949198), Roger Pau Monne wrote:
> It is now useless since PVHv1 is removed and PVHv2 is a HVM domain from Xen's
> point of view.
> 
> Signed-off-by: Roger Pau Monné 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH v2] x86/VMX: sanitize VM86 TSS handling

2017-02-20 Thread Tim Deegan
At 04:17 -0700 on 20 Feb (1487564245), Jan Beulich wrote:
> >>> On 20.02.17 at 12:01,  wrote:
> > At 05:03 -0700 on 17 Feb (1487307837), Jan Beulich wrote:
> >> The present way of setting this up is flawed: Leaving the I/O bitmap
> >> pointer at zero means that the interrupt redirection bitmap lives
> >> outside (ahead of) the allocated space of the TSS. Similarly setting a
> >> TSS limit of 255 when only 128 bytes get allocated means that 128 extra
> >> bytes may be accessed by the CPU during I/O port access processing.
> >> 
> >> Introduce a new HVM param to set the allocated size of the TSS, and
> >> have the hypervisor actually take care of setting namely the I/O bitmap
> >> pointer. Both this and the segment limit now take the allocated size
> >> into account.
> >> 
> >> Signed-off-by: Jan Beulich 
> >> ---
> >> v2: Instead of HVM_PARAM_VM86_TSS_SIZE, introduce
> >> HVM_PARAM_VM86_TSS_SIZED, which requires the old parameter to no
> >> longer be saved in libxc's write_hvm_params(). Only initialize the
> >> TSS once after the param was set. Request only 384 bytes (and
> >> 128-byte alignment) for the TSS. Add padding byte to capping value.
> >> Add comment to hvm_copy_to_guest_phys() invocations.
> > 
> > This still seems like it has too many moving parts -- why not just
> > declare the top half of the existing param to be the size, interpret
> > size==0 as size==128, and init the contents when the param is written?
> 
> I would have done that if the parameters and their hypercall function
> were tools only (and hence we could freely change their behavior).
> Also, since we wouldn't be able to tell size 128 from size 0 with what
> you propose, "get" would then possibly return a value different from
> what was passed to "set".

Sure you could - just keep the client's version and s/0/128/ when
setting up the TSS.  But let's not bikeshed this any further.

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH] x86/paging: Package up the log dirty function pointers

2017-02-20 Thread Tim Deegan
At 17:13 + on 16 Feb (1487265184), Andrew Cooper wrote:
> They depend solely on paging mode, so don't need to be repeated per domain, and
> can live in .rodata.  While making this change, drop the redundant log_dirty
> from the function pointer names.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH v2] x86/VMX: sanitize VM86 TSS handling

2017-02-20 Thread Tim Deegan
At 05:03 -0700 on 17 Feb (1487307837), Jan Beulich wrote:
> The present way of setting this up is flawed: Leaving the I/O bitmap
> pointer at zero means that the interrupt redirection bitmap lives
> outside (ahead of) the allocated space of the TSS. Similarly setting a
> TSS limit of 255 when only 128 bytes get allocated means that 128 extra
> bytes may be accessed by the CPU during I/O port access processing.
> 
> Introduce a new HVM param to set the allocated size of the TSS, and
> have the hypervisor actually take care of setting namely the I/O bitmap
> pointer. Both this and the segment limit now take the allocated size
> into account.
> 
> Signed-off-by: Jan Beulich 
> ---
> v2: Instead of HVM_PARAM_VM86_TSS_SIZE, introduce
> HVM_PARAM_VM86_TSS_SIZED, which requires the old parameter to no
> longer be saved in libxc's write_hvm_params(). Only initialize the
> TSS once after the param was set. Request only 384 bytes (and
> 128-byte alignment) for the TSS. Add padding byte to capping value.
> Add comment to hvm_copy_to_guest_phys() invocations.

This still seems like it has too many moving parts -- why not just
declare the top half of the existing param to be the size, interpret
size==0 as size==128, and init the contents when the param is written?

Cheers,

Tim.



Re: [Xen-devel] [PATCH] x86/mm: Swap mfn_valid() to use mfn_t

2017-02-20 Thread Tim Deegan
At 20:07 + on 16 Feb (1487275634), Andrew Cooper wrote:
> Replace one opencoded mfn_eq() and some coding style issues on altered lines.
> Swap __mfn_valid() to being bool, although it can't be updated to take mfn_t
> because of include dependencies.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH v3] x86/shadow: Correct guest behaviour when creating PTEs above maxphysaddr

2017-02-20 Thread Tim Deegan
At 15:45 + on 16 Feb (1487259954), Andrew Cooper wrote:
> XSA-173 (c/s 8b1764833) introduces gfn_bits, and an upper limit which might be
> lower than the real maxphysaddr, to avoid overflowing the superpage shadow
> backpointer.
> 
> However, plenty of hardware has a physical address width less than 44 bits,
> and the code added in shadow_domain_init() is a straight assignment.  This
> causes gfn_bits to be increased beyond the physical address width on most
> Intel consumer hardware (typically a width of 39, which is the number reported
> to the guest via CPUID).
> 
> If the guest intentionally creates a PTE referencing a physical address
> between 39 and 44 bits, the result should be #PF[RSVD] for using the virtual
> address.  However, the shadow code accepts the PTE, shadows it, and the
> virtual address works normally.
> 
> Introduce paging_max_paddr_bits() to calculate the largest guest physical
> address supportable by the paging infrastructure, and update
> recalculate_cpuid_policy() to take this into account when clamping the guests
> cpuid_policy to reality.
> 
> There is an existing gfn_valid() in guest_pt.h but it is unused in the
> codebase.  Repurpose it to perform a guest-specific maxphysaddr check, which
> replaces the users of gfn_bits.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 
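
(The repurposed gfn_valid() amounts to a per-guest width check along
these lines -- a sketch:

    static inline bool gfn_valid(const struct domain *d, gfn_t gfn)
    {
        return !(gfn_x(gfn) >> (d->arch.cpuid->extd.maxphysaddr - PAGE_SHIFT));
    }

i.e. any GFN with bits above the guest's own maxphysaddr is rejected.)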



Re: [Xen-devel] [PATCH] x86/VMX: sanitize VM86 TSS handling

2017-02-20 Thread Tim Deegan
At 04:49 -0700 on 16 Feb (1487220558), Jan Beulich wrote:
> >>> On 16.02.17 at 12:14,  wrote:
>  On 15.02.17 at 12:21,  wrote:
> >> At 01:18 -0700 on 15 Feb (1487121525), Jan Beulich wrote:
> >>> >>> On 14.02.17 at 18:33,  wrote:
> >>> >> TBD: Do we really want to re-init the TSS every time we are about to
> >>> >>  use it?
> >>> > 
> >>> > No - I think we should init it when the guest writes the param(s) and
> >>> > leave it at that.  Hvmloader marks it as reserved so the guest should
> >>> > know better than to write to it, and we can't protect it against all
> >>> > the possible ways the guest could break itself.
> >>> > 
> >>> > If you do want to re-init it more often, then I think it would still
> >>> > be better to legacy guests' (lack of a) size limit once, when the guest
> >>> > writes the base param.
> >>> 
> >>> The only problem with this being that at the time the base gets
> >>> written we don't know the size yet (nor whether the guest is
> >>> going to write it), and hence we don't know how much space to
> >>> initialize. The lower limit we enforce on the size (if being set) is
> >>> below the 128-byte default for old guests.
> >> 
> >> Why not make the lower limit 128?  I'd happily exchange simpler
> >> hypervisor code for the theoretical case of a guest that needs to run
> >> realmode code and cares about those few bytes.
> > 
> > Actually there's one more issue with doing it when the parameter is
> > being set: If a guest wanted to move the TSS (the parameter isn't
> > one-shot after all), we can't use the old value of the other parameter
> > when the first of the two is being changed. Of course we could make
> > both parameters one-shot, but this would again seem arbitrary. So I
> > think the better model is to record when either parameter changed,
> > and do the initialization just once after that.
> 
> Actually no, this wouldn't work either, for the same reason (we might
> again be using an inconsistent pair of parameters). Which makes me
> come back to my original plan (not mentioned anywhere so far): 
> Instead of a new size param, we need one which allows setting both
> size and address at the same time. Since the address needs to be
> below 4Gb anyway, we could simply use the high half of the 64-bit
> value to hold the size.

Sure, that seems like it avoids a lot of potential pitfalls.

Tim.



Re: [Xen-devel] [PATCH] x86/VMX: sanitize VM86 TSS handling

2017-02-15 Thread Tim Deegan
At 01:18 -0700 on 15 Feb (1487121525), Jan Beulich wrote:
> >>> On 14.02.17 at 18:33,  wrote:
> >> TBD: Do we really want to re-init the TSS every time we are about to
> >>  use it?
> > 
> > No - I think we should init it when the guest writes the param(s) and
> > leave it at that.  Hvmloader marks it as reserved so the guest should
> > know better than to write to it, and we can't protect it against all
> > the possible ways the guest could break itself.
> > 
> > If you do want to re-init it more often, then I think it would still
> > be better to fix up legacy guests' (lack of a) size limit once, when the guest
> > writes the base param.
> 
> The only problem with this being that at the time the base gets
> written we don't know the size yet (nor whether the guest is
> going to write it), and hence we don't know how much space to
> initialize. The lower limit we enforce on the size (if being set) is
> below the 128-byte default for old guests.

Why not make the lower limit 128?  I'd happily exchange simpler
hypervisor code for the theoretical case of a guest that needs to run
realmode code and cares about those few bytes.

Cheers,

Tim.



Re: [Xen-devel] [PATCH] x86/VMX: sanitize VM86 TSS handling

2017-02-15 Thread Tim Deegan
At 01:13 -0700 on 15 Feb (1487121231), Jan Beulich wrote:
> >>> On 14.02.17 at 18:35,  wrote:
> > At 06:37 -0700 on 13 Feb (1486967832), Jan Beulich wrote:
> >> >>> On 13.02.17 at 14:19,  wrote:
> >> > -tss = mem_alloc(128, 128);
> >> > -memset(tss, 0, 128);
> >> > +tss = mem_alloc(TSS_SIZE, TSS_SIZE);
> >> 
> >> tss = mem_alloc(TSS_SIZE, 128);
> >> 
> >> is sufficient here, as I've noticed (only) while reviewing Roger's
> >> series v4 of which did trigger the creation of this patch. I've made
> >> the change locally for now.
> > 
> > Should Xen check the alignment when the param gets written?
> 
> I did think about this too, but then decided not to, since the guest
> would only shoot itself in the foot (the more that in non-root mode
> no actual task switching by the hardware occurs, so the alignment
> requirement is pretty theoretical anyway).

Righto.

Tim.



Re: [Xen-devel] [PATCH] x86/VMX: sanitize VM86 TSS handling

2017-02-14 Thread Tim Deegan
At 06:37 -0700 on 13 Feb (1486967832), Jan Beulich wrote:
> >>> On 13.02.17 at 14:19,  wrote:
> > -tss = mem_alloc(128, 128);
> > -memset(tss, 0, 128);
> > +tss = mem_alloc(TSS_SIZE, TSS_SIZE);
> 
> tss = mem_alloc(TSS_SIZE, 128);
> 
> is sufficient here, as I've noticed (only) while reviewing Roger's
> series v4 of which did trigger the creation of this patch. I've made
> the change locally for now.

Should Xen check the alignment when the param gets written?

Tim.



Re: [Xen-devel] [PATCH] x86/VMX: sanitize VM86 TSS handling

2017-02-14 Thread Tim Deegan
Hi,

At 06:19 -0700 on 13 Feb (1486966797), Jan Beulich wrote:
> The present way of setting this up is flawed: Leaving the I/O bitmap
> pointer at zero means that the interrupt redirection bitmap lives
> outside (ahead of) the allocated space of the TSS. Similarly setting a
> TSS limit of 255 when only 128 bytes get allocated means that 128 extra
> bytes may be accessed by the CPU during I/O port access processing.
> 
> Introduce a new HVM param to set the allocated size of the TSS, and
> have the hypervisor actually take care of setting namely the I/O bitmap
> pointer. Both this and the segment limit now take the allocated size
> into account.
> 
> Signed-off-by: Jan Beulich 

This looks pretty good to me.  Once the question below is resolved,
Reviewed-by: Tim Deegan 

> ---
> TBD: Do we really want to re-init the TSS every time we are about to
>  use it?

No - I think we should init it when the guest writes the param(s) and
leave it at that.  Hvmloader marks it as reserved so the guest should
know better than to write to it, and we can't protect it against all
the possible ways the guest could break itself.

If you do want to re-init it more often, then I think it would still
be better to fix up legacy guests' (lack of a) size limit once, when the guest
writes the base param.

Cheers,

Tim.



Re: [Xen-devel] [PATCH v2] x86/shadow: Correct guest behaviour when creating PTEs above maxphysaddr

2017-02-14 Thread Tim Deegan
At 16:04 + on 14 Feb (1487088291), Andrew Cooper wrote:
> Hmm ok.  With the other bugfix of not dropping the first line, this hunk
> is now simply:
> 
> @@ -504,7 +505,7 @@ void recalculate_cpuid_policy(struct domain *d)
>  
>  p->extd.maxphysaddr = min(p->extd.maxphysaddr, max->extd.maxphysaddr);
>  p->extd.maxphysaddr = min_t(uint8_t, p->extd.maxphysaddr,
> -d->arch.paging.gfn_bits + PAGE_SHIFT);
> +paging_max_paddr_bits(d));
>  p->extd.maxphysaddr = max_t(uint8_t, p->extd.maxphysaddr,
>  (p->basic.pae || p->basic.pse36) ? 36 : 32);

That looks good to me.  Reviewed-by: Tim Deegan 

> >> @@ -360,6 +361,21 @@ void paging_dump_vcpu_info(struct vcpu *v);
> >>  int paging_set_allocation(struct domain *d, unsigned int pages,
> >>bool *preempted);
> >>  
> >> +/* Maxphysaddr supportable by the paging infrastructure. */
> >> +static inline unsigned int paging_max_paddr_bits(const struct domain *d)
> >> +{
> >> +unsigned int bits = paging_mode_hap(d) ? hap_paddr_bits : paddr_bits;
> >> +
> >> +if ( !IS_ENABLED(BIGMEM) && paging_mode_shadow(d) &&
> >> + (!is_pv_domain(d) || opt_allow_superpage) )
> >> +{
> >> +/* Shadowed superpages store GFNs in 32-bit page_info fields. */
> >> +bits = min(bits, 32U + PAGE_SHIFT);
> >> +}
> >> +
> >> +return bits;
> >> +}
> > Does this really need to be an inline function? With the overall goal
> > of not including every header everywhere, I particularly dislike the
> > need to include xen/kconfig.h here for things to build.
> 
> It is not on a critically path, but it seems wasteful to force something
> this small to be out of line.

It could be a macro, too.  FWIW I agree with you that a static inline
is best.

Cheers,

Tim.



Re: [Xen-devel] [PATCH v2] xen/p2m: Fix p2m_flush_table for non-nested cases

2017-02-08 Thread Tim Deegan
At 17:22 + on 08 Feb (1486574546), George Dunlap wrote:
> Commit 71bb7304e7a7a35ea6df4b0cedebc35028e4c159 added flushing of
> nested p2m tables whenever the host p2m table changed.  Unfortunately
> in the process, it added a filter to p2m_flush_table() function so
> that the p2m would only be flushed if it was being used as a nested
> p2m.  This meant that the p2m was not being flushed at all for altp2m
> callers.
> 
> Only check np2m_base if p2m_class is set to p2m_nested.
> 
> NB that this is not a security issue: The only time this codepath is
> called is in cases where either nestedp2m or altp2m is enabled, and
> neither of them are in security support.
> 
> Reported-by: Matt Leinhos 
> Signed-off-by: George Dunlap 
> CC: Tamas K Lengyel 
> ---
>  xen/arch/x86/mm/p2m.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 6548e9f..0af2ec1 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1627,7 +1627,9 @@ p2m_flush_table(struct p2m_domain *p2m)
>  ASSERT(page_list_empty(&p2m->pod.super));
>  ASSERT(page_list_empty(&p2m->pod.single));
>  
> -if ( p2m->np2m_base == P2M_BASE_EADDR )
> +/* No need to flush if it's already empty */
> +if ( p2m->p2m_class == p2m_nested &&
> +     p2m->np2m_base == P2M_BASE_EADDR )

Looks like p2m_is_nestedp2m(p2m) is the usual idiom.  Either way:

Reviewed-by: Tim Deegan 
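
(With that idiom, the test above would read -- p2m_is_nestedp2m() being
the existing predicate:

    /* No need to flush if it's not an in-use nested p2m. */
    if ( p2m_is_nestedp2m(p2m) && p2m->np2m_base == P2M_BASE_EADDR )
        return;
)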



Re: [Xen-devel] [PATCH] x86/shadow: Correct the behaviour of a guests view of maxphysaddr

2017-02-08 Thread Tim Deegan
At 15:29 + on 08 Feb (1486567798), Andrew Cooper wrote:
> On Intel, a guest can get a PTE shadowed which should have faulted and
> didn't (because the smfn is actually within 39 bits)

Yes.

> , while on AMD, a
> guest can try to create a PTE which shouldn't fault (because CPUID says
> anything up to 48 bits is fine) yet does fault because the shadow code
> rejects anything above 44 bits.

But in the code you're replacing (just below) the CPUID value is
capped at the paging.gfn_bits+PAGE_SHIFT, so won't it report 44?

> >> -p->extd.maxphysaddr = min(p->extd.maxphysaddr, max->extd.maxphysaddr);
> >> -p->extd.maxphysaddr = min_t(uint8_t, p->extd.maxphysaddr,
> >> -d->arch.paging.gfn_bits + PAGE_SHIFT);
> >> -p->extd.maxphysaddr = max_t(uint8_t, p->extd.maxphysaddr,
> >> -(p->basic.pae || p->basic.pse36) ? 36 : 32);
> >> +maxphysaddr_limit = paging_mode_hap(d) ? hap_paddr_bits : paddr_bits;
> >> +
> >> +if ( !IS_ENABLED(BIGMEM) && paging_mode_shadow(d) &&
> >> + (!is_pv_domain(d) || opt_allow_superpage) )
> >> +{
> >> +/* Shadowed superpages store GFNs in 32-bit page_info fields. */
> >> +maxphysaddr_limit = min(maxphysaddr_limit, 32U + PAGE_SHIFT);
> > I don't like this implementation detail escaping from x86/mm/shadow;
> > can we have a cpuid_limit_paddr_bits(d) that the shadow code can call
> > to tell cpuid about this?  Or perhaps a paging_max_paddr_bits(d) so
> > cpuid can ask paging when it needs to know?
> 
> I will see about doing this.
> 
> > In either case I wonder whether there needs to be some trigger of
> > recalculate_cpuid_policy() once shadow mode is enabled -- I guess
> > currently it relies on it being called by something late enough in hvm
> > domain creation?
> 
> Yes, although now I am worried about migrating PV guests.  I will double
> check to see whether that path works sensibly or not, because we can't
> have maxphysaddr suddenly shrink under the feet of a running guest.

We're lucky here: the shadow width restriction doesn't apply to PV
guests.

> >> +p->extd.maxphysaddr = min_t(uint8_t, p->extd.maxphysaddr, maxphysaddr_limit);
> >> +p->extd.maxphysaddr = max_t(uint8_t, p->extd.maxphysaddr, 32);
> > No longer requiring 36 bits for pae/pse36?  This is mentioned in the
> > description but not explained.
> 
> 32/36 is only the default to be assumed if no maxphysaddr is provided. 
> We always provide maxphysaddr, and passing 32 on a massive machine is
> legitimate.

OK.

Tim.



Re: [Xen-devel] [PATCH] x86/shadow: Correct the behaviour of a guests view of maxphysaddr

2017-02-08 Thread Tim Deegan
Hi,

At 12:36 + on 08 Feb (1486557378), Andrew Cooper wrote:
> XSA-173 (c/s 8b1764833) introduces gfn_bits, and an upper limit which might be
> lower than the real maxphysaddr, to avoid overflowing the superpage shadow
> backpointer.
> 
> However, plenty of hardware has a physical address width less than 44 bits,
> and the code added in shadow_domain_init() is a straight assignment.  This
> causes gfn_bits to be increased beyond the physical address width on most
> Intel consumer hardware (which has a width of 39).

Once again, I think this is actually OK.  But cpuid and shadow not
agreeing on the limit is a bug and I'm happy for it to be resolved
this way.

> It also means that the effective maxphysaddr for shadowed guests differs from
> the value reported to the guest in CPUID.  This means that a guest can create
> PTEs which either should fault but don't, or shouldn't fault but do.

Can it really create a PTE that shouldn't fault but does?  AFAICS the
cpuid policy can report a lower value than 44 but not a higher one.

> Remove gfn_bits and rework its logic in terms of a guests maxphysaddr.
> recalculate_cpuid_policy() is updated to properly clamp maxphysaddr between
> the host maximum, a possibly-smaller shadow restriction, and 32 as a minimum.
> 
> Signed-off-by: Andrew Cooper 

> @@ -502,11 +504,17 @@ void recalculate_cpuid_policy(struct domain *d)
>  
>  cpuid_featureset_to_policy(fs, p);
>  
> -p->extd.maxphysaddr = min(p->extd.maxphysaddr, max->extd.maxphysaddr);
> -p->extd.maxphysaddr = min_t(uint8_t, p->extd.maxphysaddr,
> -d->arch.paging.gfn_bits + PAGE_SHIFT);
> -p->extd.maxphysaddr = max_t(uint8_t, p->extd.maxphysaddr,
> -(p->basic.pae || p->basic.pse36) ? 36 : 32);
> +maxphysaddr_limit = paging_mode_hap(d) ? hap_paddr_bits : paddr_bits;
> +
> +if ( !IS_ENABLED(BIGMEM) && paging_mode_shadow(d) &&
> + (!is_pv_domain(d) || opt_allow_superpage) )
> +{
> +/* Shadowed superpages store GFNs in 32-bit page_info fields. */
> +maxphysaddr_limit = min(maxphysaddr_limit, 32U + PAGE_SHIFT);

I don't like this implementation detail escaping from x86/mm/shadow;
can we have a cpuid_limit_paddr_bits(d) that the shadow code can call
to tell cpuid about this?  Or perhaps a paging_max_paddr_bits(d) so
cpuid can ask paging when it needs to know?

In either case I wonder whether there needs to be some trigger of
recalculate_cpuid_policy() once shadow mode is enabled -- I guess
currently it relies on it being called by something late enough in hvm
domain creation?

> +}
> +
> +p->extd.maxphysaddr = min_t(uint8_t, p->extd.maxphysaddr, maxphysaddr_limit);
> +p->extd.maxphysaddr = max_t(uint8_t, p->extd.maxphysaddr, 32);

No longer requiring 36 bits for pae/pse36?  This is mentioned in the
description but not explained.

The rest of this looks fine to me.

Cheers,

Tim.



Re: [Xen-devel] [PATCH 2/2] x86/shadow: Move shadow pagetable fields into struct shadow_vcpu

2017-02-08 Thread Tim Deegan
At 17:26 + on 30 Jan (1485797162), Andrew Cooper wrote:
> The vTLB and last_write* booleans are used exclusively by the shadow pagetable
> code.  Move them from paging_vcpu to shadow_vcpu, which causes them to be
> entirely omitted on a build without shadow paging support.
> 
> While changing the qualified names of these variables, drop an unnecessary NULL
> check before freeing the vTLB, and move allocation of the vTLB from
> sh_update_paging_modes() to shadow_vcpu_init() where it more logically
> belongs.

In future, can you please separate incidental cleanups from the
mechanical renaming parts?  It makes patch review much easier.

> +#if (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB)
> +/* Allocate a virtual TLB for this vcpu. */
> +v->arch.paging.shadow.vtlb = xzalloc_array(struct shadow_vtlb, VTLB_ENTRIES);
> +if ( !v->arch.paging.shadow.vtlb )
> +return -ENOMEM;
> +spin_lock_init(&v->arch.paging.shadow.vtlb_lock);
> +#endif /* (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB) */

As Jan points out, the higher-level error path can now leak this array.
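
(A matching teardown would plug that -- a sketch, called from whatever
error path unwinds vcpu init:

    /* Undo the allocation made in shadow_vcpu_init(). */
    xfree(v->arch.paging.shadow.vtlb);
    v->arch.paging.shadow.vtlb = NULL;
)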

Apart from that, this looks good to me.

Cheers,

Tim.



Re: [Xen-devel] [PATCH 1/2] x86/mm: Plumb a error return through {paging, hap, shadow}_vcpu_init()

2017-02-08 Thread Tim Deegan
At 17:26 + on 30 Jan (1485797161), Andrew Cooper wrote:
> No functional change yet, but required for subsequent changes.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH] xen/p2m: Remove np2m-specific filter from generic p2m_flush_table

2017-02-08 Thread Tim Deegan
At 03:44 -0700 on 31 Jan (1485834259), Jan Beulich wrote:
> >>> On 30.01.17 at 16:17,  wrote:
> > Commit 71bb7304e7a7a35ea6df4b0cedebc35028e4c159 added flushing of
> > nested p2m tables whenever the host p2m table changed.  Unfortunately
> > in the process, it added a filter to the generic p2m_flush_table()
> > function so that the p2m would only be flushed if it was being used as
> > a nested p2m.  This meant that the p2m was not being flushed at all
> > for altp2m callers.
> > 
> > Instead do the nested p2m filtering in p2m_flush_nestedp2m().

I think it would be better to fix the test in p2m_flush_table() so it
understands altp2m's use of tables.  That way we won't have to deal
with filtering at other call sites, as Jan points out.  Also, this:

> >  for ( i = 0; i < MAX_NESTEDP2M; i++ )
> > -p2m_flush_table(d->arch.nested_p2m[i]);
> > +{
> > +struct p2m_domain *p2m = d->arch.nested_p2m[i];
> > +if ( p2m->np2m_base != P2M_BASE_EADDR )
> > +p2m_flush_table(p2m);
> > +}

moves the check of np2m_base outside the lock.  That might be OK but
it's definitely a bit subtle.

Cheers,

Tim.



Re: [Xen-devel] [PATCH] x86/p2m: Stop other vcpus using a nested p2m before clearing it

2017-02-08 Thread Tim Deegan
At 18:48 + on 07 Feb (1486493293), Andrew Cooper wrote:
> Until the IPI has completed, other processors might be running on this nested
> p2m object.  clear_domain_page() does not guarantee to make 8-byte atomic
> updates, which means that a pagewalk on a remote processor might encounter a
> partial update.
> 
> This is currently safe as other issues prevents a nested p2m ever being shared
> between two cpus (although this is contrary to the original plan).
> 
> Setting p2m->np2m_base to P2M_BASE_EADDR before the IPI ensures that the IPI'd
> processors won't continue to use the flushed mappings.
> 
> While modifying this function, remove all the trailing whitespace and tweak
> style in the affected areas.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 
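
(The resulting ordering, roughly -- lock and flush names assumed from
the surrounding nested-p2m code:

    p2m_lock(p2m);
    /* No longer a valid nested p2m for any address space ... */
    p2m->np2m_base = P2M_BASE_EADDR;
    /* ... so IPI everyone off it before touching the tables. */
    nestedhvm_vmcx_flushtlb(p2m);
    /* Only now is it safe to clear the table pages. */
    p2m_unlock(p2m);
)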



Re: [Xen-devel] [PATCH v5 4/9] xen/x86: populate PVHv2 Dom0 physical memory map

2017-01-27 Thread Tim Deegan
At 09:40 -0700 on 27 Jan (1485510008), Jan Beulich wrote:
> >>> On 27.01.17 at 14:20,  wrote:
> > At 12:51 + on 27 Jan (1485521470), Andrew Cooper wrote:
> >> On 27/01/17 11:14, Tim Deegan wrote:
> >> > But looking at it now, I'm not convinced of exactly how.  The magic
> >> > bitmap in the TSS is at [I/O Map Base Address] - 32, and the I/O map
> >> > base address itself lives at offset 100.  A zero'd TSS should mean an
> >> > I/O map at 0, and an interrupt redirection bitmap at -32, which would
> >> > plausibly work if the TSS were 256 bytes (matching the limit set in
> >> > Xen).  Perhaps it's only working because the 128 bytes following the
> >> > TSS in hvmloader happen to be zeros too?
> >> 
> >> With an IO_base_map of 0, the software interrupt bitmap will end up
> >> being ahead of the TSS, not after it.
> > 
> > I should have thought that the segmented address calculation would
> > wrap and leave us at TSS + 224.
> 
> I don't think wrapping takes the limit value into account.

Quite right, I'm talking nonsense.

> - vmx_set_segment_register() will need to set a correct limit

Yep.

> - vmx_set_segment_register() should initialize the TSS every
>   time (including setting the I/O bitmap address to no lower
>   than 32)

Probably to no lower than 136, to avoid having the bits of that field
itself appearing in either the IO or interrupt bitmap.

> - hvmloader's init_vm86_tss() will need to allocate 160 bytes
>   rather than 128 (and we should expose this number, so that
>   Roger can also use it)
> 
> Perhaps we should even introduce a hypercall for hvmloader
> to query the needed value, rather than exposing a hardcoded
> number?

I think Andrew's suggestion of just using a whole page is a good
one.  The TSS is a 32-bit one, after all, and doesn't need to live in
BIOS space.
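
(Putting the numbers from this thread together, the layout works out
as follows -- the struct is purely illustrative:

    struct vm86_tss_layout {
        uint8_t hw_frame[104];        /* architectural 32-bit TSS fields */
        uint8_t intredir_bitmap[32];  /* [iomap_base - 32, iomap_base)   */
        uint8_t io_bitmap[];          /* starts at iomap_base == 136     */
    };

with the segment limit then set to cover however much I/O bitmap is
allocated after that.)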

Cheers,

Tim.



Re: [Xen-devel] [PATCH v5 4/9] xen/x86: populate PVHv2 Dom0 physical memory map

2017-01-27 Thread Tim Deegan


At 14:35 + on 27 Jan (1485527708), Andrew Cooper wrote:
> On 27/01/17 14:01, Tim Deegan wrote:
> > Hi,
> >
> > At 13:46 + on 27 Jan (1485524765), Andrew Cooper wrote:
> >> The actual behaviour can be determined by putting the TSS on a page
> >> boundary, making the previous frame non-readable via EPT, and seeing
> >> whether an EPT violation occurs.
> > Indeed.  Or likewise with normal pagetables. 
> >
> >>> Yes, I wonder about the I/O bitmap too.  We don't provide one, or even
> >>> enough space for a full one, but the current SDM is pretty clear that
> >>> the CPU will try to check it in virtual 8086 mode.
> >>>
> >>> It may be that all the ports actually used happen to fall in the 128
> >>> bytes of zeros that we provide.
> >> With an offset of 0, we actually provide 256 bytes of zeros in the
> >> bitmap within the TSS limit.
> > Sure, or at least 128 bytes of zeros and another 128 bytes of something.
> 
> That is a good point.  Nothing prevents a guest exiting vm86 mode, and
> using a task switch to move to a new tss, which will cause Xen to write
> state back into the vm86_tss, making it no longer a zeroed block of memory.
> 
> Despite being owned by the guest, this TSS is actually managed by Xen. 
> It should be initialised to defaults each time Xen needs to use it on
> behalf of the guest.

But it's already in an E820 reserved block - if the guest overwrites
it (with a task switch or otherwise) it will break real-mode support,
but this is no worse than nobbling any other part of the BIOS state.

If we're making it non-zero, I can see an argument for having Xen init
the contents once (maybe when the HVM param is written?) so that it
matches what Xen expects of it.  But resetting it every time we use it
would be overkill.
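
(Sketch of the one-shot variant -- the helper is hypothetical:

    /* In the param-write path, and nowhere else: */
    if ( index == HVM_PARAM_VM86_TSS )
        rc = init_vm86_tss_contents(d, value);  /* zero it, set iomap base */
)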

> >> We set IOPL to 3 as well when entering vm86 to fake up real mode.
> >> This bypasses all I/O bitmap checks (a property common to ring 3
> >> protected tasks as well - See specifically 20.2.7 "Sensitive
> >> Instructions"), which means the IN/OUT instructions end up directly at
> >> the relevant vmexit case.
> > 20.2.8.1 makes it clear that this is not the case -- in virtual 8086
> > mode all IN/OUT ops check the bitmap even with IOPL == CPL.
> 
> Hmm.  Right you are, which explains why the TSS limit is greater than
> 0x67. 
> 
> If the emulation code were working correctly, the emulator should come
> to the same conclusion as hardware and inject a #GP fault.

I don't think so -- the emulator is emulating actual real-mode, not
virtual 8086 mode, so it shouldn't fault on any IO port accesses.

Cheers,

Tim.


