Re: [Xen-devel] [PATCH v2 1/6] x86/xpti: avoid copying L4 page table contents when possible

2018-03-08 Thread Juergen Gross
On 08/03/18 13:47, Jan Beulich wrote:
 On 08.03.18 at 12:59,  wrote:
>> On 05/03/18 17:43, Jan Beulich wrote:
>> On 02.03.18 at 09:13,  wrote:
 @@ -3704,18 +3706,22 @@ long do_mmu_update(
  break;
  rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
cmd == MMU_PT_UPDATE_PRESERVE_AD, 
 v);
 -/*
 - * No need to sync if all uses of the page can be 
 accounted
 - * to the page lock we hold, its pinned status, and 
 uses on
 - * this (v)CPU.
 - */
 -if ( !rc && !cpu_has_no_xpti &&
 - ((page->u.inuse.type_info & PGT_count_mask) >
 -  (1 + !!(page->u.inuse.type_info & PGT_pinned) +
 -   (pagetable_get_pfn(curr->arch.guest_table) == 
 mfn) 
 +
 -   
 (pagetable_get_pfn(curr->arch.guest_table_user) ==
 -mfn))) )
>>
>> Is it really possible this code is running with the user page table
>> being active on the current cpu? I think this test can be dropped.
> 
> I'm not sure I understand: The check above isn't for what is
> active on a CPU, but for what references are being held.
> _Installing_ a root page table into ->arch.guest_table{,_user}
> is when a reference is being obtained, not _loading_ the table
> into CR3. (In theory the above could be extended to also
> check vCPU-s other than current, but one would need to deal
> with races; obviously pausing the other vCPU-s of the domain
> wouldn't be a good idea, but that would be one possible way.)

Aah, sorry. Now I understand.

Thanks,


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] x86/xpti: avoid copying L4 page table contents when possible

2018-03-08 Thread Jan Beulich
>>> On 08.03.18 at 12:59,  wrote:
> On 05/03/18 17:43, Jan Beulich wrote:
> On 02.03.18 at 09:13,  wrote:
>>> @@ -3704,18 +3706,22 @@ long do_mmu_update(
>>>  break;
>>>  rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
>>>cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>>> -/*
>>> - * No need to sync if all uses of the page can be 
>>> accounted
>>> - * to the page lock we hold, its pinned status, and 
>>> uses on
>>> - * this (v)CPU.
>>> - */
>>> -if ( !rc && !cpu_has_no_xpti &&
>>> - ((page->u.inuse.type_info & PGT_count_mask) >
>>> -  (1 + !!(page->u.inuse.type_info & PGT_pinned) +
>>> -   (pagetable_get_pfn(curr->arch.guest_table) == 
>>> mfn) 
>>> +
>>> -   (pagetable_get_pfn(curr->arch.guest_table_user) 
>>> ==
>>> -mfn))) )
> 
> Is it really possible this code is running with the user page table
> being active on the current cpu? I think this test can be dropped.

I'm not sure I understand: The check above isn't for what is
active on a CPU, but for what references are being held.
_Installing_ a root page table into ->arch.guest_table{,_user}
is when a reference is being obtained, not _loading_ the table
into CR3. (In theory the above could be extended to also
check vCPU-s other than current, but one would need to deal
with races; obviously pausing the other vCPU-s of the domain
wouldn't be a good idea, but that would be one possible way.)

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] x86/xpti: avoid copying L4 page table contents when possible

2018-03-08 Thread Juergen Gross
On 05/03/18 17:43, Jan Beulich wrote:
 On 02.03.18 at 09:13,  wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -509,6 +509,8 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
>>  
>>  void write_ptbase(struct vcpu *v)
>>  {
>> +get_cpu_info()->root_pgt_changed = this_cpu(root_pgt) && is_pv_vcpu(v) 
>> &&
>> +   !is_pv_32bit_vcpu(v);
> 
> Why is_pv_vcpu() when you already check is_pv_32bit_vcpu()?
> 
>> @@ -3704,18 +3706,22 @@ long do_mmu_update(
>>  break;
>>  rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
>>cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> -/*
>> - * No need to sync if all uses of the page can be 
>> accounted
>> - * to the page lock we hold, its pinned status, and 
>> uses on
>> - * this (v)CPU.
>> - */
>> -if ( !rc && !cpu_has_no_xpti &&
>> - ((page->u.inuse.type_info & PGT_count_mask) >
>> -  (1 + !!(page->u.inuse.type_info & PGT_pinned) +
>> -   (pagetable_get_pfn(curr->arch.guest_table) == 
>> mfn) 
>> +
>> -   (pagetable_get_pfn(curr->arch.guest_table_user) 
>> ==
>> -mfn))) )

Is it really possible this code is running with the user page table
being active on the current cpu? I think this test can be dropped.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] x86/xpti: avoid copying L4 page table contents when possible

2018-03-05 Thread Jan Beulich
>>> On 06.03.18 at 08:01,  wrote:
> On 05/03/18 17:43, Jan Beulich wrote:
> On 02.03.18 at 09:13,  wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -509,6 +509,8 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
>>>  
>>>  void write_ptbase(struct vcpu *v)
>>>  {
>>> +get_cpu_info()->root_pgt_changed = this_cpu(root_pgt) && is_pv_vcpu(v) 
>>> &&
>>> +   !is_pv_32bit_vcpu(v);
>> 
>> Why is_pv_vcpu() when you already check is_pv_32bit_vcpu()?
> 
> I check !is_pv_32bit_vcpu() to catch 64-bit pv-domains only.

Oh, I'm sorry - For whatever reason I've ignored the ! there.

>> And don't you need to disallow updating L4s of running guests now
>> (which is a bad idea anyway)?
> 
> Yes, I should do that.

But please do this as a separate change, as strictly speaking this is
a behavioral change that we can't allow. But I think we simply need
to accept this (and perhaps uniformly for all page table levels).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] x86/xpti: avoid copying L4 page table contents when possible

2018-03-05 Thread Jan Beulich
>>> On 02.03.18 at 09:13,  wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -509,6 +509,8 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
>  
>  void write_ptbase(struct vcpu *v)
>  {
> +get_cpu_info()->root_pgt_changed = this_cpu(root_pgt) && is_pv_vcpu(v) &&
> +   !is_pv_32bit_vcpu(v);

Why is_pv_vcpu() when you already check is_pv_32bit_vcpu()?

> @@ -3704,18 +3706,22 @@ long do_mmu_update(
>  break;
>  rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
>cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
> -/*
> - * No need to sync if all uses of the page can be 
> accounted
> - * to the page lock we hold, its pinned status, and uses 
> on
> - * this (v)CPU.
> - */
> -if ( !rc && !cpu_has_no_xpti &&
> - ((page->u.inuse.type_info & PGT_count_mask) >
> -  (1 + !!(page->u.inuse.type_info & PGT_pinned) +
> -   (pagetable_get_pfn(curr->arch.guest_table) == 
> mfn) 
> +
> -   (pagetable_get_pfn(curr->arch.guest_table_user) ==
> -mfn))) )
> -sync_guest = true;
> +if ( !rc && !cpu_has_no_xpti )
> +{
> +get_cpu_info()->root_pgt_changed = true;

Why would you set this when a foreign domain's L4 got updated?
And don't you need to disallow updating L4s of running guests now
(which is a bad idea anyway)?

> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -207,6 +207,8 @@ void invalidate_interrupt(struct cpu_user_regs *regs)
>  unsigned int flags = flush_flags;
>  ack_APIC_irq();
>  perfc_incr(ipis);
> +if ( flags & FLUSH_ROOT_PGTBL )
> +get_cpu_info()->root_pgt_changed = true;

While for the caller in do_mmu_update() you don't need it, for
full correctness you also want to do this in flush_area_mask()
for the sender, if it's part of the CPU mask.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/6] x86/xpti: avoid copying L4 page table contents when possible

2018-03-02 Thread Juergen Gross
For mitigation of Meltdown the current L4 page table is copied to the
cpu local root page table each time a 64 bit pv guest is entered.

Copying can be avoided in cases where the guest L4 page table hasn't
been modified while running the hypervisor, e.g. when handling
interrupts or any hypercall not modifying the L4 page table or %cr3.

So add a per-cpu flag whether the copying should be performed and set
that flag only when loading a new %cr3 or modifying the L4 page table.
This includes synchronization of the cpu local root page table with
other cpus, so add a special synchronization flag for that case.

A simple performance check (compiling the hypervisor via "make -j 4")
in dom0 with 4 vcpus shows a significant improvement:

- real time drops from 112 seconds to 103 seconds
- system time drops from 142 seconds to 131 seconds

Signed-off-by: Juergen Gross 
---
To be applied on top of Jan's "Meltdown band-aid overhead reduction"
series
---
 xen/arch/x86/mm.c | 32 +++-
 xen/arch/x86/pv/domain.c  |  1 +
 xen/arch/x86/smp.c|  2 ++
 xen/arch/x86/x86_64/asm-offsets.c |  1 +
 xen/arch/x86/x86_64/entry.S   |  8 ++--
 xen/include/asm-x86/current.h |  8 
 xen/include/asm-x86/flushtlb.h|  2 ++
 7 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d8d3ee2ecd..fdc1636817 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -509,6 +509,8 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 
 void write_ptbase(struct vcpu *v)
 {
+get_cpu_info()->root_pgt_changed = this_cpu(root_pgt) && is_pv_vcpu(v) &&
+   !is_pv_32bit_vcpu(v);
 write_cr3(v->arch.cr3);
 }
 
@@ -3704,18 +3706,22 @@ long do_mmu_update(
 break;
 rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
   cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-/*
- * No need to sync if all uses of the page can be accounted
- * to the page lock we hold, its pinned status, and uses on
- * this (v)CPU.
- */
-if ( !rc && !cpu_has_no_xpti &&
- ((page->u.inuse.type_info & PGT_count_mask) >
-  (1 + !!(page->u.inuse.type_info & PGT_pinned) +
-   (pagetable_get_pfn(curr->arch.guest_table) == mfn) +
-   (pagetable_get_pfn(curr->arch.guest_table_user) ==
-mfn))) )
-sync_guest = true;
+if ( !rc && !cpu_has_no_xpti )
+{
+get_cpu_info()->root_pgt_changed = true;
+/*
+ * No need to sync if all uses of the page can be
+ * accounted to the page lock we hold, its pinned
+ * status, and uses on this (v)CPU.
+ */
+if ( (page->u.inuse.type_info & PGT_count_mask) >
+ (1 + !!(page->u.inuse.type_info & PGT_pinned) +
+  (pagetable_get_pfn(curr->arch.guest_table) ==
+   mfn) +
+  (pagetable_get_pfn(curr->arch.guest_table_user) 
==
+   mfn)) )
+sync_guest = true;
+}
 break;
 
 case PGT_writable_page:
@@ -3830,7 +3836,7 @@ long do_mmu_update(
 
 cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
 if ( !cpumask_empty(mask) )
-flush_mask(mask, FLUSH_TLB_GLOBAL);
+flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
 }
 
 perfc_add(num_page_updates, i);
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 868a23fd7e..7742d522f5 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -238,6 +238,7 @@ static void _toggle_guest_pt(struct vcpu *v, bool force_cr3)
 
 /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
 asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
+get_cpu_info()->root_pgt_changed = true;
 
 if ( !(v->arch.flags & TF_kernel_mode) )
 return;
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 033dd05958..60b0657ab7 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -207,6 +207,8 @@ void invalidate_interrupt(struct cpu_user_regs *regs)
 unsigned int flags = flush_flags;
 ack_APIC_irq();
 perfc_incr(ipis);
+if ( flags & FLUSH_ROOT_PGTBL )
+get_cpu_info()->root_pgt_changed = true;
 if ( (flags & FLUSH_VCPU_STATE) && __sync_local_execstate() )
 flags &= ~(FLUSH_TLB | FLUSH_TLB_GLOBAL);
 if ( flags &