Re: [Xen-devel] [PATCH v2 0/6] xen/x86: various XPTI speedups

2018-03-05 Thread Dario Faggioli
On Fri, 2018-03-02 at 09:13 +0100, Juergen Gross wrote:
> The complete series has been verified to still mitigate against
> Meltdown attacks. A simple performance test (make -j 4 in the Xen
> hypervisor directory) showed significant improvements compared to the
> state without this series (so with Jan's and Wei's series applied);
> the percentage after each number is relative to XPTI off:
> 
>          XPTI off     Jan+Wei, XPTI on    +this series, XPTI on
> real     1m21.169s    1m52.149s (+38%)    1m25.692s (+6%)
> user     2m47.652s    2m50.054s (+1%)     2m46.428s (-1%)
> sys      1m11.949s    2m21.767s (+97%)    1m23.053s (+15%)
> 
> A git branch of that series (+ Jan's and Wei's patches) is available:
> 
> https://github.com/jgross1/xen.git xpti
> 
I've run some more benchmarks, and here are the results:

https://openbenchmarking.org/result/1803039-DARI-180303217
http://openbenchmarking.org/result/1803039-DARI-180303217&obr_nor=y&obr_hgv=Jan%2BWei%2C+XPTI+on

(I also include a textual recap at the bottom of this email.)

These numbers show that Juergen's series is quite effective at
improving performance in pretty much all workloads that I've tested.

The only exception is schbench, but I don't think that's very relevant,
because of how the benchmark is configured in the Phoronix Test Suite
(something I only discovered recently).

The in-guest context-switching heavy workloads are the ones where this
series makes the most (positive) difference.

Note that on Stream and on Stress-ng:MemoryCopy, XPTI=on+this series
does even *better* than XPTI=off. This is most likely due to the fact
that Juergen, for now, takes advantage of PCID only for the XPTI=on
case. However, although it is indeed a bit of an unfair comparison, I
think it does prove the point that we want to have (something like)
this series.

Regards,
Dario

AIO-Stress 0.21
Test: Random Write
MB/s > Higher Is Better
XPTI off ................ 1926.57
+this series, XPTI on ... 1931.44
Jan+Wei, XPTI on ........ 1807.30

Stream 2013-01-17
Type: Copy
MB/s > Higher Is Better
XPTI off ................ 15738.48
+this series, XPTI on ... 19011.66
Jan+Wei, XPTI on ........ 15381.94

Stream 2013-01-17
Type: Scale
MB/s > Higher Is Better
XPTI off ................ 10849.14
+this series, XPTI on ... 12833.84
Jan+Wei, XPTI on ........ 10696.66

Stream 2013-01-17
Type: Triad
MB/s > Higher Is Better
XPTI off ................ 12268.20
+this series, XPTI on ... 14085.56
Jan+Wei, XPTI on ........ 12120.56

Stream 2013-01-17
Type: Add
MB/s > Higher Is Better
XPTI off ................ 12323.60
+this series, XPTI on ... 15881.14
Jan+Wei, XPTI on ........ 12085.76
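
To read the recap at a glance, the change of each configuration relative
to XPTI off can be computed from the MB/s figures listed above. This is
only a minimal sketch: the dictionary layout and the relative_change()
helper are illustrative and not part of the Phoronix Test Suite output.

```python
# Throughput figures (MB/s, higher is better) copied from the recap above.
# Tuple order: (XPTI off, +this series XPTI on, Jan+Wei XPTI on).
RESULTS = {
    "AIO-Stress Random Write": (1926.57, 1931.44, 1807.30),
    "Stream Copy": (15738.48, 19011.66, 15381.94),
    "Stream Scale": (10849.14, 12833.84, 10696.66),
    "Stream Triad": (12268.20, 14085.56, 12120.56),
    "Stream Add": (12323.60, 15881.14, 12085.76),
}


def relative_change(value, baseline):
    """Percentage change of value against baseline (positive = faster)."""
    return (value / baseline - 1.0) * 100.0


for name, (off, series, jan_wei) in RESULTS.items():
    print(f"{name:24s} +this series: {relative_change(series, off):+6.1f}%"
          f"   Jan+Wei: {relative_change(jan_wei, off):+6.1f}%")
```

A positive value means higher throughput than with XPTI off.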

[Xen-devel] [PATCH v2 0/6] xen/x86: various XPTI speedups

2018-03-02 Thread Juergen Gross
This patch series aims at reducing the overhead of the XPTI Meltdown
mitigation. It is based on Jan's XPTI speedup series and Wei's series
for support of PCID and INVPCID.

Patch 1 has been posted before; the main changes in this version
address Jan's comments on my first version. The main objective of that
patch is to avoid copying the L4 page table each time the guest is
activated, as the contents often did not change while the hypervisor
was active.
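
The idea behind patch 1 can be pictured with a toy model. This is a
sketch under assumptions, not the code from the patch: the ShadowL4
class, the needs_sync flag and the method names are all hypothetical,
and the cover letter does not say how the patch detects changes.

```python
class ShadowL4:
    """Toy model: a per-CPU copy of a guest's top-level (L4) page table."""

    ENTRIES = 512  # an x86-64 L4 table has 512 entries

    def __init__(self):
        self.guest_l4 = [0] * self.ENTRIES  # table owned by the guest
        self.cpu_copy = [0] * self.ENTRIES  # copy in use while the guest runs
        self.needs_sync = True              # hypothetical "contents changed" flag
        self.copies_done = 0

    def guest_update(self, slot, value):
        # Any modification of the guest's L4 marks the copy as stale.
        self.guest_l4[slot] = value
        self.needs_sync = True

    def activate_guest(self):
        # Copy only when something changed since the last activation.
        if self.needs_sync:
            self.cpu_copy[:] = self.guest_l4
            self.copies_done += 1
            self.needs_sync = False


l4 = ShadowL4()
for _ in range(3):
    l4.activate_guest()       # only the first activation copies
l4.guest_update(5, 0x1000)
l4.activate_guest()           # stale copy, so this one copies again
print(l4.copies_done)
```

In this model four activations cause only two copies, because the
table changed only once in between.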

Patch 2 tries to minimize flushing the TLB: there is no need to flush
it in write_ptbase() and when activating the guest.

Patch 3 sets the stage for being able to activate XPTI per domain. As a
first step it is now possible to switch XPTI off for dom0 via the xpti
boot parameter.

Patch 4 reduces the costs of TLB flushes even further: as we don't make
any use of global TLB entries with XPTI being active we can avoid
removing all global TLB entries on TLB flushes by simply deactivating
the global pages in CR4.
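
As an illustration of why patch 4 helps, here is a toy TLB model. It is
a sketch only: the GlobalAwareTLB class and its methods are made up for
this example, and it models just the architectural rule that a plain
CR3 write spares global entries while CR4.PGE (bit 7) is set.

```python
CR4_PGE = 1 << 7  # CR4 bit 7: page global enable


class GlobalAwareTLB:
    """Toy TLB that distinguishes global from non-global entries."""

    def __init__(self, cr4):
        self.cr4 = cr4
        self.entries = {}  # virtual page -> (frame, is_global)

    def fill(self, vpage, frame, global_bit):
        # The PTE global bit is honoured only while CR4.PGE is set.
        is_global = global_bit and bool(self.cr4 & CR4_PGE)
        self.entries[vpage] = (frame, is_global)

    def write_cr3(self):
        # A plain CR3 write drops non-global entries only.
        self.entries = {v: e for v, e in self.entries.items() if e[1]}


with_pge = GlobalAwareTLB(CR4_PGE)
with_pge.fill(1, 0x10, True)
with_pge.write_cr3()
print(len(with_pge.entries))     # the global entry survives the CR3 write

without_pge = GlobalAwareTLB(0)
without_pge.fill(1, 0x10, True)
without_pge.write_cr3()
print(len(without_pge.entries))  # nothing is global, so everything is gone
```

With global pages disabled, a CR3 write alone is a complete flush in
this model, so no extra step is needed to get rid of global entries.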

Patch 5 was originally only meant to prepare using PCIDs in patch 6.
For that purpose it was necessary to allow CR3 values with bit 63 set
in order to avoid flushing TLB entries when writing CR3. This requires
a modification of Jan's rather clever state machine with positive and
negative CR3 values for the hypervisor by using a dedicated flag byte
instead. It turned out that this modification saves one branch on
interrupt entry, speeding up the handling by a few percent.
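
The clash between bit 63 and a sign-based scheme can be shown in a few
lines. This sketch only illustrates that a CR3 value with bit 63 set
reads as negative when interpreted as a signed 64-bit integer; the
exact encoding of Jan's state machine is not described in this letter,
and the base address below is an arbitrary example value.

```python
import struct

CR3_NOFLUSH = 1 << 63  # bit 63 of the value written to CR3 (PCID enabled)


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


cr3 = 0x12345000                    # illustrative page-table base address
cr3_noflush = cr3 | CR3_NOFLUSH

print(as_signed64(cr3) < 0)          # plain value: not negative
print(as_signed64(cr3_noflush) < 0)  # bit 63 set: negative
```

Once bit 63 carries the no-flush request, the sign of the value can no
longer double as a state marker, hence the separate flag byte.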

Patch 6 is the main performance contributor: by making use of the PCID
feature (if available) TLB entries can survive CR3 switches. The TLB
needs to be flushed on context switches only and not when switching
between guest and hypervisor or guest kernel and user mode.
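
A toy model may help to picture the effect of patch 6. It is a sketch
under assumptions: the ToyTLB class, the PCID numbers and the base
addresses are hypothetical, and global pages are not modelled. It uses
only two architectural facts: with PCID enabled, CR3 bits 11:0 select
the PCID, and bit 63 of the written value asks the CPU to keep the TLB
entries tagged with that PCID.

```python
PCID_MASK = 0xFFF   # CR3 bits 11:0 carry the PCID when CR4.PCIDE is set
NOFLUSH = 1 << 63   # bit 63: keep TLB entries tagged with the new PCID


class ToyTLB:
    """Toy TLB whose entries are tagged with a PCID."""

    def __init__(self):
        self.entries = {}  # (pcid, virtual page) -> physical frame
        self.pcid = 0

    def write_cr3(self, value):
        self.pcid = value & PCID_MASK
        if not value & NOFLUSH:
            # Without the no-flush bit, entries of the new PCID are dropped.
            self.entries = {k: v for k, v in self.entries.items()
                            if k[0] != self.pcid}

    def fill(self, vpage, frame):
        self.entries[(self.pcid, vpage)] = frame

    def lookup(self, vpage):
        return self.entries.get((self.pcid, vpage))


GUEST_PCID, XEN_PCID = 1, 2                   # hypothetical PCID assignment
tlb = ToyTLB()
tlb.write_cr3(0x1000 | GUEST_PCID)
tlb.fill(0x400, 0xAAA)
tlb.write_cr3(0x2000 | XEN_PCID | NOFLUSH)    # enter the hypervisor
tlb.fill(0x800, 0xBBB)
tlb.write_cr3(0x1000 | GUEST_PCID | NOFLUSH)  # back to the guest
print(tlb.lookup(0x400) is not None)          # guest translation still cached
```

Writing the guest's CR3 value without the no-flush bit, as on a context
switch, drops the guest's entries again in this model.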

The complete series has been verified to still mitigate against
Meltdown attacks. A simple performance test (make -j 4 in the Xen
hypervisor directory) showed significant improvements compared to the
state without this series (so with Jan's and Wei's series applied);
the percentage after each number is relative to XPTI off:

         XPTI off     Jan+Wei, XPTI on    +this series, XPTI on
real     1m21.169s    1m52.149s (+38%)    1m25.692s (+6%)
user     2m47.652s    2m50.054s (+1%)     2m46.428s (-1%)
sys      1m11.949s    2m21.767s (+97%)    1m23.053s (+15%)
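
The percentages in the table above can be reproduced from the raw
timings. A minimal sketch, assuming the stamps are in the usual
time(1) "XmY.ZZZs" form; the helper names are illustrative.

```python
import re


def to_seconds(stamp):
    """Convert a time(1)-style stamp such as '1m21.169s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


def overhead(stamp, baseline):
    """Percentage overhead of stamp relative to baseline."""
    return (to_seconds(stamp) / to_seconds(baseline) - 1.0) * 100.0


# Tuple order: (XPTI off, Jan+Wei XPTI on, +this series XPTI on)
TIMES = {
    "real": ("1m21.169s", "1m52.149s", "1m25.692s"),
    "user": ("2m47.652s", "2m50.054s", "2m46.428s"),
    "sys": ("1m11.949s", "2m21.767s", "1m23.053s"),
}

for row, (off, jan_wei, series) in TIMES.items():
    print(f"{row:4s} Jan+Wei {overhead(jan_wei, off):+.0f}%  "
          f"+this series {overhead(series, off):+.0f}%")
```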

A git branch of that series (+ Jan's and Wei's patches) is available:

https://github.com/jgross1/xen.git xpti


Juergen Gross (6):
  x86/xpti: avoid copying L4 page table contents when possible
  x86/xpti: don't flush TLB twice when switching to 64-bit pv context
  xen/x86: support per-domain flag for xpti
  xen/x86: disable global pages for domains with XPTI active
  xen/x86: use flag byte for decision whether xen_cr3 is valid
  xen/x86: use PCID feature for XPTI

 docs/misc/xen-command-line.markdown |  8 +++-
 xen/arch/x86/cpu/mtrr/generic.c     | 37 ++-
 xen/arch/x86/domain.c               |  1 +
 xen/arch/x86/domain_page.c          |  2 +-
 xen/arch/x86/domctl.c               |  4 ++
 xen/arch/x86/flushtlb.c             | 85 ++-
 xen/arch/x86/mm.c                   | 57 +--
 xen/arch/x86/pv/dom0_build.c        |  4 ++
 xen/arch/x86/pv/domain.c            | 90 -
 xen/arch/x86/setup.c                | 23 +++---
 xen/arch/x86/smp.c                  |  2 +
 xen/arch/x86/smpboot.c              |  6 ++-
 xen/arch/x86/x86_64/asm-offsets.c   |  2 +
 xen/arch/x86/x86_64/compat/entry.S  |  5 +--
 xen/arch/x86/x86_64/entry.S         | 79 ++--
 xen/include/asm-x86/current.h       | 22 ++---
 xen/include/asm-x86/domain.h        | 38 +++-
 xen/include/asm-x86/flushtlb.h      |  2 +
 xen/include/asm-x86/pv/domain.h     |  4 ++
 xen/include/asm-x86/x86-defns.h     |  1 +
 20 files changed, 327 insertions(+), 145 deletions(-)

-- 
2.13.6

