Hello community, here is the log from the commit of package xen for openSUSE:Factory checked in at 2018-03-20 21:50:37 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/xen (Old) and /work/SRC/openSUSE:Factory/.xen.new (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "xen" Tue Mar 20 21:50:37 2018 rev:244 rq:586076 version:4.10.0_14 Changes: -------- --- /work/SRC/openSUSE:Factory/xen/xen.changes 2018-03-01 12:02:21.481832679 +0100 +++ /work/SRC/openSUSE:Factory/.xen.new/xen.changes 2018-03-20 21:50:48.542316318 +0100 @@ -1,0 +2,24 @@ +Thu Mar 1 09:36:03 MST 2018 - [email protected] + +- bsc#1072834 - Xen HVM: unchecked MSR access error: RDMSR from + 0xc90 at rIP: 0xffffffff93061456 (native_read_msr+0x6/0x30) + 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch +- Upstream patches from Jan (bsc#1027519) + 5a79d7ed-libxc-packed-initrd-dont-fail-domain-creation.patch + 5a7b1bdd-x86-reduce-Meltdown-band-aid-IPI-overhead.patch + 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch + 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch + 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch + 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch + 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch + 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch (Replaces xsa252.patch) + 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch (Replaces xsa255-1.patch) + 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch (Replaces xsa255-2.patch) + 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch (Replaces xsa256.patch) +- Drop + xsa252.patch + xsa255-1.patch + xsa255-2.patch + xsa256.patch + +------------------------------------------------------------------- @@ -4,2 +28,2 @@ -- bsc#1080635 - VUL-0: xen: DoS via non-preemptable L3/L4 pagetable - freeing (XSA-252) +- bsc#1080635 - VUL-0: CVE-2018-7540: xen: DoS via non-preemptable + L3/L4 pagetable freeing (XSA-252) @@ -7,2 +31,2 @@ -- bsc#1080662 - VUL-0: xen: grant table v2 -> v1 transition may - crash Xen (XSA-255) +- bsc#1080662 - VUL-0: CVE-2018-7541: xen: grant table v2 -> v1 + transition may crash Xen (XSA-255) @@ -11,2 +35,2 @@ -- bsc#1080634 - VUL-0: xen: x86 PVH guest without LAPIC may DoS the - host (XSA-256) +- bsc#1080634 - VUL-0: CVE-2018-7542: xen: x86 PVH guest without + LAPIC may DoS the host (XSA-256) @@ -56,2 +80,3 @@ -- bsc#1074562 - VUL-0: xen: Information leak via side effects of - speculative execution (XSA-254). Includes Spectre v2 mitigation. +- bsc#1074562 - VUL-0: CVE-2017-5753,CVE-2017-5715,CVE-2017-5754 + xen: Information leak via side effects of speculative execution + (XSA-254). Includes Spectre v2 mitigation. 
Old: ---- xsa252.patch xsa255-1.patch xsa255-2.patch xsa256.patch New: ---- 5a79d7ed-libxc-packed-initrd-dont-fail-domain-creation.patch 5a7b1bdd-x86-reduce-Meltdown-band-aid-IPI-overhead.patch 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ xen.spec ++++++ --- /var/tmp/diff_new_pack.FoVs26/_old 2018-03-20 21:50:50.858232921 +0100 +++ /var/tmp/diff_new_pack.FoVs26/_new 2018-03-20 21:50:50.862232777 +0100 @@ -126,7 +126,7 @@ BuildRequires: pesign-obs-integration %endif -Version: 4.10.0_13 +Version: 4.10.0_14 Release: 0 Summary: Xen Virtualization: Hypervisor (aka VMM aka Microkernel) License: GPL-2.0 @@ -206,10 +206,18 @@ Patch43: 5a6b36cd-9-x86-issue-speculation-barrier.patch Patch44: 5a6b36cd-A-x86-offer-Indirect-Branch-Controls-to-guests.patch Patch45: 5a6b36cd-B-x86-clear-SPEC_CTRL-while-idle.patch -Patch252: xsa252.patch -Patch25501: xsa255-1.patch -Patch25502: xsa255-2.patch -Patch256: xsa256.patch +Patch46: 5a79d7ed-libxc-packed-initrd-dont-fail-domain-creation.patch +Patch47: 5a7b1bdd-x86-reduce-Meltdown-band-aid-IPI-overhead.patch +Patch48: 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch +Patch49: 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch +Patch50: 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch +Patch51: 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch +Patch52: 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch +Patch53: 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch +Patch54: 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch +Patch55: 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch +Patch56: 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch +Patch57: 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch # Our platform specific patches Patch400: xen-destdir.patch Patch401: vif-bridge-no-iptables.patch @@ -445,10 +453,18 @@ %patch43 -p1 %patch44 -p1 %patch45 -p1 -%patch252 -p1 -%patch25501 -p1 -%patch25502 -p1 -%patch256 -p1 +%patch46 -p1 +%patch47 -p1 +%patch48 -p1 +%patch49 -p1 +%patch50 -p1 +%patch51 -p1 +%patch52 -p1 +%patch53 -p1 +%patch54 -p1 +%patch55 -p1 +%patch56 -p1 +%patch57 -p1 # Our platform specific patches %patch400 -p1 %patch401 -p1 ++++++ 5a5e3a4e-7-x86-cmdline-opt-to-disable-IBRS-IBPB-STIBP.patch ++++++ --- /var/tmp/diff_new_pack.FoVs26/_old 2018-03-20 21:50:50.990228168 +0100 +++ /var/tmp/diff_new_pack.FoVs26/_new 2018-03-20 21:50:50.990228168 +0100 @@ -13,8 +13,23 @@ Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Jan Beulich <[email protected]> ---- a/docs/misc/xen-command-line.markdown -+++ b/docs/misc/xen-command-line.markdown +# Commit ac37ec1ddef234eeba6f438c29ff687c64962ebd +# Date 2018-01-31 10:47:12 +0000 +# Author Andrew Cooper <[email protected]> +# Committer Andrew Cooper <[email protected]> +xen/cmdline: Fix parse_boolean() for unadorned values + 
+A command line such as "cpuid=no-ibrsb,no-stibp" tickles a bug in +parse_boolean() because the separating comma fails the NUL case. + +Instead, check for slen == nlen which accounts for the boundary (if any) +passed via the 'e' parameter. + +Signed-off-by: Andrew Cooper <[email protected]> +Reviewed-by: Jan Beulich <[email protected]> + +--- trunk.orig/docs/misc/xen-command-line.markdown 2018-02-01 11:40:54.706665840 +0100 ++++ trunk/docs/misc/xen-command-line.markdown 2018-02-01 00:00:00.000000000 +0100 @@ -471,6 +471,18 @@ choice of `dom0-kernel` is deprecated an respectively. * `verbose` option can be included as a string or also as `verbose=<integer>` @@ -34,8 +49,8 @@ ### cpuid\_mask\_cpu (AMD only) > `= fam_0f_rev_c | fam_0f_rev_d | fam_0f_rev_e | fam_0f_rev_f | fam_0f_rev_g | fam_10_rev_b | fam_10_rev_c | fam_11_rev_b` ---- a/xen/arch/x86/cpuid.c -+++ b/xen/arch/x86/cpuid.c +--- trunk.orig/xen/arch/x86/cpuid.c 2018-02-01 11:40:54.706665840 +0100 ++++ trunk/xen/arch/x86/cpuid.c 2018-02-01 00:00:00.000000000 +0100 @@ -18,6 +18,41 @@ static const uint32_t hvm_shadow_feature static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES; static const uint32_t deep_features[] = INIT_DEEP_FEATURES; @@ -78,9 +93,9 @@ #define EMPTY_LEAF ((struct cpuid_leaf){}) static void zero_leaves(struct cpuid_leaf *l, unsigned int first, unsigned int last) ---- a/xen/common/kernel.c -+++ b/xen/common/kernel.c -@@ -244,6 +244,29 @@ int parse_bool(const char *s, const char +--- trunk.orig/xen/common/kernel.c 2018-02-01 11:40:54.706665840 +0100 ++++ trunk/xen/common/kernel.c 2018-02-01 11:40:25.000000000 +0100 +@@ -244,6 +244,33 @@ int parse_bool(const char *s, const char return -1; } @@ -99,19 +114,23 @@ + if ( slen < nlen || strncmp(s, name, nlen) ) + return -1; + -+ switch ( s[nlen] ) -+ { -+ case '\0': return val; -+ case '=': return parse_bool(&s[nlen + 1], e); -+ default: return -1; -+ } ++ /* Exact, unadorned name? Result depends on the 'no-' prefix. */ ++ if ( slen == nlen ) ++ return val; ++ ++ /* =$SOMETHING? Defer to the regular boolean parsing. */ ++ if ( s[nlen] == '=' ) ++ return parse_bool(&s[nlen + 1], e); ++ ++ /* Unrecognised. Give up. 
*/ ++ return -1; +} + unsigned int tainted; /** ---- a/xen/include/xen/lib.h -+++ b/xen/include/xen/lib.h +--- trunk.orig/xen/include/xen/lib.h 2018-02-01 11:40:54.706665840 +0100 ++++ trunk/xen/include/xen/lib.h 2018-02-01 00:00:00.000000000 +0100 @@ -74,6 +74,13 @@ void cmdline_parse(const char *cmdline); int runtime_parse(const char *line); int parse_bool(const char *s, const char *e); ++++++ 5a6b36cd-6-x86-clobber-RSB-RAS-on-entry.patch ++++++ --- /var/tmp/diff_new_pack.FoVs26/_old 2018-03-20 21:50:51.038226440 +0100 +++ /var/tmp/diff_new_pack.FoVs26/_new 2018-03-20 21:50:51.038226440 +0100 @@ -29,7 +29,7 @@ XEN_CPUFEATURE(XEN_IBRS_SET, (FSCAPINTS+0)*32+16) /* IBRSB && IRBS set in Xen */ XEN_CPUFEATURE(XEN_IBRS_CLEAR, (FSCAPINTS+0)*32+17) /* IBRSB && IBRS clear in Xen */ +XEN_CPUFEATURE(RSB_NATIVE, (FSCAPINTS+0)*32+18) /* RSB overwrite needed for native */ -+XEN_CPUFEATURE(RSB_VMEXIT, (FSCAPINTS+0)*32+20) /* RSB overwrite needed for vmexit */ ++XEN_CPUFEATURE(RSB_VMEXIT, (FSCAPINTS+0)*32+19) /* RSB overwrite needed for vmexit */ --- a/xen/include/asm-x86/nops.h +++ b/xen/include/asm-x86/nops.h @@ -66,6 +66,7 @@ ++++++ 5a6b36cd-8-x86-boot-calculate-best-BTI-mitigation.patch ++++++ --- /var/tmp/diff_new_pack.FoVs26/_old 2018-03-20 21:50:51.054225864 +0100 +++ /var/tmp/diff_new_pack.FoVs26/_new 2018-03-20 21:50:51.054225864 +0100 @@ -16,6 +16,37 @@ Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Jan Beulich <[email protected]> +# Commit 30cbd0c83ef3d0edac2d5bcc41a9a2b7a843ae58 +# Date 2018-02-06 18:32:58 +0000 +# Author Andrew Cooper <[email protected]> +# Committer Andrew Cooper <[email protected]> +x86/spec_ctrl: Fix determination of when to use IBRS + +The original version of this logic was: + + /* + * On Intel hardware, we'd like to use retpoline in preference to + * IBRS, but only if it is safe on this hardware. + */ + else if ( boot_cpu_has(X86_FEATURE_IBRSB) ) + { + if ( retpoline_safe() ) + thunk = THUNK_RETPOLINE; + else + ibrs = true; + } + +but it was changed by a request during review. Sadly, the result is buggy as +it breaks the later fallback logic by allowing IBRS to appear as available +when in fact it isn't. + +This in practice means that on retpoline-unsafe hardware without IBRS, we +select THUNK_JUMP despite intending to select THUNK_RETPOLINE. + +Reported-by: Zhenzhong Duan <[email protected]> +Signed-off-by: Andrew Cooper <[email protected]> +Reviewed-by: Jan Beulich <[email protected]> + --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -246,7 +246,7 @@ enough. Setting this to a high value may @@ -180,7 +211,7 @@ + */ + else if ( retpoline_safe() ) + thunk = THUNK_RETPOLINE; -+ else ++ else if ( boot_cpu_has(X86_FEATURE_IBRSB) ) + ibrs = true; } + /* Without compiler thunk support, use IBRS if available. */ ++++++ 5a79d7ed-libxc-packed-initrd-dont-fail-domain-creation.patch ++++++ References: bsc#1055047 # Commit d0115f96ea633fd6d668f2c067785912c0ad4c00 # Date 2018-02-06 17:29:33 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> libxc: don't fail domain creation when unpacking initrd fails At least Linux kernels have been able to work with gzip-ed initrd for quite some time; initrd compressed with other methods aren't even being attempted to unpack.
Furthermore the unzip-ing routine used here isn't capable of dealing with various forms of concatenated files, each of which was gzip-ed separately (it is this particular case which has been the source of observed VM creation failures). Hence, if unpacking fails, simply hand the compressed blob to the guest as is. Signed-off-by: Jan Beulich <[email protected]> Acked-by: Wei Liu <[email protected]> --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -291,7 +291,6 @@ int xc_dom_mem_init(struct xc_dom_image int xc_dom_kernel_check_size(struct xc_dom_image *dom, size_t sz); int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz); -int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz); int xc_dom_ramdisk_max_size(struct xc_dom_image *dom, size_t sz); int xc_dom_devicetree_max_size(struct xc_dom_image *dom, size_t sz); --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -314,22 +314,6 @@ int xc_dom_kernel_check_size(struct xc_d return 0; } -int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz) -{ - /* No limit */ - if ( !dom->max_ramdisk_size ) - return 0; - - if ( sz > dom->max_ramdisk_size ) - { - xc_dom_panic(dom->xch, XC_INVALID_KERNEL, - "ramdisk image too large"); - return 1; - } - - return 0; -} - /* ------------------------------------------------------------------------ */ /* read files, copy memory blocks, with transparent gunzip */ @@ -996,16 +980,27 @@ static int xc_dom_build_ramdisk(struct x void *ramdiskmap; if ( !dom->ramdisk_seg.vstart ) - { unziplen = xc_dom_check_gzip(dom->xch, dom->ramdisk_blob, dom->ramdisk_size); - if ( xc_dom_ramdisk_check_size(dom, unziplen) != 0 ) - unziplen = 0; - } else unziplen = 0; - ramdisklen = unziplen ? unziplen : dom->ramdisk_size; + ramdisklen = max(unziplen, dom->ramdisk_size); + if ( dom->max_ramdisk_size ) + { + if ( unziplen && ramdisklen > dom->max_ramdisk_size ) + { + ramdisklen = min(unziplen, dom->ramdisk_size); + if ( unziplen > ramdisklen ) + unziplen = 0; + } + if ( ramdisklen > dom->max_ramdisk_size ) + { + xc_dom_panic(dom->xch, XC_INVALID_KERNEL, + "ramdisk image too large"); + goto err; + } + } if ( xc_dom_alloc_segment(dom, &dom->ramdisk_seg, "ramdisk", dom->ramdisk_seg.vstart, ramdisklen) != 0 ) @@ -1020,11 +1015,18 @@ static int xc_dom_build_ramdisk(struct x if ( unziplen ) { if ( xc_dom_do_gunzip(dom->xch, dom->ramdisk_blob, dom->ramdisk_size, - ramdiskmap, ramdisklen) == -1 ) + ramdiskmap, unziplen) != -1 ) + return 0; + if ( dom->ramdisk_size > ramdisklen ) goto err; } - else - memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size); + + /* Fall back to handing over the raw blob. */ + memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size); + /* If an unzip attempt was made, the buffer may no longer be all zero. */ + if ( unziplen > dom->ramdisk_size ) + memset(ramdiskmap + dom->ramdisk_size, 0, + unziplen - dom->ramdisk_size); return 0; ++++++ 5a7b1bdd-x86-reduce-Meltdown-band-aid-IPI-overhead.patch ++++++ # Commit a22320e32dca0918ed23799583f470afe4c24330 # Date 2018-02-07 16:31:41 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> x86: reduce Meltdown band-aid IPI overhead In case we can detect single-threaded guest processes (by checking whether we can account for all root page table uses locally on the vCPU that's running), there's no point in issuing a sync IPI upon an L4 entry update, as no other vCPU of the guest will have that page table loaded. 
Signed-off-by: Jan Beulich <[email protected]> Acked-by: George Dunlap <[email protected]> Acked-by: Andrew Cooper <[email protected]> --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -3664,8 +3664,18 @@ long do_mmu_update( case PGT_l4_page_table: rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - if ( !rc ) - sync_guest = this_cpu(root_pgt); + /* + * No need to sync if all uses of the page can be accounted + * to the page lock we hold, its pinned status, and uses on + * this (v)CPU. + */ + if ( !rc && this_cpu(root_pgt) && + ((page->u.inuse.type_info & PGT_count_mask) > + (1 + !!(page->u.inuse.type_info & PGT_pinned) + + (pagetable_get_pfn(curr->arch.guest_table) == mfn) + + (pagetable_get_pfn(curr->arch.guest_table_user) == + mfn))) ) + sync_guest = true; break; case PGT_writable_page: perfc_incr(writable_mmu_updates); ++++++ 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch ++++++ # Commit a2b08fbed388f18235fda5ba1655c1483ef3e215 # Date 2018-02-14 13:22:15 +0000 # Author Andrew Cooper <[email protected]> # Committer Andrew Cooper <[email protected]> x86/spec_ctrl: Fix several bugs in SPEC_CTRL_ENTRY_FROM_INTR_IST DO_OVERWRITE_RSB clobbers %rax, meaning in practice that the bti_ist_info field gets zeroed. Older versions of this code had the DO_OVERWRITE_RSB register selectable, so reintroduce this ability and use it to cause the INTR_IST path to use %rdx instead. The use of %dl for the %cs.rpl check means that when an IST interrupt hits Xen, we try to load 1 into the high 32 bits of MSR_SPEC_CTRL, suffering a #GP fault instead. Also, drop an unused label which was a copy/paste mistake. Reported-by: Boris Ostrovsky <[email protected]> Reported-by: Zhenzhong Duan <[email protected]> Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Reviewed-by: Wei Liu <[email protected]> Reviewed-by: Roger Pau Monné <[email protected]> --- a/xen/include/asm-x86/spec_ctrl_asm.h +++ b/xen/include/asm-x86/spec_ctrl_asm.h @@ -79,10 +79,10 @@ * - SPEC_CTRL_EXIT_TO_GUEST */ -.macro DO_OVERWRITE_RSB +.macro DO_OVERWRITE_RSB tmp=rax /* * Requires nothing - * Clobbers %rax, %rcx + * Clobbers \tmp (%rax by default), %rcx * * Requires 256 bytes of stack space, but %rsp has no net change. Based on * Google's performance numbers, the loop is unrolled to 16 iterations and two @@ -97,7 +97,7 @@ * optimised with mov-elimination in modern cores. */ mov $16, %ecx /* 16 iterations, two calls per loop */ - mov %rsp, %rax /* Store the current %rsp */ + mov %rsp, %\tmp /* Store the current %rsp */ .L\@_fill_rsb_loop: @@ -114,7 +114,7 @@ sub $1, %ecx jnz .L\@_fill_rsb_loop - mov %rax, %rsp /* Restore old %rsp */ + mov %\tmp, %rsp /* Restore old %rsp */ .endm .macro DO_SPEC_CTRL_ENTRY_FROM_VMEXIT ibrs_val:req @@ -274,7 +274,7 @@ testb $BTI_IST_RSB, %al jz .L\@_skip_rsb - DO_OVERWRITE_RSB + DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */ .L\@_skip_rsb: @@ -286,13 +286,13 @@ setz %dl and %dl, STACK_CPUINFO_FIELD(use_shadow_spec_ctrl)(%r14) -.L\@_entry_from_xen: /* * Load Xen's intended value. SPEC_CTRL_IBRS vs 0 is encoded in the * bottom bit of bti_ist_info, via a deliberate alias with BTI_IST_IBRS. */ mov $MSR_SPEC_CTRL, %ecx and $BTI_IST_IBRS, %eax + xor %edx, %edx wrmsr /* Opencoded UNLIKELY_START() with no condition. 
*/ ++++++ 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch ++++++ # Commit b7dce29d9faf3597d009c853ed1fcbed9f7a7f68 # Date 2018-02-15 11:08:27 +0000 # Author Andrew Cooper <[email protected]> # Committer Andrew Cooper <[email protected]> x86/emul: Fix the decoding of segment overrides in 64bit mode Explicit segment overrides other than %fs and %gs are documented as ignored by both Intel and AMD. In practice, this means that: * Explicit uses of %ss don't actually yield #SS[0] for non-canonical memory references. * Explicit uses of %{e,c,d}s don't override %rbp/%rsp-based memory references to yield #GP[0] for non-canonical memory references. Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -2463,6 +2463,10 @@ x86_decode( } done_prefixes: + /* %{e,c,s,d}s overrides are ignored in 64bit mode. */ + if ( mode_64bit() && override_seg < x86_seg_fs ) + override_seg = x86_seg_none; + if ( rex_prefix & REX_W ) op_bytes = 8; ++++++ 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch ++++++ # Commit eb1d3a3f04b85d596862a4c9dcf796e67ab4dc09 # Date 2018-02-15 11:08:27 +0000 # Author Andrew Cooper <[email protected]> # Committer Andrew Cooper <[email protected]> x86/entry: Use 32bit xors rather than 64bit xors for clearing GPRs Intel's Silvermont/Knights Landing architecture treats them as full ALU operations, rather than zeroing idioms. No functional change, and no change in code volume (only changing the bit selection in the REX prefix). Signed-off-by: Andrew Cooper <[email protected]> Acked-by: Jan Beulich <[email protected]> --- a/xen/include/asm-x86/asm_defns.h +++ b/xen/include/asm-x86/asm_defns.h @@ -271,10 +271,10 @@ static always_inline void stac(void) movq %r10,UREGS_r10(%rsp) movq %r11,UREGS_r11(%rsp) .endif - xor %r8, %r8 - xor %r9, %r9 - xor %r10, %r10 - xor %r11, %r11 + xor %r8d, %r8d + xor %r9d, %r9d + xor %r10d, %r10d + xor %r11d, %r11d movq %rbx,UREGS_rbx(%rsp) xor %ebx, %ebx movq %rbp,UREGS_rbp(%rsp) @@ -291,10 +291,10 @@ static always_inline void stac(void) movq %r14,UREGS_r14(%rsp) movq %r15,UREGS_r15(%rsp) .endif - xor %r12, %r12 - xor %r13, %r13 - xor %r14, %r14 - xor %r15, %r15 + xor %r12d, %r12d + xor %r13d, %r13d + xor %r14d, %r14d + xor %r15d, %r15d .endm #define LOAD_ONE_REG(reg, compat) \ @@ -319,10 +319,10 @@ static always_inline void stac(void) movq UREGS_r13(%rsp), %r13 movq UREGS_r12(%rsp), %r12 .else - xor %r15, %r15 - xor %r14, %r14 - xor %r13, %r13 - xor %r12, %r12 + xor %r15d, %r15d + xor %r14d, %r14d + xor %r13d, %r13d + xor %r12d, %r12d .endif LOAD_ONE_REG(bp, \compat) LOAD_ONE_REG(bx, \compat) @@ -332,10 +332,10 @@ static always_inline void stac(void) movq UREGS_r9(%rsp),%r9 movq UREGS_r8(%rsp),%r8 .else - xor %r11, %r11 - xor %r10, %r10 - xor %r9, %r9 - xor %r8, %r8 + xor %r11d, %r11d + xor %r10d, %r10d + xor %r9d, %r9d + xor %r8d, %r8d .endif LOAD_ONE_REG(ax, \compat) LOAD_ONE_REG(cx, \compat) ++++++ 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch ++++++ # Commit a44f1697968e04fcc6145e3bd51c748b57047240 # Date 2018-02-20 10:16:56 +0100 # Author Igor Druzhinin <[email protected]> # Committer Jan Beulich <[email protected]> x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap We're noticing a reproducible system boot hang on certain Skylake platforms where the BIOS is configured in legacy boot mode with x2APIC disabled.
The system stalls immediately after writing the first SMP initialization sequence into APIC ICR. The cause of the problem is watchdog NMI handler execution - somewhere near the end of NMI handling (after it's already rescheduled the next NMI) it tries to access IO port 0x61 to get the actual NMI reason on CPU0. Unfortunately, this port is emulated by BIOS using SMIs and this emulation for some reason takes more time than we expect during INIT-SIPI-SIPI sequence. As a result, the system is constantly moving between NMI and SMI handler and not making any progress. To avoid this, initialize the watchdog after SMP bootstrap on CPU0 and, additionally, protect the NMI handler by moving IO port access before NMI re-scheduling. The latter should also help in case of post boot CPU onlining. Although we're running watchdog at much lower frequency at this point, it's nevertheless possible we may trigger the issue anyway. Signed-off-by: Igor Druzhinin <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Index: xen-4.10.0-testing/xen/arch/x86/apic.c =================================================================== --- xen-4.10.0-testing.orig/xen/arch/x86/apic.c +++ xen-4.10.0-testing/xen/arch/x86/apic.c @@ -682,7 +682,7 @@ void setup_local_APIC(void) printk("Leaving ESR disabled.\n"); } - if (nmi_watchdog == NMI_LOCAL_APIC) + if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id()) setup_apic_nmi_watchdog(); apic_pm_activate(); } Index: xen-4.10.0-testing/xen/arch/x86/smpboot.c =================================================================== --- xen-4.10.0-testing.orig/xen/arch/x86/smpboot.c +++ xen-4.10.0-testing/xen/arch/x86/smpboot.c @@ -1241,7 +1241,10 @@ int __cpu_up(unsigned int cpu) void __init smp_cpus_done(void) { if ( nmi_watchdog == NMI_LOCAL_APIC ) + { + setup_apic_nmi_watchdog(); check_nmi_watchdog(); + } setup_ioapic_dest(); Index: xen-4.10.0-testing/xen/arch/x86/traps.c =================================================================== --- xen-4.10.0-testing.orig/xen/arch/x86/traps.c +++ xen-4.10.0-testing/xen/arch/x86/traps.c @@ -1669,7 +1669,7 @@ static nmi_callback_t *nmi_callback = du void do_nmi(const struct cpu_user_regs *regs) { unsigned int cpu = smp_processor_id(); - unsigned char reason; + unsigned char reason = 0; bool handle_unknown = false; ++nmi_count(cpu); @@ -1677,6 +1677,16 @@ void do_nmi(const struct cpu_user_regs * if ( nmi_callback(regs, cpu) ) return; + /* + * Accessing port 0x61 may trap to SMM which has been actually + * observed on some production SKX servers. This SMI sometimes + * takes enough time for the next NMI tick to happen. By reading + * this port before we re-arm the NMI watchdog, we reduce the chance + * of having an NMI watchdog expire while in the SMI handler. + */ + if ( cpu == 0 ) + reason = inb(0x61); + if ( (nmi_watchdog == NMI_NONE) || (!nmi_watchdog_tick(regs) && watchdog_force) ) handle_unknown = true; @@ -1684,7 +1694,6 @@ void do_nmi(const struct cpu_user_regs * /* Only the BSP gets external NMIs from the system.
*/ if ( cpu == 0 ) { - reason = inb(0x61); if ( reason & 0x80 ) pci_serr_error(regs); if ( reason & 0x40 ) ++++++ 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch ++++++ # Commit cc0e45db277922b5723a7b1d9657d6f744230cf1 # Date 2018-02-27 10:47:23 +0000 # Author Andrew Cooper <[email protected]> # Committer Andrew Cooper <[email protected]> x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in MSR_TSC_AUX, irrespective of whether the relevant CPUID features are advertised/hidden. At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode. Therefore, default PV guests can read the value from a previously scheduled HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest. Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case. To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy update of the MSR, and use this function consistently across the codebase. Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Roger Pau Monné <[email protected]> Reviewed-by: Wei Liu <[email protected]> Acked-by: Jan Beulich <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1531,9 +1531,9 @@ void paravirt_ctxt_switch_to(struct vcpu if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) ) activate_debugregs(v); - if ( (v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP) && - boot_cpu_has(X86_FEATURE_RDTSCP) ) - write_rdtscp_aux(v->domain->arch.incarnation); + if ( cpu_has_rdtscp ) + wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP + ? v->domain->arch.incarnation : 0); } /* Update per-VCPU guest runstate shared memory area (if registered). 
*/ --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3582,7 +3582,7 @@ int hvm_msr_write_intercept(unsigned int v->arch.hvm_vcpu.msr_tsc_aux = (uint32_t)msr_content; if ( cpu_has_rdtscp && (v->domain->arch.tsc_mode != TSC_MODE_PVRDTSCP) ) - wrmsrl(MSR_TSC_AUX, (uint32_t)msr_content); + wrmsr_tsc_aux(msr_content); break; case MSR_IA32_APICBASE: --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1077,7 +1077,7 @@ static void svm_ctxt_switch_to(struct vc svm_tsc_ratio_load(v); if ( cpu_has_rdtscp ) - wrmsrl(MSR_TSC_AUX, hvm_msr_tsc_aux(v)); + wrmsr_tsc_aux(hvm_msr_tsc_aux(v)); } static void noreturn svm_do_resume(struct vcpu *v) --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -622,7 +622,7 @@ static void vmx_restore_guest_msrs(struc } if ( cpu_has_rdtscp ) - wrmsrl(MSR_TSC_AUX, hvm_msr_tsc_aux(v)); + wrmsr_tsc_aux(hvm_msr_tsc_aux(v)); } void vmx_update_cpu_exec_control(struct vcpu *v) --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -24,6 +24,8 @@ #include <xen/sched.h> #include <asm/msr.h> +DEFINE_PER_CPU(uint32_t, tsc_aux); + struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy, __read_mostly pv_max_msr_domain_policy; --- a/xen/include/asm-x86/msr.h +++ b/xen/include/asm-x86/msr.h @@ -115,8 +115,6 @@ static inline uint64_t rdtsc_ordered(voi __write_tsc(val); \ }) -#define write_rdtscp_aux(val) wrmsr(MSR_TSC_AUX, (val), 0) - #define rdpmc(counter,low,high) \ __asm__ __volatile__("rdpmc" \ : "=a" (low), "=d" (high) \ @@ -202,6 +200,20 @@ void write_efer(u64 val); DECLARE_PER_CPU(u32, ler_msr); +DECLARE_PER_CPU(uint32_t, tsc_aux); + +/* Lazy update of MSR_TSC_AUX */ +static inline void wrmsr_tsc_aux(uint32_t val) +{ + uint32_t *this_tsc_aux = &this_cpu(tsc_aux); + + if ( *this_tsc_aux != val ) + { + wrmsr(MSR_TSC_AUX, val, 0); + *this_tsc_aux = val; + } +} + /* MSR policy object for shared per-domain MSRs */ struct msr_domain_policy { ++++++ 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch ++++++ # Commit d798a0952903db9d8ee0a580e03f214d2b49b7d7 # Date 2018-02-27 14:03:27 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> memory: don't implicitly unpin for decrease-reservation It very likely was a mistake (copy-and-paste from domain cleanup code) to implicitly unpin here: The caller should really unpin itself before (or after, if they so wish) requesting the page to be removed. This is XSA-252. Reported-by: Jann Horn <[email protected]> Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -357,11 +357,6 @@ int guest_remove_page(struct domain *d, rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0); -#ifdef _PGT_pinned - if ( !rc && test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) ) - put_page_and_type(page); -#endif - /* * With the lack of an IOMMU on some platforms, domains with DMA-capable * device must retrieve the same pfn when the hypercall populate_physmap ++++++ 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch ++++++ # Commit 9d2f8f9c65d4da35437f50ed9e812a2c5ab313e2 # Date 2018-02-27 14:04:44 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> gnttab/ARM: don't corrupt shared GFN array ... by writing status GFNs to it. Introduce a second array instead. Also implement gnttab_status_gmfn() properly now that the information is suitably being tracked. 
While touching it anyway, remove a misguided (but luckily benign) upper bound check from gnttab_shared_gmfn(): We should never access beyond the bounds of that array. This is part of XSA-255. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -3777,6 +3777,7 @@ int gnttab_map_frame(struct domain *d, u { int rc = 0; struct grant_table *gt = d->grant_table; + bool status = false; grant_write_lock(gt); @@ -3787,6 +3788,7 @@ int gnttab_map_frame(struct domain *d, u (idx & XENMAPIDX_grant_table_status) ) { idx &= ~XENMAPIDX_grant_table_status; + status = true; if ( idx < nr_status_frames(gt) ) *mfn = _mfn(virt_to_mfn(gt->status[idx])); else @@ -3804,7 +3806,7 @@ int gnttab_map_frame(struct domain *d, u } if ( !rc ) - gnttab_set_frame_gfn(gt, idx, gfn); + gnttab_set_frame_gfn(gt, status, idx, gfn); grant_write_unlock(gt); --- a/xen/include/asm-arm/grant_table.h +++ b/xen/include/asm-arm/grant_table.h @@ -9,7 +9,8 @@ #define INITIAL_NR_GRANT_FRAMES 1U struct grant_table_arch { - gfn_t *gfn; + gfn_t *shared_gfn; + gfn_t *status_gfn; }; void gnttab_clear_flag(unsigned long nr, uint16_t *addr); @@ -21,7 +22,6 @@ int replace_grant_host_mapping(unsigned unsigned long new_gpaddr, unsigned int flags); void gnttab_mark_dirty(struct domain *d, unsigned long l); #define gnttab_create_status_page(d, t, i) do {} while (0) -#define gnttab_status_gmfn(d, t, i) (0) #define gnttab_release_host_mappings(domain) 1 static inline int replace_grant_supported(void) { @@ -42,19 +42,35 @@ static inline unsigned int gnttab_dom0_m #define gnttab_init_arch(gt) \ ({ \ - (gt)->arch.gfn = xzalloc_array(gfn_t, (gt)->max_grant_frames); \ - ( (gt)->arch.gfn ? 0 : -ENOMEM ); \ + unsigned int ngf_ = (gt)->max_grant_frames; \ + unsigned int nsf_ = grant_to_status_frames(ngf_); \ + \ + (gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_); \ + (gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_); \ + if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn ) \ + { \ + while ( ngf_-- ) \ + (gt)->arch.shared_gfn[ngf_] = INVALID_GFN; \ + while ( nsf_-- ) \ + (gt)->arch.status_gfn[nsf_] = INVALID_GFN; \ + } \ + else \ + gnttab_destroy_arch(gt); \ + (gt)->arch.shared_gfn ? 0 : -ENOMEM; \ }) #define gnttab_destroy_arch(gt) \ do { \ - xfree((gt)->arch.gfn); \ - (gt)->arch.gfn = NULL; \ + xfree((gt)->arch.shared_gfn); \ + (gt)->arch.shared_gfn = NULL; \ + xfree((gt)->arch.status_gfn); \ + (gt)->arch.status_gfn = NULL; \ } while ( 0 ) -#define gnttab_set_frame_gfn(gt, idx, gfn) \ +#define gnttab_set_frame_gfn(gt, st, idx, gfn) \ do { \ - (gt)->arch.gfn[idx] = gfn; \ + ((st) ? (gt)->arch.status_gfn : (gt)->arch.shared_gfn)[idx] = \ + (gfn); \ } while ( 0 ) #define gnttab_create_shared_page(d, t, i) \ @@ -65,8 +81,10 @@ static inline unsigned int gnttab_dom0_m } while ( 0 ) #define gnttab_shared_gmfn(d, t, i) \ - ( ((i >= nr_grant_frames(t)) && \ - (i < (t)->max_grant_frames))? 0 : gfn_x((t)->arch.gfn[i])) + gfn_x(((i) >= nr_grant_frames(t)) ? INVALID_GFN : (t)->arch.shared_gfn[i]) + +#define gnttab_status_gmfn(d, t, i) \ + gfn_x(((i) >= nr_status_frames(t)) ? 
INVALID_GFN : (t)->arch.status_gfn[i]) #define gnttab_need_iommu_mapping(d) \ (is_domain_direct_mapped(d) && need_iommu(d)) --- a/xen/include/asm-x86/grant_table.h +++ b/xen/include/asm-x86/grant_table.h @@ -46,7 +46,7 @@ static inline unsigned int gnttab_dom0_m #define gnttab_init_arch(gt) 0 #define gnttab_destroy_arch(gt) do {} while ( 0 ) -#define gnttab_set_frame_gfn(gt, idx, gfn) do {} while ( 0 ) +#define gnttab_set_frame_gfn(gt, st, idx, gfn) do {} while ( 0 ) #define gnttab_create_shared_page(d, t, i) \ do { \ ++++++ 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch ++++++ # Commit 38bfcc165dda5f4284d7c218b91df9e144ddd88d # Date 2018-02-27 14:07:12 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> gnttab: don't blindly free status pages upon version change There may still be active mappings, which would trigger the respective BUG_ON(). Split the loop into one dealing with the page attributes and the second (when the first fully passed) freeing the pages. Return an error if any pages still have pending references. This is part of XSA-255. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -1644,23 +1644,74 @@ status_alloc_failed: return -ENOMEM; } -static void +static int gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt) { - int i; + unsigned int i; for ( i = 0; i < nr_status_frames(gt); i++ ) { struct page_info *pg = virt_to_page(gt->status[i]); + gfn_t gfn = gnttab_get_frame_gfn(gt, true, i); + + /* + * For translated domains, recovering from failure after partial + * changes were made is more complicated than it seems worth + * implementing at this time. Hence respective error paths below + * crash the domain in such a case. + */ + if ( paging_mode_translate(d) ) + { + int rc = gfn_eq(gfn, INVALID_GFN) + ? 0 + : guest_physmap_remove_page(d, gfn, + _mfn(page_to_mfn(pg)), 0); + + if ( rc ) + { + gprintk(XENLOG_ERR, + "Could not remove status frame %u (GFN %#lx) from P2M\n", + i, gfn_x(gfn)); + domain_crash(d); + return rc; + } + gnttab_set_frame_gfn(gt, true, i, INVALID_GFN); + } BUG_ON(page_get_owner(pg) != d); if ( test_and_clear_bit(_PGC_allocated, &pg->count_info) ) put_page(pg); - BUG_ON(pg->count_info & ~PGC_xen_heap); + + if ( pg->count_info & ~PGC_xen_heap ) + { + if ( paging_mode_translate(d) ) + { + gprintk(XENLOG_ERR, + "Wrong page state %#lx of status frame %u (GFN %#lx)\n", + pg->count_info, i, gfn_x(gfn)); + domain_crash(d); + } + else + { + if ( get_page(pg, d) ) + set_bit(_PGC_allocated, &pg->count_info); + while ( i-- ) + gnttab_create_status_page(d, gt, i); + } + return -EBUSY; + } + + page_set_owner(pg, NULL); + } + + for ( i = 0; i < nr_status_frames(gt); i++ ) + { free_xenheap_page(gt->status[i]); gt->status[i] = NULL; } gt->nr_status_frames = 0; + + return 0; } /* @@ -2970,8 +3021,9 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA break; } - if ( op.version < 2 && gt->gt_version == 2 ) - gnttab_unpopulate_status_frames(currd, gt); + if ( op.version < 2 && gt->gt_version == 2 && + (res = gnttab_unpopulate_status_frames(currd, gt)) != 0 ) + goto out_unlock; /* Make sure there's no crud left over from the old version. 
*/ for ( i = 0; i < nr_grant_frames(gt); i++ ) @@ -3805,6 +3857,11 @@ int gnttab_map_frame(struct domain *d, u rc = -EINVAL; } + if ( !rc && paging_mode_translate(d) && + !gfn_eq(gnttab_get_frame_gfn(gt, status, idx), INVALID_GFN) ) + rc = guest_physmap_remove_page(d, gnttab_get_frame_gfn(gt, status, idx), + *mfn, 0); + if ( !rc ) gnttab_set_frame_gfn(gt, status, idx, gfn); --- a/xen/include/asm-arm/grant_table.h +++ b/xen/include/asm-arm/grant_table.h @@ -73,6 +73,11 @@ static inline unsigned int gnttab_dom0_m (gfn); \ } while ( 0 ) +#define gnttab_get_frame_gfn(gt, st, idx) ({ \ + _gfn((st) ? gnttab_status_gmfn(NULL, gt, idx) \ + : gnttab_shared_gmfn(NULL, gt, idx)); \ +}) + #define gnttab_create_shared_page(d, t, i) \ do { \ share_xen_page_with_guest( \ --- a/xen/include/asm-x86/grant_table.h +++ b/xen/include/asm-x86/grant_table.h @@ -47,6 +47,12 @@ static inline unsigned int gnttab_dom0_m #define gnttab_init_arch(gt) 0 #define gnttab_destroy_arch(gt) do {} while ( 0 ) #define gnttab_set_frame_gfn(gt, st, idx, gfn) do {} while ( 0 ) +#define gnttab_get_frame_gfn(gt, st, idx) ({ \ + unsigned long mfn_ = (st) ? gnttab_status_mfn(gt, idx) \ + : gnttab_shared_mfn(gt, idx); \ + unsigned long gpfn_ = get_gpfn_from_mfn(mfn_); \ + VALID_M2P(gpfn_) ? _gfn(gpfn_) : INVALID_GFN; \ +}) #define gnttab_create_shared_page(d, t, i) \ do { \ @@ -63,11 +69,11 @@ static inline unsigned int gnttab_dom0_m } while ( 0 ) -#define gnttab_shared_mfn(d, t, i) \ +#define gnttab_shared_mfn(t, i) \ ((virt_to_maddr((t)->shared_raw[i]) >> PAGE_SHIFT)) #define gnttab_shared_gmfn(d, t, i) \ - (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i))) + (mfn_to_gmfn(d, gnttab_shared_mfn(t, i))) #define gnttab_status_mfn(t, i) \ ++++++ 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch ++++++ # Commit 0aa6158b674c5d083b75ac8fcd1e7ae92d0c39ae # Date 2018-02-27 14:08:36 +0100 # Author Andrew Cooper <[email protected]> # Committer Jan Beulich <[email protected]> x86/hvm: Disallow the creation of HVM domains without Local APIC emulation There are multiple problems, not necessarily limited to: * Guests which configure event channels via hvmop_set_evtchn_upcall_vector(), or which hit %cr8 emulation will cause Xen to fall over a NULL vlapic->regs pointer. * On Intel hardware, disabling the TPR_SHADOW execution control without reenabling CR8_{LOAD,STORE} interception means that the guest's %cr8 accesses interact with the real TPR. Amongst other things, setting the real TPR to 0xf blocks even IPIs from interrupting this CPU. * On hardware which sets up the use of Interrupt Posting, including IOMMU-Posting, guests run without the appropriate non-root configuration, which at a minimum will result in dropped interrupts. Whether no-LAPIC mode is of any use at all remains to be seen. This is XSA-256.
Reported-by: Ian Jackson <[email protected]> Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Roger Pau Monné <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -413,7 +413,7 @@ static bool emulation_flags_ok(const str if ( is_hardware_domain(d) && emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) ) return false; - if ( !is_hardware_domain(d) && emflags && + if ( !is_hardware_domain(d) && emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC ) return false; } ++++++ 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch ++++++ References: bsc#1072834 # Commit 1f1d183d49008794b087cf043fc77f724a45af98 # Date 2018-02-27 15:12:23 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> x86/HVM: don't give the wrong impression of WRMSR succeeding ... for non-existent MSRs: wrmsr_hypervisor_regs()'s comment clearly says that the function returns 0 for unrecognized MSRs, so {svm,vmx}_msr_write_intercept() should not convert this into success. We don't want to unconditionally fail the access though, as we can't be certain the list of handled MSRs is complete enough for the guest types we care about, so instead mirror what we do on the read paths and probe the MSR to decide whether to raise #GP. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2106,6 +2106,13 @@ static int svm_msr_write_intercept(unsig result = X86EMUL_RETRY; break; case 0: + /* + * Match up with the RDMSR side for now; ultimately this entire + * case block should go away. + */ + if ( rdmsr_safe(msr, msr_content) == 0 ) + break; + goto gpf; case 1: break; default: --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -3182,6 +3182,13 @@ static int vmx_msr_write_intercept(unsig case -ERESTART: return X86EMUL_RETRY; case 0: + /* + * Match up with the RDMSR side for now; ultimately this + * entire case block should go away. + */ + if ( rdmsr_safe(msr, msr_content) == 0 ) + break; + goto gp_fault; case 1: break; default:
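To make the intended behaviour of the two hunks above easier to follow, here is the same logic collapsed out of diff form into a small function (a sketch only: the real code lives inline in {svm,vmx}_msr_write_intercept(), uses the gpf/gp_fault labels, and also has an X86EMUL_RETRY case, all of which are elided here):

    /* Decide the fate of a WRMSR that no specific handler claimed. */
    static int default_wrmsr_fate(unsigned int msr, uint64_t msr_content)
    {
        switch ( wrmsr_hypervisor_regs(msr, msr_content) )
        {
        case 1:                       /* Write handled successfully. */
            return X86EMUL_OKAY;
        case 0:
            /*
             * Unrecognised MSR. Mirror the RDMSR side: probe the MSR and
             * only fail the write if it doesn't exist at all.
             */
            if ( rdmsr_safe(msr, msr_content) == 0 )
                return X86EMUL_OKAY;  /* MSR exists: swallow the write. */
            /* fall through */
        default:
            return X86EMUL_EXCEPTION; /* Raise #GP in the guest. */
        }
    }

This is the behaviour change behind the bsc#1072834 entry at the top of this mail: a WRMSR to an unrecognised but existing MSR is still silently ignored, while a WRMSR to a non-existent MSR now raises #GP instead of appearing to succeed.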
