Hello community, here is the log from the commit of package xen for openSUSE:Factory checked in at 2018-03-20 21:50:37 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/xen (Old) and /work/SRC/openSUSE:Factory/.xen.new (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "xen" Tue Mar 20 21:50:37 2018 rev:244 rq:586076 version:4.10.0_14 Changes: -------- --- /work/SRC/openSUSE:Factory/xen/xen.changes 2018-03-01 12:02:21.481832679 +0100 +++ /work/SRC/openSUSE:Factory/.xen.new/xen.changes 2018-03-20 21:50:48.542316318 +0100 @@ -1,0 +2,24 @@ +Thu Mar 1 09:36:03 MST 2018 - [email protected] + +- bsc#1072834 - Xen HVM: unchecked MSR access error: RDMSR from + 0xc90 at rIP: 0xffffffff93061456 (native_read_msr+0x6/0x30) + 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch +- Upstream patches from Jan (bsc#1027519) + 5a79d7ed-libxc-packed-initrd-dont-fail-domain-creation.patch + 5a7b1bdd-x86-reduce-Meltdown-band-aid-IPI-overhead.patch + 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch + 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch + 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch + 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch + 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch + 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch (Replaces xsa252.patch) + 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch (Replaces xsa255-1.patch) + 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch (Replaces xsa255-2.patch) + 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch (Replaces xsa256.patch) +- Drop + xsa252.patch + xsa255-1.patch + xsa255-2.patch + xsa256.patch + +------------------------------------------------------------------- @@ -4,2 +28,2 @@ -- bsc#1080635 - VUL-0: xen: DoS via non-preemptable L3/L4 pagetable - freeing (XSA-252) +- bsc#1080635 - VUL-0: CVE-2018-7540: xen: DoS via non-preemptable + L3/L4 pagetable freeing (XSA-252) @@ -7,2 +31,2 @@ -- bsc#1080662 - VUL-0: xen: grant table v2 -> v1 transition may - crash Xen (XSA-255) +- bsc#1080662 - VUL-0: CVE-2018-7541: xen: grant table v2 -> v1 + transition may crash Xen (XSA-255) @@ -11,2 +35,2 @@ -- bsc#1080634 - VUL-0: xen: x86 PVH guest without LAPIC may DoS the - host (XSA-256) +- bsc#1080634 - VUL-0: CVE-2018-7542: xen: x86 PVH guest without + LAPIC may DoS the host (XSA-256) @@ -56,2 +80,3 @@ -- bsc#1074562 - VUL-0: xen: Information leak via side effects of - speculative execution (XSA-254). Includes Spectre v2 mitigation. +- bsc#1074562 - VUL-0: CVE-2017-5753,CVE-2017-5715,CVE-2017-5754 + xen: Information leak via side effects of speculative execution + (XSA-254). Includes Spectre v2 mitigation. 
Old: ---- xsa252.patch xsa255-1.patch xsa255-2.patch xsa256.patch New: ---- 5a79d7ed-libxc-packed-initrd-dont-fail-domain-creation.patch 5a7b1bdd-x86-reduce-Meltdown-band-aid-IPI-overhead.patch 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ xen.spec ++++++ --- /var/tmp/diff_new_pack.FoVs26/_old 2018-03-20 21:50:50.858232921 +0100 +++ /var/tmp/diff_new_pack.FoVs26/_new 2018-03-20 21:50:50.862232777 +0100 @@ -126,7 +126,7 @@ BuildRequires: pesign-obs-integration %endif -Version: 4.10.0_13 +Version: 4.10.0_14 Release: 0 Summary: Xen Virtualization: Hypervisor (aka VMM aka Microkernel) License: GPL-2.0 @@ -206,10 +206,18 @@ Patch43: 5a6b36cd-9-x86-issue-speculation-barrier.patch Patch44: 5a6b36cd-A-x86-offer-Indirect-Branch-Controls-to-guests.patch Patch45: 5a6b36cd-B-x86-clear-SPEC_CTRL-while-idle.patch -Patch252: xsa252.patch -Patch25501: xsa255-1.patch -Patch25502: xsa255-2.patch -Patch256: xsa256.patch +Patch46: 5a79d7ed-libxc-packed-initrd-dont-fail-domain-creation.patch +Patch47: 5a7b1bdd-x86-reduce-Meltdown-band-aid-IPI-overhead.patch +Patch48: 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch +Patch49: 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch +Patch50: 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch +Patch51: 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch +Patch52: 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch +Patch53: 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch +Patch54: 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch +Patch55: 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch +Patch56: 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch +Patch57: 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch # Our platform specific patches Patch400: xen-destdir.patch Patch401: vif-bridge-no-iptables.patch @@ -445,10 +453,18 @@ %patch43 -p1 %patch44 -p1 %patch45 -p1 -%patch252 -p1 -%patch25501 -p1 -%patch25502 -p1 -%patch256 -p1 +%patch46 -p1 +%patch47 -p1 +%patch48 -p1 +%patch49 -p1 +%patch50 -p1 +%patch51 -p1 +%patch52 -p1 +%patch53 -p1 +%patch54 -p1 +%patch55 -p1 +%patch56 -p1 +%patch57 -p1 # Our platform specific patches %patch400 -p1 %patch401 -p1 ++++++ 5a5e3a4e-7-x86-cmdline-opt-to-disable-IBRS-IBPB-STIBP.patch ++++++ --- /var/tmp/diff_new_pack.FoVs26/_old 2018-03-20 21:50:50.990228168 +0100 +++ /var/tmp/diff_new_pack.FoVs26/_new 2018-03-20 21:50:50.990228168 +0100 @@ -13,8 +13,23 @@ Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Jan Beulich <[email protected]> ---- a/docs/misc/xen-command-line.markdown -+++ b/docs/misc/xen-command-line.markdown +# Commit ac37ec1ddef234eeba6f438c29ff687c64962ebd +# Date 2018-01-31 10:47:12 +0000 +# Author Andrew Cooper <[email protected]> +# Committer Andrew Cooper <[email protected]> +xen/cmdline: Fix parse_boolean() for unadorned values + 
+A command line such as "cpuid=no-ibrsb,no-stibp" tickles a bug in +parse_boolean() because the separating comma fails the NUL case. + +Instead, check for slen == nlen which accounts for the boundary (if any) +passed via the 'e' parameter. + +Signed-off-by: Andrew Cooper <[email protected]> +Reviewed-by: Jan Beulich <[email protected]> + +--- trunk.orig/docs/misc/xen-command-line.markdown 2018-02-01 11:40:54.706665840 +0100 ++++ trunk/docs/misc/xen-command-line.markdown 2018-02-01 00:00:00.000000000 +0100 @@ -471,6 +471,18 @@ choice of `dom0-kernel` is deprecated an respectively. * `verbose` option can be included as a string or also as `verbose=<integer>` @@ -34,8 +49,8 @@ ### cpuid\_mask\_cpu (AMD only) > `= fam_0f_rev_c | fam_0f_rev_d | fam_0f_rev_e | fam_0f_rev_f | fam_0f_rev_g | fam_10_rev_b | fam_10_rev_c | fam_11_rev_b` ---- a/xen/arch/x86/cpuid.c -+++ b/xen/arch/x86/cpuid.c +--- trunk.orig/xen/arch/x86/cpuid.c 2018-02-01 11:40:54.706665840 +0100 ++++ trunk/xen/arch/x86/cpuid.c 2018-02-01 00:00:00.000000000 +0100 @@ -18,6 +18,41 @@ static const uint32_t hvm_shadow_feature static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES; static const uint32_t deep_features[] = INIT_DEEP_FEATURES; @@ -78,9 +93,9 @@ #define EMPTY_LEAF ((struct cpuid_leaf){}) static void zero_leaves(struct cpuid_leaf *l, unsigned int first, unsigned int last) ---- a/xen/common/kernel.c -+++ b/xen/common/kernel.c -@@ -244,6 +244,29 @@ int parse_bool(const char *s, const char +--- trunk.orig/xen/common/kernel.c 2018-02-01 11:40:54.706665840 +0100 ++++ trunk/xen/common/kernel.c 2018-02-01 11:40:25.000000000 +0100 +@@ -244,6 +244,33 @@ int parse_bool(const char *s, const char return -1; } @@ -99,19 +114,23 @@ + if ( slen < nlen || strncmp(s, name, nlen) ) + return -1; + -+ switch ( s[nlen] ) -+ { -+ case '\0': return val; -+ case '=': return parse_bool(&s[nlen + 1], e); -+ default: return -1; -+ } ++ /* Exact, unadorned name? Result depends on the 'no-' prefix. */ ++ if ( slen == nlen ) ++ return val; ++ ++ /* =$SOMETHING? Defer to the regular boolean parsing. */ ++ if ( s[nlen] == '=' ) ++ return parse_bool(&s[nlen + 1], e); ++ ++ /* Unrecognised. Give up. 
*/ ++ return -1; +} + unsigned int tainted; /** ---- a/xen/include/xen/lib.h -+++ b/xen/include/xen/lib.h +--- trunk.orig/xen/include/xen/lib.h 2018-02-01 11:40:54.706665840 +0100 ++++ trunk/xen/include/xen/lib.h 2018-02-01 00:00:00.000000000 +0100 @@ -74,6 +74,13 @@ void cmdline_parse(const char *cmdline); int runtime_parse(const char *line); int parse_bool(const char *s, const char *e); ++++++ 5a6b36cd-6-x86-clobber-RSB-RAS-on-entry.patch ++++++ --- /var/tmp/diff_new_pack.FoVs26/_old 2018-03-20 21:50:51.038226440 +0100 +++ /var/tmp/diff_new_pack.FoVs26/_new 2018-03-20 21:50:51.038226440 +0100 @@ -29,7 +29,7 @@ XEN_CPUFEATURE(XEN_IBRS_SET, (FSCAPINTS+0)*32+16) /* IBRSB && IRBS set in Xen */ XEN_CPUFEATURE(XEN_IBRS_CLEAR, (FSCAPINTS+0)*32+17) /* IBRSB && IBRS clear in Xen */ +XEN_CPUFEATURE(RSB_NATIVE, (FSCAPINTS+0)*32+18) /* RSB overwrite needed for native */ -+XEN_CPUFEATURE(RSB_VMEXIT, (FSCAPINTS+0)*32+20) /* RSB overwrite needed for vmexit */ ++XEN_CPUFEATURE(RSB_VMEXIT, (FSCAPINTS+0)*32+19) /* RSB overwrite needed for vmexit */ --- a/xen/include/asm-x86/nops.h +++ b/xen/include/asm-x86/nops.h @@ -66,6 +66,7 @@ ++++++ 5a6b36cd-8-x86-boot-calculate-best-BTI-mitigation.patch ++++++ --- /var/tmp/diff_new_pack.FoVs26/_old 2018-03-20 21:50:51.054225864 +0100 +++ /var/tmp/diff_new_pack.FoVs26/_new 2018-03-20 21:50:51.054225864 +0100 @@ -16,6 +16,37 @@ Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Jan Beulich <[email protected]> +# Commit 30cbd0c83ef3d0edac2d5bcc41a9a2b7a843ae58 +# Date 2018-02-06 18:32:58 +0000 +# Author Andrew Cooper <[email protected]> +# Committer Andrew Cooper <[email protected]> +x86/spec_ctrl: Fix determination of when to use IBRS + +The original version of this logic was: + + /* + * On Intel hardware, we'd like to use retpoline in preference to + * IBRS, but only if it is safe on this hardware. + */ + else if ( boot_cpu_has(X86_FEATURE_IBRSB) ) + { + if ( retpoline_safe() ) + thunk = THUNK_RETPOLINE; + else + ibrs = true; + } + +but it was changed by a request during review. Sadly, the result is buggy as +it breaks the later fallback logic by allowing IBRS to appear as available +when in fact it isn't. + +This in practice means that on retpoline-unsafe hardware without IBRS, we +select THUNK_JUMP despite intending to select THUNK_RETPOLINE. + +Reported-by: Zhenzhong Duan <[email protected]> +Signed-off-by: Andrew Cooper <[email protected]> +Reviewed-by: Jan Beulich <[email protected]> + --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -246,7 +246,7 @@ enough. Setting this to a high value may @@ -180,7 +211,7 @@ + */ + else if ( retpoline_safe() ) + thunk = THUNK_RETPOLINE; -+ else ++ else if ( boot_cpu_has(X86_FEATURE_IBRSB) ) + ibrs = true; } + /* Without compiler thunk support, use IBRS if available. */ ++++++ 5a79d7ed-libxc-packed-initrd-dont-fail-domain-creation.patch ++++++ References: bsc#1055047 # Commit d0115f96ea633fd6d668f2c067785912c0ad4c00 # Date 2018-02-06 17:29:33 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> libxc: don't fail domain creation when unpacking initrd fails At least Linux kernels have been able to work with gzip-ed initrd for quite some time; initrd compressed with other methods aren't even being attempted to unpack.
Furthermore the unzip-ing routine used here isn't capable of dealing with various forms of concatenated files, each of which was gzip-ed separately (it is this particular case which has been the source of observed VM creation failures). Hence, if unpacking fails, simply hand the compressed blob to the guest as is. Signed-off-by: Jan Beulich <[email protected]> Acked-by: Wei Liu <[email protected]> --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -291,7 +291,6 @@ int xc_dom_mem_init(struct xc_dom_image int xc_dom_kernel_check_size(struct xc_dom_image *dom, size_t sz); int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz); -int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz); int xc_dom_ramdisk_max_size(struct xc_dom_image *dom, size_t sz); int xc_dom_devicetree_max_size(struct xc_dom_image *dom, size_t sz); --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -314,22 +314,6 @@ int xc_dom_kernel_check_size(struct xc_d return 0; } -int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz) -{ - /* No limit */ - if ( !dom->max_ramdisk_size ) - return 0; - - if ( sz > dom->max_ramdisk_size ) - { - xc_dom_panic(dom->xch, XC_INVALID_KERNEL, - "ramdisk image too large"); - return 1; - } - - return 0; -} - /* ------------------------------------------------------------------------ */ /* read files, copy memory blocks, with transparent gunzip */ @@ -996,16 +980,27 @@ static int xc_dom_build_ramdisk(struct x void *ramdiskmap; if ( !dom->ramdisk_seg.vstart ) - { unziplen = xc_dom_check_gzip(dom->xch, dom->ramdisk_blob, dom->ramdisk_size); - if ( xc_dom_ramdisk_check_size(dom, unziplen) != 0 ) - unziplen = 0; - } else unziplen = 0; - ramdisklen = unziplen ? unziplen : dom->ramdisk_size; + ramdisklen = max(unziplen, dom->ramdisk_size); + if ( dom->max_ramdisk_size ) + { + if ( unziplen && ramdisklen > dom->max_ramdisk_size ) + { + ramdisklen = min(unziplen, dom->ramdisk_size); + if ( unziplen > ramdisklen ) + unziplen = 0; + } + if ( ramdisklen > dom->max_ramdisk_size ) + { + xc_dom_panic(dom->xch, XC_INVALID_KERNEL, + "ramdisk image too large"); + goto err; + } + } if ( xc_dom_alloc_segment(dom, &dom->ramdisk_seg, "ramdisk", dom->ramdisk_seg.vstart, ramdisklen) != 0 ) @@ -1020,11 +1015,18 @@ static int xc_dom_build_ramdisk(struct x if ( unziplen ) { if ( xc_dom_do_gunzip(dom->xch, dom->ramdisk_blob, dom->ramdisk_size, - ramdiskmap, ramdisklen) == -1 ) + ramdiskmap, unziplen) != -1 ) + return 0; + if ( dom->ramdisk_size > ramdisklen ) goto err; } - else - memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size); + + /* Fall back to handing over the raw blob. */ + memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size); + /* If an unzip attempt was made, the buffer may no longer be all zero. */ + if ( unziplen > dom->ramdisk_size ) + memset(ramdiskmap + dom->ramdisk_size, 0, + unziplen - dom->ramdisk_size); return 0; ++++++ 5a7b1bdd-x86-reduce-Meltdown-band-aid-IPI-overhead.patch ++++++ # Commit a22320e32dca0918ed23799583f470afe4c24330 # Date 2018-02-07 16:31:41 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> x86: reduce Meltdown band-aid IPI overhead In case we can detect single-threaded guest processes (by checking whether we can account for all root page table uses locally on the vCPU that's running), there's no point in issuing a sync IPI upon an L4 entry update, as no other vCPU of the guest will have that page table loaded. 
Signed-off-by: Jan Beulich <[email protected]> Acked-by: George Dunlap <[email protected]> Acked-by: Andrew Cooper <[email protected]> --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -3664,8 +3664,18 @@ long do_mmu_update( case PGT_l4_page_table: rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - if ( !rc ) - sync_guest = this_cpu(root_pgt); + /* + * No need to sync if all uses of the page can be accounted + * to the page lock we hold, its pinned status, and uses on + * this (v)CPU. + */ + if ( !rc && this_cpu(root_pgt) && + ((page->u.inuse.type_info & PGT_count_mask) > + (1 + !!(page->u.inuse.type_info & PGT_pinned) + + (pagetable_get_pfn(curr->arch.guest_table) == mfn) + + (pagetable_get_pfn(curr->arch.guest_table_user) == + mfn))) ) + sync_guest = true; break; case PGT_writable_page: perfc_incr(writable_mmu_updates); ++++++ 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch ++++++ # Commit a2b08fbed388f18235fda5ba1655c1483ef3e215 # Date 2018-02-14 13:22:15 +0000 # Author Andrew Cooper <[email protected]> # Committer Andrew Cooper <[email protected]> x86/spec_ctrl: Fix several bugs in SPEC_CTRL_ENTRY_FROM_INTR_IST DO_OVERWRITE_RSB clobbers %rax, meaning in practice that the bti_ist_info field gets zeroed. Older versions of this code had the DO_OVERWRITE_RSB register selectable, so reintroduce this ability and use it to cause the INTR_IST path to use %rdx instead. The use of %dl for the %cs.rpl check means that when an IST interrupt hits Xen, we try to load 1 into the high 32 bits of MSR_SPEC_CTRL, suffering a #GP fault instead. Also, drop an unused label which was a copy/paste mistake. Reported-by: Boris Ostrovsky <[email protected]> Reported-by: Zhenzhong Duan <[email protected]> Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Reviewed-by: Wei Liu <[email protected]> Reviewed-by: Roger Pau Monné <[email protected]> --- a/xen/include/asm-x86/spec_ctrl_asm.h +++ b/xen/include/asm-x86/spec_ctrl_asm.h @@ -79,10 +79,10 @@ * - SPEC_CTRL_EXIT_TO_GUEST */ -.macro DO_OVERWRITE_RSB +.macro DO_OVERWRITE_RSB tmp=rax /* * Requires nothing - * Clobbers %rax, %rcx + * Clobbers \tmp (%rax by default), %rcx * * Requires 256 bytes of stack space, but %rsp has no net change. Based on * Google's performance numbers, the loop is unrolled to 16 iterations and two @@ -97,7 +97,7 @@ * optimised with mov-elimination in modern cores. */ mov $16, %ecx /* 16 iterations, two calls per loop */ - mov %rsp, %rax /* Store the current %rsp */ + mov %rsp, %\tmp /* Store the current %rsp */ .L\@_fill_rsb_loop: @@ -114,7 +114,7 @@ sub $1, %ecx jnz .L\@_fill_rsb_loop - mov %rax, %rsp /* Restore old %rsp */ + mov %\tmp, %rsp /* Restore old %rsp */ .endm .macro DO_SPEC_CTRL_ENTRY_FROM_VMEXIT ibrs_val:req @@ -274,7 +274,7 @@ testb $BTI_IST_RSB, %al jz .L\@_skip_rsb - DO_OVERWRITE_RSB + DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */ .L\@_skip_rsb: @@ -286,13 +286,13 @@ setz %dl and %dl, STACK_CPUINFO_FIELD(use_shadow_spec_ctrl)(%r14) -.L\@_entry_from_xen: /* * Load Xen's intended value. SPEC_CTRL_IBRS vs 0 is encoded in the * bottom bit of bti_ist_info, via a deliberate alias with BTI_IST_IBRS. */ mov $MSR_SPEC_CTRL, %ecx and $BTI_IST_IBRS, %eax + xor %edx, %edx wrmsr /* Opencoded UNLIKELY_START() with no condition. 
*/ ++++++ 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch ++++++ # Commit b7dce29d9faf3597d009c853ed1fcbed9f7a7f68 # Date 2018-02-15 11:08:27 +0000 # Author Andrew Cooper <[email protected]> # Committer Andrew Cooper <[email protected]> x86/emul: Fix the decoding of segment overrides in 64bit mode Explicit segment overrides other than %fs and %gs are documented as ignored by both Intel and AMD. In practice, this means that: * Explicit uses of %ss don't actually yield #SS[0] for non-canonical memory references. * Explicit uses of %{e,c,d}s don't override %rbp/%rsp-based memory references to yield #GP[0] for non-canonical memory references. Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -2463,6 +2463,10 @@ x86_decode( } done_prefixes: + /* %{e,c,s,d}s overrides are ignored in 64bit mode. */ + if ( mode_64bit() && override_seg < x86_seg_fs ) + override_seg = x86_seg_none; + if ( rex_prefix & REX_W ) op_bytes = 8; ++++++ 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch ++++++ # Commit eb1d3a3f04b85d596862a4c9dcf796e67ab4dc09 # Date 2018-02-15 11:08:27 +0000 # Author Andrew Cooper <[email protected]> # Committer Andrew Cooper <[email protected]> x86/entry: Use 32bit xors rather than 64bit xors for clearing GPRs Intel's Silvermont/Knights Landing architecture treats them as full ALU operations, rather than zeroing idioms. No functional change, and no change in code volume (only changing the bit selection in the REX prefix). Signed-off-by: Andrew Cooper <[email protected]> Acked-by: Jan Beulich <[email protected]> --- a/xen/include/asm-x86/asm_defns.h +++ b/xen/include/asm-x86/asm_defns.h @@ -271,10 +271,10 @@ static always_inline void stac(void) movq %r10,UREGS_r10(%rsp) movq %r11,UREGS_r11(%rsp) .endif - xor %r8, %r8 - xor %r9, %r9 - xor %r10, %r10 - xor %r11, %r11 + xor %r8d, %r8d + xor %r9d, %r9d + xor %r10d, %r10d + xor %r11d, %r11d movq %rbx,UREGS_rbx(%rsp) xor %ebx, %ebx movq %rbp,UREGS_rbp(%rsp) @@ -291,10 +291,10 @@ static always_inline void stac(void) movq %r14,UREGS_r14(%rsp) movq %r15,UREGS_r15(%rsp) .endif - xor %r12, %r12 - xor %r13, %r13 - xor %r14, %r14 - xor %r15, %r15 + xor %r12d, %r12d + xor %r13d, %r13d + xor %r14d, %r14d + xor %r15d, %r15d .endm #define LOAD_ONE_REG(reg, compat) \ @@ -319,10 +319,10 @@ static always_inline void stac(void) movq UREGS_r13(%rsp), %r13 movq UREGS_r12(%rsp), %r12 .else - xor %r15, %r15 - xor %r14, %r14 - xor %r13, %r13 - xor %r12, %r12 + xor %r15d, %r15d + xor %r14d, %r14d + xor %r13d, %r13d + xor %r12d, %r12d .endif LOAD_ONE_REG(bp, \compat) LOAD_ONE_REG(bx, \compat) @@ -332,10 +332,10 @@ static always_inline void stac(void) movq UREGS_r9(%rsp),%r9 movq UREGS_r8(%rsp),%r8 .else - xor %r11, %r11 - xor %r10, %r10 - xor %r9, %r9 - xor %r8, %r8 + xor %r11d, %r11d + xor %r10d, %r10d + xor %r9d, %r9d + xor %r8d, %r8d .endif LOAD_ONE_REG(ax, \compat) LOAD_ONE_REG(cx, \compat) ++++++ 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch ++++++ # Commit a44f1697968e04fcc6145e3bd51c748b57047240 # Date 2018-02-20 10:16:56 +0100 # Author Igor Druzhinin <[email protected]> # Committer Jan Beulich <[email protected]> x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap We're noticing a reproducible system boot hang on certain Skylake platforms where the BIOS is configured in legacy boot mode with x2APIC disabled.
The system stalls immediately after writing the first SMP initialization sequence into APIC ICR. The cause of the problem is watchdog NMI handler execution - somewhere near the end of NMI handling (after it's already rescheduled the next NMI) it tries to access IO port 0x61 to get the actual NMI reason on CPU0. Unfortunately, this port is emulated by BIOS using SMIs and this emulation for some reason takes more time than we expect during INIT-SIPI-SIPI sequence. As a result, the system is constantly moving between NMI and SMI handler and not making any progress. To avoid this, initialize the watchdog after SMP bootstrap on CPU0 and, additionally, protect the NMI handler by moving IO port access before NMI re-scheduling. The latter should also help in case of post boot CPU onlining. Although we're running watchdog at much lower frequency at this point, it's nevertheless possible we may trigger the issue anyway. Signed-off-by: Igor Druzhinin <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Index: xen-4.10.0-testing/xen/arch/x86/apic.c =================================================================== --- xen-4.10.0-testing.orig/xen/arch/x86/apic.c +++ xen-4.10.0-testing/xen/arch/x86/apic.c @@ -682,7 +682,7 @@ void setup_local_APIC(void) printk("Leaving ESR disabled.\n"); } - if (nmi_watchdog == NMI_LOCAL_APIC) + if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id()) setup_apic_nmi_watchdog(); apic_pm_activate(); } Index: xen-4.10.0-testing/xen/arch/x86/smpboot.c =================================================================== --- xen-4.10.0-testing.orig/xen/arch/x86/smpboot.c +++ xen-4.10.0-testing/xen/arch/x86/smpboot.c @@ -1241,7 +1241,10 @@ int __cpu_up(unsigned int cpu) void __init smp_cpus_done(void) { if ( nmi_watchdog == NMI_LOCAL_APIC ) + { + setup_apic_nmi_watchdog(); check_nmi_watchdog(); + } setup_ioapic_dest(); Index: xen-4.10.0-testing/xen/arch/x86/traps.c =================================================================== --- xen-4.10.0-testing.orig/xen/arch/x86/traps.c +++ xen-4.10.0-testing/xen/arch/x86/traps.c @@ -1669,7 +1669,7 @@ static nmi_callback_t *nmi_callback = du void do_nmi(const struct cpu_user_regs *regs) { unsigned int cpu = smp_processor_id(); - unsigned char reason; + unsigned char reason = 0; bool handle_unknown = false; ++nmi_count(cpu); @@ -1677,6 +1677,16 @@ void do_nmi(const struct cpu_user_regs * if ( nmi_callback(regs, cpu) ) return; + /* + * Accessing port 0x61 may trap to SMM which has been actually + * observed on some production SKX servers. This SMI sometimes + * takes enough time for the next NMI tick to happen. By reading + * this port before we re-arm the NMI watchdog, we reduce the chance + * of having an NMI watchdog expire while in the SMI handler. + */ + if ( cpu == 0 ) + reason = inb(0x61); + if ( (nmi_watchdog == NMI_NONE) || (!nmi_watchdog_tick(regs) && watchdog_force) ) handle_unknown = true; @@ -1684,7 +1694,6 @@ void do_nmi(const struct cpu_user_regs * /* Only the BSP gets external NMIs from the system.
*/ if ( cpu == 0 ) { - reason = inb(0x61); if ( reason & 0x80 ) pci_serr_error(regs); if ( reason & 0x40 ) ++++++ 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch ++++++ # Commit cc0e45db277922b5723a7b1d9657d6f744230cf1 # Date 2018-02-27 10:47:23 +0000 # Author Andrew Cooper <[email protected]> # Committer Andrew Cooper <[email protected]> x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in MSR_TSC_AUX, irrespective of whether the relevant CPUID features are advertised/hidden. At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode. Therefore, default PV guests can read the value from a previously scheduled HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest. Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case. To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy update of the MSR, and use this function consistently across the codebase. Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Roger Pau Monné <[email protected]> Reviewed-by: Wei Liu <[email protected]> Acked-by: Jan Beulich <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1531,9 +1531,9 @@ void paravirt_ctxt_switch_to(struct vcpu if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) ) activate_debugregs(v); - if ( (v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP) && - boot_cpu_has(X86_FEATURE_RDTSCP) ) - write_rdtscp_aux(v->domain->arch.incarnation); + if ( cpu_has_rdtscp ) + wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP + ? v->domain->arch.incarnation : 0); } /* Update per-VCPU guest runstate shared memory area (if registered). 
*/ --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3582,7 +3582,7 @@ int hvm_msr_write_intercept(unsigned int v->arch.hvm_vcpu.msr_tsc_aux = (uint32_t)msr_content; if ( cpu_has_rdtscp && (v->domain->arch.tsc_mode != TSC_MODE_PVRDTSCP) ) - wrmsrl(MSR_TSC_AUX, (uint32_t)msr_content); + wrmsr_tsc_aux(msr_content); break; case MSR_IA32_APICBASE: --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1077,7 +1077,7 @@ static void svm_ctxt_switch_to(struct vc svm_tsc_ratio_load(v); if ( cpu_has_rdtscp ) - wrmsrl(MSR_TSC_AUX, hvm_msr_tsc_aux(v)); + wrmsr_tsc_aux(hvm_msr_tsc_aux(v)); } static void noreturn svm_do_resume(struct vcpu *v) --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -622,7 +622,7 @@ static void vmx_restore_guest_msrs(struc } if ( cpu_has_rdtscp ) - wrmsrl(MSR_TSC_AUX, hvm_msr_tsc_aux(v)); + wrmsr_tsc_aux(hvm_msr_tsc_aux(v)); } void vmx_update_cpu_exec_control(struct vcpu *v) --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -24,6 +24,8 @@ #include <xen/sched.h> #include <asm/msr.h> +DEFINE_PER_CPU(uint32_t, tsc_aux); + struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy, __read_mostly pv_max_msr_domain_policy; --- a/xen/include/asm-x86/msr.h +++ b/xen/include/asm-x86/msr.h @@ -115,8 +115,6 @@ static inline uint64_t rdtsc_ordered(voi __write_tsc(val); \ }) -#define write_rdtscp_aux(val) wrmsr(MSR_TSC_AUX, (val), 0) - #define rdpmc(counter,low,high) \ __asm__ __volatile__("rdpmc" \ : "=a" (low), "=d" (high) \ @@ -202,6 +200,20 @@ void write_efer(u64 val); DECLARE_PER_CPU(u32, ler_msr); +DECLARE_PER_CPU(uint32_t, tsc_aux); + +/* Lazy update of MSR_TSC_AUX */ +static inline void wrmsr_tsc_aux(uint32_t val) +{ + uint32_t *this_tsc_aux = &this_cpu(tsc_aux); + + if ( *this_tsc_aux != val ) + { + wrmsr(MSR_TSC_AUX, val, 0); + *this_tsc_aux = val; + } +} + /* MSR policy object for shared per-domain MSRs */ struct msr_domain_policy { ++++++ 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch ++++++ # Commit d798a0952903db9d8ee0a580e03f214d2b49b7d7 # Date 2018-02-27 14:03:27 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> memory: don't implicitly unpin for decrease-reservation It very likely was a mistake (copy-and-paste from domain cleanup code) to implicitly unpin here: The caller should really unpin itself before (or after, if they so wish) requesting the page to be removed. This is XSA-252. Reported-by: Jann Horn <[email protected]> Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -357,11 +357,6 @@ int guest_remove_page(struct domain *d, rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0); -#ifdef _PGT_pinned - if ( !rc && test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) ) - put_page_and_type(page); -#endif - /* * With the lack of an IOMMU on some platforms, domains with DMA-capable * device must retrieve the same pfn when the hypercall populate_physmap ++++++ 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch ++++++ # Commit 9d2f8f9c65d4da35437f50ed9e812a2c5ab313e2 # Date 2018-02-27 14:04:44 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> gnttab/ARM: don't corrupt shared GFN array ... by writing status GFNs to it. Introduce a second array instead. Also implement gnttab_status_gmfn() properly now that the information is suitably being tracked. 
While touching it anyway, remove a misguided (but luckily benign) upper bound check from gnttab_shared_gmfn(): We should never access beyond the bounds of that array. This is part of XSA-255. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -3777,6 +3777,7 @@ int gnttab_map_frame(struct domain *d, u { int rc = 0; struct grant_table *gt = d->grant_table; + bool status = false; grant_write_lock(gt); @@ -3787,6 +3788,7 @@ int gnttab_map_frame(struct domain *d, u (idx & XENMAPIDX_grant_table_status) ) { idx &= ~XENMAPIDX_grant_table_status; + status = true; if ( idx < nr_status_frames(gt) ) *mfn = _mfn(virt_to_mfn(gt->status[idx])); else @@ -3804,7 +3806,7 @@ int gnttab_map_frame(struct domain *d, u } if ( !rc ) - gnttab_set_frame_gfn(gt, idx, gfn); + gnttab_set_frame_gfn(gt, status, idx, gfn); grant_write_unlock(gt); --- a/xen/include/asm-arm/grant_table.h +++ b/xen/include/asm-arm/grant_table.h @@ -9,7 +9,8 @@ #define INITIAL_NR_GRANT_FRAMES 1U struct grant_table_arch { - gfn_t *gfn; + gfn_t *shared_gfn; + gfn_t *status_gfn; }; void gnttab_clear_flag(unsigned long nr, uint16_t *addr); @@ -21,7 +22,6 @@ int replace_grant_host_mapping(unsigned unsigned long new_gpaddr, unsigned int flags); void gnttab_mark_dirty(struct domain *d, unsigned long l); #define gnttab_create_status_page(d, t, i) do {} while (0) -#define gnttab_status_gmfn(d, t, i) (0) #define gnttab_release_host_mappings(domain) 1 static inline int replace_grant_supported(void) { @@ -42,19 +42,35 @@ static inline unsigned int gnttab_dom0_m #define gnttab_init_arch(gt) \ ({ \ - (gt)->arch.gfn = xzalloc_array(gfn_t, (gt)->max_grant_frames); \ - ( (gt)->arch.gfn ? 0 : -ENOMEM ); \ + unsigned int ngf_ = (gt)->max_grant_frames; \ + unsigned int nsf_ = grant_to_status_frames(ngf_); \ + \ + (gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_); \ + (gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_); \ + if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn ) \ + { \ + while ( ngf_-- ) \ + (gt)->arch.shared_gfn[ngf_] = INVALID_GFN; \ + while ( nsf_-- ) \ + (gt)->arch.status_gfn[nsf_] = INVALID_GFN; \ + } \ + else \ + gnttab_destroy_arch(gt); \ + (gt)->arch.shared_gfn ? 0 : -ENOMEM; \ }) #define gnttab_destroy_arch(gt) \ do { \ - xfree((gt)->arch.gfn); \ - (gt)->arch.gfn = NULL; \ + xfree((gt)->arch.shared_gfn); \ + (gt)->arch.shared_gfn = NULL; \ + xfree((gt)->arch.status_gfn); \ + (gt)->arch.status_gfn = NULL; \ } while ( 0 ) -#define gnttab_set_frame_gfn(gt, idx, gfn) \ +#define gnttab_set_frame_gfn(gt, st, idx, gfn) \ do { \ - (gt)->arch.gfn[idx] = gfn; \ + ((st) ? (gt)->arch.status_gfn : (gt)->arch.shared_gfn)[idx] = \ + (gfn); \ } while ( 0 ) #define gnttab_create_shared_page(d, t, i) \ @@ -65,8 +81,10 @@ static inline unsigned int gnttab_dom0_m } while ( 0 ) #define gnttab_shared_gmfn(d, t, i) \ - ( ((i >= nr_grant_frames(t)) && \ - (i < (t)->max_grant_frames))? 0 : gfn_x((t)->arch.gfn[i])) + gfn_x(((i) >= nr_grant_frames(t)) ? INVALID_GFN : (t)->arch.shared_gfn[i]) + +#define gnttab_status_gmfn(d, t, i) \ + gfn_x(((i) >= nr_status_frames(t)) ? 
INVALID_GFN : (t)->arch.status_gfn[i]) #define gnttab_need_iommu_mapping(d) \ (is_domain_direct_mapped(d) && need_iommu(d)) --- a/xen/include/asm-x86/grant_table.h +++ b/xen/include/asm-x86/grant_table.h @@ -46,7 +46,7 @@ static inline unsigned int gnttab_dom0_m #define gnttab_init_arch(gt) 0 #define gnttab_destroy_arch(gt) do {} while ( 0 ) -#define gnttab_set_frame_gfn(gt, idx, gfn) do {} while ( 0 ) +#define gnttab_set_frame_gfn(gt, st, idx, gfn) do {} while ( 0 ) #define gnttab_create_shared_page(d, t, i) \ do { \ ++++++ 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch ++++++ # Commit 38bfcc165dda5f4284d7c218b91df9e144ddd88d # Date 2018-02-27 14:07:12 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> gnttab: don't blindly free status pages upon version change There may still be active mappings, which would trigger the respective BUG_ON(). Split the loop into one dealing with the page attributes and the second (when the first fully passed) freeing the pages. Return an error if any pages still have pending references. This is part of XSA-255. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -1644,23 +1644,74 @@ status_alloc_failed: return -ENOMEM; } -static void +static int gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt) { - int i; + unsigned int i; for ( i = 0; i < nr_status_frames(gt); i++ ) { struct page_info *pg = virt_to_page(gt->status[i]); + gfn_t gfn = gnttab_get_frame_gfn(gt, true, i); + + /* + * For translated domains, recovering from failure after partial + * changes were made is more complicated than it seems worth + * implementing at this time. Hence respective error paths below + * crash the domain in such a case. + */ + if ( paging_mode_translate(d) ) + { + int rc = gfn_eq(gfn, INVALID_GFN) + ? 0 + : guest_physmap_remove_page(d, gfn, + _mfn(page_to_mfn(pg)), 0); + + if ( rc ) + { + gprintk(XENLOG_ERR, + "Could not remove status frame %u (GFN %#lx) from P2M\n", + i, gfn_x(gfn)); + domain_crash(d); + return rc; + } + gnttab_set_frame_gfn(gt, true, i, INVALID_GFN); + } BUG_ON(page_get_owner(pg) != d); if ( test_and_clear_bit(_PGC_allocated, &pg->count_info) ) put_page(pg); - BUG_ON(pg->count_info & ~PGC_xen_heap); + + if ( pg->count_info & ~PGC_xen_heap ) + { + if ( paging_mode_translate(d) ) + { + gprintk(XENLOG_ERR, + "Wrong page state %#lx of status frame %u (GFN %#lx)\n", + pg->count_info, i, gfn_x(gfn)); + domain_crash(d); + } + else + { + if ( get_page(pg, d) ) + set_bit(_PGC_allocated, &pg->count_info); + while ( i-- ) + gnttab_create_status_page(d, gt, i); + } + return -EBUSY; + } + + page_set_owner(pg, NULL); + } + + for ( i = 0; i < nr_status_frames(gt); i++ ) + { free_xenheap_page(gt->status[i]); gt->status[i] = NULL; } gt->nr_status_frames = 0; + + return 0; } /* @@ -2970,8 +3021,9 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA break; } - if ( op.version < 2 && gt->gt_version == 2 ) - gnttab_unpopulate_status_frames(currd, gt); + if ( op.version < 2 && gt->gt_version == 2 && + (res = gnttab_unpopulate_status_frames(currd, gt)) != 0 ) + goto out_unlock; /* Make sure there's no crud left over from the old version. 
*/ for ( i = 0; i < nr_grant_frames(gt); i++ ) @@ -3805,6 +3857,11 @@ int gnttab_map_frame(struct domain *d, u rc = -EINVAL; } + if ( !rc && paging_mode_translate(d) && + !gfn_eq(gnttab_get_frame_gfn(gt, status, idx), INVALID_GFN) ) + rc = guest_physmap_remove_page(d, gnttab_get_frame_gfn(gt, status, idx), + *mfn, 0); + if ( !rc ) gnttab_set_frame_gfn(gt, status, idx, gfn); --- a/xen/include/asm-arm/grant_table.h +++ b/xen/include/asm-arm/grant_table.h @@ -73,6 +73,11 @@ static inline unsigned int gnttab_dom0_m (gfn); \ } while ( 0 ) +#define gnttab_get_frame_gfn(gt, st, idx) ({ \ + _gfn((st) ? gnttab_status_gmfn(NULL, gt, idx) \ + : gnttab_shared_gmfn(NULL, gt, idx)); \ +}) + #define gnttab_create_shared_page(d, t, i) \ do { \ share_xen_page_with_guest( \ --- a/xen/include/asm-x86/grant_table.h +++ b/xen/include/asm-x86/grant_table.h @@ -47,6 +47,12 @@ static inline unsigned int gnttab_dom0_m #define gnttab_init_arch(gt) 0 #define gnttab_destroy_arch(gt) do {} while ( 0 ) #define gnttab_set_frame_gfn(gt, st, idx, gfn) do {} while ( 0 ) +#define gnttab_get_frame_gfn(gt, st, idx) ({ \ + unsigned long mfn_ = (st) ? gnttab_status_mfn(gt, idx) \ + : gnttab_shared_mfn(gt, idx); \ + unsigned long gpfn_ = get_gpfn_from_mfn(mfn_); \ + VALID_M2P(gpfn_) ? _gfn(gpfn_) : INVALID_GFN; \ +}) #define gnttab_create_shared_page(d, t, i) \ do { \ @@ -63,11 +69,11 @@ static inline unsigned int gnttab_dom0_m } while ( 0 ) -#define gnttab_shared_mfn(d, t, i) \ +#define gnttab_shared_mfn(t, i) \ ((virt_to_maddr((t)->shared_raw[i]) >> PAGE_SHIFT)) #define gnttab_shared_gmfn(d, t, i) \ - (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i))) + (mfn_to_gmfn(d, gnttab_shared_mfn(t, i))) #define gnttab_status_mfn(t, i) \ ++++++ 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch ++++++ # Commit 0aa6158b674c5d083b75ac8fcd1e7ae92d0c39ae # Date 2018-02-27 14:08:36 +0100 # Author Andrew Cooper <[email protected]> # Committer Jan Beulich <[email protected]> x86/hvm: Disallow the creation of HVM domains without Local APIC emulation There are multiple problems, not necessarily limited to: * Guests which configure event channels via hvmop_set_evtchn_upcall_vector(), or which hit %cr8 emulation will cause Xen to fall over a NULL vlapic->regs pointer. * On Intel hardware, disabling the TPR_SHADOW execution control without reenabling CR8_{LOAD,STORE} interception means that the guest's %cr8 accesses interact with the real TPR. Amongst other things, setting the real TPR to 0xf blocks even IPIs from interrupting this CPU. * On hardware which sets up the use of Interrupt Posting, including IOMMU-Posting, guests run without the appropriate non-root configuration, which at a minimum will result in dropped interrupts. Whether no-LAPIC mode is of any use at all remains to be seen. This is XSA-256.
Reported-by: Ian Jackson <[email protected]> Signed-off-by: Andrew Cooper <[email protected]> Reviewed-by: Roger Pau Monné <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -413,7 +413,7 @@ static bool emulation_flags_ok(const str if ( is_hardware_domain(d) && emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) ) return false; - if ( !is_hardware_domain(d) && emflags && + if ( !is_hardware_domain(d) && emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC ) return false; } ++++++ 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch ++++++ References: bsc#1072834 # Commit 1f1d183d49008794b087cf043fc77f724a45af98 # Date 2018-02-27 15:12:23 +0100 # Author Jan Beulich <[email protected]> # Committer Jan Beulich <[email protected]> x86/HVM: don't give the wrong impression of WRMSR succeeding ... for non-existent MSRs: wrmsr_hypervisor_regs()'s comment clearly says that the function returns 0 for unrecognized MSRs, so {svm,vmx}_msr_write_intercept() should not convert this into success. We don't want to unconditionally fail the access though, as we can't be certain the list of handled MSRs is complete enough for the guest types we care about, so instead mirror what we do on the read paths and probe the MSR to decide whether to raise #GP. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2106,6 +2106,13 @@ static int svm_msr_write_intercept(unsig result = X86EMUL_RETRY; break; case 0: + /* + * Match up with the RDMSR side for now; ultimately this entire + * case block should go away. + */ + if ( rdmsr_safe(msr, msr_content) == 0 ) + break; + goto gpf; case 1: break; default: --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -3182,6 +3182,13 @@ static int vmx_msr_write_intercept(unsig case -ERESTART: return X86EMUL_RETRY; case 0: + /* + * Match up with the RDMSR side for now; ultimately this + * entire case block should go away. + */ + if ( rdmsr_safe(msr, msr_content) == 0 ) + break; + goto gp_fault; case 1: break; default:
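To make the intended behaviour of the two hunks above easier to follow, here is the same logic collapsed out of diff form into a small function (a sketch only: the real code lives inline in {svm,vmx}_msr_write_intercept(), uses the gpf/gp_fault labels, and also has an X86EMUL_RETRY case, all of which are elided here):

    /* Decide the fate of a WRMSR that no specific handler claimed. */
    static int default_wrmsr_fate(unsigned int msr, uint64_t msr_content)
    {
        switch ( wrmsr_hypervisor_regs(msr, msr_content) )
        {
        case 1:                       /* Write handled successfully. */
            return X86EMUL_OKAY;
        case 0:
            /*
             * Unrecognised MSR. Mirror the RDMSR side: probe the MSR and
             * only fail the write if it doesn't exist at all.
             */
            if ( rdmsr_safe(msr, msr_content) == 0 )
                return X86EMUL_OKAY;  /* MSR exists: swallow the write. */
            /* fall through */
        default:
            return X86EMUL_EXCEPTION; /* Raise #GP in the guest. */
        }
    }

This is the behaviour change behind the bsc#1072834 entry at the top of this mail: a WRMSR to an unrecognised but existing MSR is still silently ignored, while a WRMSR to a non-existent MSR now raises #GP instead of appearing to succeed.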
