Hello community,

Here is the log from the commit of package xen for openSUSE:Factory checked in 
at 2018-03-30 12:00:34
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/xen (Old)
 and      /work/SRC/openSUSE:Factory/.xen.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "xen"

Fri Mar 30 12:00:34 2018 rev:245 rq:591751 version:4.10.0_16

Changes:
--------
--- /work/SRC/openSUSE:Factory/xen/xen.changes  2018-03-20 21:50:48.542316318 +0100
+++ /work/SRC/openSUSE:Factory/.xen.new/xen.changes     2018-03-30 12:00:43.480265750 +0200
@@ -1,0 +2,12 @@
+Mon Mar 26 08:20:45 MDT 2018 - carn...@suse.com
+
+- Upstream patches from Jan (bsc#1027519) and fixes related to
+  Page Table Isolation (XPTI). See also bsc#1074562 XSA-254
+  5a856a2b-x86-xpti-hide-almost-all-of-Xen-image-mappings.patch
+  5a9eb7f1-x86-xpti-dont-map-stack-guard-pages.patch
+  5a9eb85c-x86-slightly-reduce-XPTI-overhead.patch
+  5a9eb890-x86-remove-CR-reads-from-exit-to-guest-path.patch
+  5aa2b6b9-cpufreq-ondemand-CPU-offlining-race.patch
+  5aaa9878-x86-vlapic-clear-TMR-bit-for-edge-triggered-intr.patch
+
+-------------------------------------------------------------------

New:
----
  5a856a2b-x86-xpti-hide-almost-all-of-Xen-image-mappings.patch
  5a9eb7f1-x86-xpti-dont-map-stack-guard-pages.patch
  5a9eb85c-x86-slightly-reduce-XPTI-overhead.patch
  5a9eb890-x86-remove-CR-reads-from-exit-to-guest-path.patch
  5aa2b6b9-cpufreq-ondemand-CPU-offlining-race.patch
  5aaa9878-x86-vlapic-clear-TMR-bit-for-edge-triggered-intr.patch

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ xen.spec ++++++
--- /var/tmp/diff_new_pack.XjVnAv/_old  2018-03-30 12:00:46.148169274 +0200
+++ /var/tmp/diff_new_pack.XjVnAv/_new  2018-03-30 12:00:46.152169129 +0200
@@ -126,7 +126,7 @@
 BuildRequires:  pesign-obs-integration
 %endif
 
-Version:        4.10.0_14
+Version:        4.10.0_16
 Release:        0
 Summary:        Xen Virtualization: Hypervisor (aka VMM aka Microkernel)
 License:        GPL-2.0
@@ -211,13 +211,19 @@
 Patch48:        5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch
 Patch49:        5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch
 Patch50:        5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch
-Patch51:        5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch
-Patch52:        5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch
-Patch53:        5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch
-Patch54:        5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch
-Patch55:        5a955800-gnttab-dont-free-status-pages-on-ver-change.patch
-Patch56:        5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch
-Patch57:        5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch
+Patch51:        5a856a2b-x86-xpti-hide-almost-all-of-Xen-image-mappings.patch
+Patch52:        5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch
+Patch53:        5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch
+Patch54:        5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch
+Patch55:        5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch
+Patch56:        5a955800-gnttab-dont-free-status-pages-on-ver-change.patch
+Patch57:        5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch
+Patch58:        5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch
+Patch59:        5a9eb7f1-x86-xpti-dont-map-stack-guard-pages.patch
+Patch60:        5a9eb85c-x86-slightly-reduce-XPTI-overhead.patch
+Patch61:        5a9eb890-x86-remove-CR-reads-from-exit-to-guest-path.patch
+Patch62:        5aa2b6b9-cpufreq-ondemand-CPU-offlining-race.patch
+Patch63:        5aaa9878-x86-vlapic-clear-TMR-bit-for-edge-triggered-intr.patch
 # Our platform specific patches
 Patch400:       xen-destdir.patch
 Patch401:       vif-bridge-no-iptables.patch
@@ -465,6 +471,12 @@
 %patch55 -p1
 %patch56 -p1
 %patch57 -p1
+%patch58 -p1
+%patch59 -p1
+%patch60 -p1
+%patch61 -p1
+%patch62 -p1
+%patch63 -p1
 # Our platform specific patches
 %patch400 -p1
 %patch401 -p1

++++++ 5a856a2b-x86-xpti-hide-almost-all-of-Xen-image-mappings.patch ++++++
# Commit 422588e88511d17984544c0f017a927de3315290
# Date 2018-02-15 11:08:27 +0000
# Author Andrew Cooper <andrew.coop...@citrix.com>
# Committer Andrew Cooper <andrew.coop...@citrix.com>
x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings

The current XPTI implementation isolates the directmap (and therefore a lot of
guest data), but a large quantity of CPU0's state (including its stack)
remains visible.

Furthermore, an attacker able to read .text is in a vastly superior position
to normal when it comes to fingerprinting Xen for known vulnerabilities, or
scanning for ROP/Spectre gadgets.

Collect together the entrypoints in .text.entry (currently 3x4k frames, but
can almost certainly be slimmed down), and create a common mapping which is
inserted into each per-cpu shadow.  The stubs are also inserted into this
mapping by pointing at the in-use L2.  This allows stubs allocated later (SMP
boot, or CPU hotplug) to work without further changes to the common mappings.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>

# Commit d1d6fc97d66cf56847fc0bcc2ddc370707c22378
# Date 2018-03-06 16:46:27 +0100
# Author Jan Beulich <jbeul...@suse.com>
# Committer Jan Beulich <jbeul...@suse.com>
x86/xpti: really hide almost all of Xen image

Commit 422588e885 ("x86/xpti: Hide almost all of .text and all
.data/.rodata/.bss mappings") carefully limited the Xen image cloning to
just entry code, but then overwrote the just allocated and populated L3
entry with the normal one again covering both Xen image and stubs.

Drop the respective code in favor of an explicit clone_mapping()
invocation. This in turn now requires setup_cpu_root_pgt() to run after
stub setup in all cases. Additionally, with (almost) no unintended
mappings left, the BSP's IDT now also needs to be page aligned.

The moving ahead of cleanup_cpu_root_pgt() is not strictly necessary
for functionality, but things are more logical this way, and we retain
cleanup being done in the inverse order of setup.

Signed-off-by: Jan Beulich <jbeul...@suse.com>
Acked-by: Andrew Cooper <andrew.coop...@citrix.com>

# Commit 044fedfaa29b5d5774196e3fc7d955a48bfceac4
# Date 2018-03-09 15:42:24 +0000
# Author Andrew Cooper <andrew.coop...@citrix.com>
# Committer Andrew Cooper <andrew.coop...@citrix.com>
x86/traps: Put idt_table[] back into .bss

c/s d1d6fc97d "x86/xpti: really hide almost all of Xen image" accidentally
moved idt_table[] from .bss to .data by virtue of using the page_aligned
section.  We also have .bss.page_aligned, so use that.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
Reviewed-by: Wei Liu <wei.l...@citrix.com>
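
To illustrate the shared-mapping idea from the first commit message above, here
is a small, self-contained toy model (not Xen code): every CPU gets its own
shadow L4, but the slot covering the Xen image is built once, from the entry
text only, and then reused by all CPUs.  All names, slot numbers and values
below are illustrative assumptions.

/* Toy model of the lazily-built "common_pgt" shared across per-CPU shadows. */
#include <stdint.h>
#include <stdio.h>

#define L4_ENTRIES 512
#define XEN_SLOT   261            /* assumed L4 slot covering the Xen image */
#define NR_CPUS    4

typedef uint64_t l4e_t;

static l4e_t shadow_l4[NR_CPUS][L4_ENTRIES];
static l4e_t common_pgt;          /* 0 until the entry-text clone is built  */

static l4e_t clone_entry_text(void)
{
    /* Stand-in for cloning .text.entry page by page into a fresh L3. */
    return 0xabcd000ULL | 1;      /* fake "present" reference to that L3    */
}

static void setup_cpu_root_pgt(unsigned int cpu)
{
    if ( !common_pgt )            /* one-time setup, reused afterwards      */
        common_pgt = clone_entry_text();
    shadow_l4[cpu][XEN_SLOT] = common_pgt;
}

int main(void)
{
    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
        setup_cpu_root_pgt(cpu);
    printf("all CPUs share L4 slot %d -> %#llx\n",
           XEN_SLOT, (unsigned long long)shadow_l4[0][XEN_SLOT]);
    return 0;
}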

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1897,9 +1897,6 @@ mode.
 Override default selection of whether to isolate 64-bit PV guest page
 tables.
 
-** WARNING: Not yet a complete isolation implementation, but better than
-nothing. **
-
 ### xsave
 > `= <boolean>`
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -644,13 +644,24 @@ static int clone_mapping(const void *ptr
 {
     unsigned long linear = (unsigned long)ptr, pfn;
     unsigned int flags;
-    l3_pgentry_t *pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
-                         l3_table_offset(linear);
+    l3_pgentry_t *pl3e;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
 
-    if ( linear < DIRECTMAP_VIRT_START )
-        return 0;
+    /*
+     * Sanity check 'linear'.  We only allow cloning from the Xen virtual
+     * range, and in particular, only from the directmap and .text ranges.
+     */
+    if ( root_table_offset(linear) > ROOT_PAGETABLE_LAST_XEN_SLOT ||
+         root_table_offset(linear) < ROOT_PAGETABLE_FIRST_XEN_SLOT )
+        return -EINVAL;
+
+    if ( linear < XEN_VIRT_START ||
+         (linear >= XEN_VIRT_END && linear < DIRECTMAP_VIRT_START) )
+        return -EINVAL;
+
+    pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
+        l3_table_offset(linear);
 
     flags = l3e_get_flags(*pl3e);
     ASSERT(flags & _PAGE_PRESENT);
@@ -742,6 +753,10 @@ static __read_mostly int8_t opt_xpti = -
 boolean_param("xpti", opt_xpti);
 DEFINE_PER_CPU(root_pgentry_t *, root_pgt);
 
+static root_pgentry_t common_pgt;
+
+extern const char _stextentry[], _etextentry[];
+
 static int setup_cpu_root_pgt(unsigned int cpu)
 {
     root_pgentry_t *rpt;
@@ -762,8 +777,23 @@ static int setup_cpu_root_pgt(unsigned i
         idle_pg_table[root_table_offset(RO_MPT_VIRT_START)];
     /* SH_LINEAR_PT inserted together with guest mappings. */
     /* PERDOMAIN inserted during context switch. */
-    rpt[root_table_offset(XEN_VIRT_START)] =
-        idle_pg_table[root_table_offset(XEN_VIRT_START)];
+
+    /* One-time setup of common_pgt, which maps .text.entry and the stubs. */
+    if ( unlikely(!root_get_intpte(common_pgt)) )
+    {
+        const char *ptr;
+
+        for ( rc = 0, ptr = _stextentry;
+              !rc && ptr < _etextentry; ptr += PAGE_SIZE )
+            rc = clone_mapping(ptr, rpt);
+
+        if ( rc )
+            return rc;
+
+        common_pgt = rpt[root_table_offset(XEN_VIRT_START)];
+    }
+
+    rpt[root_table_offset(XEN_VIRT_START)] = common_pgt;
 
     /* Install direct map page table entries for stack, IDT, and TSS. */
     for ( off = rc = 0; !rc && off < STACK_SIZE; off += PAGE_SIZE )
@@ -773,6 +803,8 @@ static int setup_cpu_root_pgt(unsigned i
         rc = clone_mapping(idt_tables[cpu], rpt);
     if ( !rc )
         rc = clone_mapping(&per_cpu(init_tss, cpu), rpt);
+    if ( !rc )
+        rc = clone_mapping((void *)per_cpu(stubs.addr, cpu), rpt);
 
     return rc;
 }
@@ -781,6 +813,7 @@ static void cleanup_cpu_root_pgt(unsigne
 {
     root_pgentry_t *rpt = per_cpu(root_pgt, cpu);
     unsigned int r;
+    unsigned long stub_linear = per_cpu(stubs.addr, cpu);
 
     if ( !rpt )
         return;
@@ -825,6 +858,16 @@ static void cleanup_cpu_root_pgt(unsigne
     }
 
     free_xen_pagetable(rpt);
+
+    /* Also zap the stub mapping for this CPU. */
+    if ( stub_linear )
+    {
+        l3_pgentry_t *l3t = l4e_to_l3e(common_pgt);
+        l2_pgentry_t *l2t = l3e_to_l2e(l3t[l3_table_offset(stub_linear)]);
+        l1_pgentry_t *l1t = l2e_to_l1e(l2t[l2_table_offset(stub_linear)]);
+
+        l1t[l2_table_offset(stub_linear)] = l1e_empty();
+    }
 }
 
 static void cpu_smpboot_free(unsigned int cpu)
@@ -848,6 +891,8 @@ static void cpu_smpboot_free(unsigned in
     if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
         free_cpumask_var(per_cpu(scratch_cpumask, cpu));
 
+    cleanup_cpu_root_pgt(cpu);
+
     if ( per_cpu(stubs.addr, cpu) )
     {
         mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
@@ -865,8 +910,6 @@ static void cpu_smpboot_free(unsigned in
             free_domheap_page(mfn_to_page(mfn));
     }
 
-    cleanup_cpu_root_pgt(cpu);
-
     order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
     free_xenheap_pages(per_cpu(gdt_table, cpu), order);
 
@@ -922,9 +965,6 @@ static int cpu_smpboot_alloc(unsigned in
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
 
-    if ( setup_cpu_root_pgt(cpu) )
-        goto oom;
-
     for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
           i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
         if ( cpu_online(i) && cpu_to_node(i) == node )
@@ -938,6 +978,9 @@ static int cpu_smpboot_alloc(unsigned in
         goto oom;
     per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);
 
+    if ( setup_cpu_root_pgt(cpu) )
+        goto oom;
+
     if ( secondary_socket_cpumask == NULL &&
          (secondary_socket_cpumask = xzalloc(cpumask_t)) == NULL )
         goto oom;
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -102,7 +102,8 @@ DEFINE_PER_CPU_READ_MOSTLY(struct desc_s
 DEFINE_PER_CPU_READ_MOSTLY(struct desc_struct *, compat_gdt_table);
 
 /* Master table, used by CPU0. */
-idt_entry_t idt_table[IDT_ENTRIES];
+idt_entry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
+    idt_table[IDT_ENTRIES];
 
 /* Pointer to the IDT of every CPU. */
 idt_entry_t *idt_tables[NR_CPUS] __read_mostly;
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -13,6 +13,8 @@
 #include <public/xen.h>
 #include <irq_vectors.h>
 
+        .section .text.entry, "ax", @progbits
+
 ENTRY(entry_int82)
         ASM_CLAC
         pushq $0
@@ -270,6 +272,9 @@ ENTRY(compat_int80_direct_trap)
         call  compat_create_bounce_frame
         jmp   compat_test_all_events
 
+        /* compat_create_bounce_frame & helpers don't need to be in .text.entry */
+        .text
+
 /* CREATE A BASIC EXCEPTION FRAME ON GUEST OS (RING-1) STACK:            */
 /*   {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]}                             */
 /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -14,6 +14,8 @@
 #include <public/xen.h>
 #include <irq_vectors.h>
 
+        .section .text.entry, "ax", @progbits
+
 /* %rbx: struct vcpu */
 ENTRY(switch_to_kernel)
         leaq  VCPU_trap_bounce(%rbx),%rdx
@@ -357,6 +359,9 @@ int80_slow_path:
         subq  $2,UREGS_rip(%rsp)
         jmp   handle_exception_saved
 
+        /* create_bounce_frame & helpers don't need to be in .text.entry */
+        .text
+
 /* CREATE A BASIC EXCEPTION FRAME ON GUEST OS STACK:                     */
 /*   { RCX, R11, [ERRCODE,] RIP, CS, RFLAGS, RSP, SS }                   */
 /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
@@ -487,6 +492,8 @@ ENTRY(dom_crash_sync_extable)
         jmp   asm_domain_crash_synchronous /* Does not return */
         .popsection
 
+        .section .text.entry, "ax", @progbits
+
 ENTRY(common_interrupt)
         SAVE_ALL CLAC
 
@@ -846,8 +853,7 @@ GLOBAL(trap_nop)
 
 
 
-.section .rodata, "a", @progbits
-
+        .pushsection .rodata, "a", @progbits
 ENTRY(exception_table)
         .quad do_trap
         .quad do_debug
@@ -873,9 +879,10 @@ ENTRY(exception_table)
         .quad do_reserved_trap /* Architecturally reserved exceptions. */
         .endr
         .size exception_table, . - exception_table
+        .popsection
 
 /* Table of automatically generated entry points.  One per vector. */
-        .section .init.rodata, "a", @progbits
+        .pushsection .init.rodata, "a", @progbits
 GLOBAL(autogen_entrypoints)
         /* pop into the .init.rodata section and record an entry point. */
         .macro entrypoint ent
@@ -884,7 +891,7 @@ GLOBAL(autogen_entrypoints)
         .popsection
         .endm
 
-        .text
+        .popsection
 autogen_stubs: /* Automatically generated stubs. */
 
         vec = 0
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -60,6 +60,13 @@ SECTIONS
         _stext = .;            /* Text and read-only data */
        *(.text)
        *(.text.__x86_indirect_thunk_*)
+
+       . = ALIGN(PAGE_SIZE);
+       _stextentry = .;
+       *(.text.entry)
+       . = ALIGN(PAGE_SIZE);
+       _etextentry = .;
+
        *(.text.cold)
        *(.text.unlikely)
        *(.fixup)
++++++ 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch ++++++
--- /var/tmp/diff_new_pack.XjVnAv/_old  2018-03-30 12:00:46.384160741 +0200
+++ /var/tmp/diff_new_pack.XjVnAv/_new  2018-03-30 12:00:46.384160741 +0200
@@ -28,10 +28,8 @@
 Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
 Reviewed-by: Jan Beulich <jbeul...@suse.com>
 
-Index: xen-4.10.0-testing/xen/arch/x86/apic.c
-===================================================================
---- xen-4.10.0-testing.orig/xen/arch/x86/apic.c
-+++ xen-4.10.0-testing/xen/arch/x86/apic.c
+--- a/xen/arch/x86/apic.c
++++ b/xen/arch/x86/apic.c
 @@ -682,7 +682,7 @@ void setup_local_APIC(void)
          printk("Leaving ESR disabled.\n");
      }
@@ -41,11 +39,9 @@
          setup_apic_nmi_watchdog();
      apic_pm_activate();
  }
-Index: xen-4.10.0-testing/xen/arch/x86/smpboot.c
-===================================================================
---- xen-4.10.0-testing.orig/xen/arch/x86/smpboot.c
-+++ xen-4.10.0-testing/xen/arch/x86/smpboot.c
-@@ -1241,7 +1241,10 @@ int __cpu_up(unsigned int cpu)
+--- a/xen/arch/x86/smpboot.c
++++ b/xen/arch/x86/smpboot.c
+@@ -1284,7 +1284,10 @@ int __cpu_up(unsigned int cpu)
  void __init smp_cpus_done(void)
  {
      if ( nmi_watchdog == NMI_LOCAL_APIC )
@@ -56,11 +52,9 @@
  
      setup_ioapic_dest();
  
-Index: xen-4.10.0-testing/xen/arch/x86/traps.c
-===================================================================
---- xen-4.10.0-testing.orig/xen/arch/x86/traps.c
-+++ xen-4.10.0-testing/xen/arch/x86/traps.c
-@@ -1669,7 +1669,7 @@ static nmi_callback_t *nmi_callback = du
+--- a/xen/arch/x86/traps.c
++++ b/xen/arch/x86/traps.c
+@@ -1670,7 +1670,7 @@ static nmi_callback_t *nmi_callback = du
  void do_nmi(const struct cpu_user_regs *regs)
  {
      unsigned int cpu = smp_processor_id();
@@ -69,7 +63,7 @@
      bool handle_unknown = false;
  
      ++nmi_count(cpu);
-@@ -1677,6 +1677,16 @@ void do_nmi(const struct cpu_user_regs *
+@@ -1678,6 +1678,16 @@ void do_nmi(const struct cpu_user_regs *
      if ( nmi_callback(regs, cpu) )
          return;
  
@@ -86,7 +80,7 @@
      if ( (nmi_watchdog == NMI_NONE) ||
           (!nmi_watchdog_tick(regs) && watchdog_force) )
          handle_unknown = true;
-@@ -1684,7 +1694,6 @@ void do_nmi(const struct cpu_user_regs *
+@@ -1685,7 +1695,6 @@ void do_nmi(const struct cpu_user_regs *
      /* Only the BSP gets external NMIs from the system. */
      if ( cpu == 0 )
      {

++++++ 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch ++++++
--- /var/tmp/diff_new_pack.XjVnAv/_old  2018-03-30 12:00:46.420159439 +0200
+++ /var/tmp/diff_new_pack.XjVnAv/_new  2018-03-30 12:00:46.424159294 +0200
@@ -19,6 +19,20 @@
 Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>
 Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
 
+# Commit 59c0983e10d70ea2368085271b75fb007811fe52
+# Date 2018-03-15 12:44:24 +0100
+# Author Jan Beulich <jbeul...@suse.com>
+# Committer Jan Beulich <jbeul...@suse.com>
+x86: ignore guest microcode loading attempts
+
+The respective MSRs are write-only, and hence attempts by guests to
+write to these are - as of 1f1d183d49 ("x86/HVM: don't give the wrong
+impression of WRMSR succeeding") no longer ignored. Restore original
+behavior for the two affected MSRs.
+
+Signed-off-by: Jan Beulich <jbeul...@suse.com>
+Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>
+
 --- a/xen/arch/x86/hvm/svm/svm.c
 +++ b/xen/arch/x86/hvm/svm/svm.c
 @@ -2106,6 +2106,13 @@ static int svm_msr_write_intercept(unsig
@@ -51,3 +65,43 @@
                      case 1:
                          break;
                      default:
+--- a/xen/arch/x86/msr.c
++++ b/xen/arch/x86/msr.c
+@@ -128,6 +128,8 @@ int guest_rdmsr(const struct vcpu *v, ui
+ 
+     switch ( msr )
+     {
++    case MSR_AMD_PATCHLOADER:
++    case MSR_IA32_UCODE_WRITE:
+     case MSR_PRED_CMD:
+         /* Write-only */
+         goto gp_fault;
+@@ -181,6 +183,28 @@ int guest_wrmsr(struct vcpu *v, uint32_t
+         /* Read-only */
+         goto gp_fault;
+ 
++    case MSR_AMD_PATCHLOADER:
++        /*
++         * See note on MSR_IA32_UCODE_WRITE below, which may or may not apply
++         * to AMD CPUs as well (at least the architectural/CPUID part does).
++         */
++        if ( is_pv_domain(d) ||
++             d->arch.cpuid->x86_vendor != X86_VENDOR_AMD )
++            goto gp_fault;
++        break;
++
++    case MSR_IA32_UCODE_WRITE:
++        /*
++         * Some versions of Windows at least on certain hardware try to load
++         * microcode before setting up an IDT. Therefore we must not inject #GP
++         * for such attempts. Also the MSR is architectural and not qualified
++         * by any CPUID bit.
++         */
++        if ( is_pv_domain(d) ||
++             d->arch.cpuid->x86_vendor != X86_VENDOR_INTEL )
++            goto gp_fault;
++        break;
++
+     case MSR_SPEC_CTRL:
+         if ( !cp->feat.ibrsb )
+             goto gp_fault; /* MSR available? */

++++++ 5a9eb7f1-x86-xpti-dont-map-stack-guard-pages.patch ++++++
# Commit d303784b68237ff3050daa184f560179dda21b8c
# Date 2018-03-06 16:46:57 +0100
# Author Jan Beulich <jbeul...@suse.com>
# Committer Jan Beulich <jbeul...@suse.com>
x86/xpti: don't map stack guard pages

Other than for the main mappings, don't even do this in release builds,
as there are no huge page shattering concerns here.

Note that since we don't run on the restricted page tables while HVM
guests execute, the non-present mappings won't trigger the triple fault
issue AMD SVM is susceptible to with our current placement of STGI vs
TR loading.

Signed-off-by: Jan Beulich <jbeul...@suse.com>
Acked-by: Andrew Cooper <andrew.coop...@citrix.com>
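
The predicate added below only depends on the offset within the per-CPU stack.
Here is a standalone worked example of that arithmetic; the stack and page
sizes are illustrative assumptions, not necessarily Xen's real configuration.

/* Worked example of the guard-page predicate with assumed sizes. */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE          4096UL
#define STACK_SIZE         (8 * PAGE_SIZE)   /* assumed per-CPU stack size  */
#define PRIMARY_STACK_SIZE (4 * PAGE_SIZE)   /* assumed primary stack size  */

static bool is_stack_guard_page(unsigned long addr)
{
    addr &= STACK_SIZE - 1;                  /* offset within the stack     */
    return addr >= STACK_SIZE - PRIMARY_STACK_SIZE - PAGE_SIZE &&
           addr <  STACK_SIZE - PRIMARY_STACK_SIZE;
}

int main(void)
{
    /* With the assumed sizes, only the page at offset 0x3000 is the guard. */
    for ( unsigned long off = 0; off < STACK_SIZE; off += PAGE_SIZE )
        printf("page at offset %#lx: %s\n", off,
               is_stack_guard_page(off) ? "guard (left unmapped)" : "cloned");
    return 0;
}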

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5538,6 +5538,14 @@ void memguard_unguard_stack(void *p)
     memguard_unguard_range(p, PAGE_SIZE);
 }
 
+bool memguard_is_stack_guard_page(unsigned long addr)
+{
+    addr &= STACK_SIZE - 1;
+
+    return addr >= STACK_SIZE - PRIMARY_STACK_SIZE - PAGE_SIZE &&
+           addr < STACK_SIZE - PRIMARY_STACK_SIZE;
+}
+
 void arch_dump_shared_mem_info(void)
 {
     printk("Shared frames %u -- Saved frames %u\n",
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -797,7 +797,8 @@ static int setup_cpu_root_pgt(unsigned i
 
     /* Install direct map page table entries for stack, IDT, and TSS. */
     for ( off = rc = 0; !rc && off < STACK_SIZE; off += PAGE_SIZE )
-        rc = clone_mapping(__va(__pa(stack_base[cpu])) + off, rpt);
+        if ( !memguard_is_stack_guard_page(off) )
+            rc = clone_mapping(__va(__pa(stack_base[cpu])) + off, rpt);
 
     if ( !rc )
         rc = clone_mapping(idt_tables[cpu], rpt);
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -519,6 +519,7 @@ void memguard_unguard_range(void *p, uns
 
 void memguard_guard_stack(void *p);
 void memguard_unguard_stack(void *p);
+bool __attribute_const__ memguard_is_stack_guard_page(unsigned long addr);
 
 struct mmio_ro_emulate_ctxt {
         unsigned long cr2;
++++++ 5a9eb85c-x86-slightly-reduce-XPTI-overhead.patch ++++++
# Commit 9d1d31ad9498e6ceb285d5774e34fed5f648c273
# Date 2018-03-06 16:48:44 +0100
# Author Jan Beulich <jbeul...@suse.com>
# Committer Jan Beulich <jbeul...@suse.com>
x86: slightly reduce Meltdown band-aid overhead

I'm not sure why I didn't do this right away: By avoiding the use of
global PTEs in the cloned directmap, there's no need to fiddle with
CR4.PGE on any of the entry paths. Only the exit paths need to flush
global mappings.

The reduced flushing, however, requires that we now have interrupts off
on all entry paths until after the page table switch, so that flush IPIs
can't be serviced while on the restricted pagetables, leaving a window
where a potentially stale guest global mapping can be brought into the
TLB. Along those lines the "sync" IPI after L4 entry updates now needs
to become a real (and global) flush IPI, so that inside Xen we'll also
pick up such changes.

Signed-off-by: Jan Beulich <jbeul...@suse.com>
Tested-by: Juergen Gross <jgr...@suse.com>
Reviewed-by: Juergen Gross <jgr...@suse.com>
Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>

# Commit c4dd58f0cf23cdf119bbccedfb8c24435fc6f3ab
# Date 2018-03-16 17:27:36 +0100
# Author Jan Beulich <jbeul...@suse.com>
# Committer Jan Beulich <jbeul...@suse.com>
x86: correct EFLAGS.IF in SYSENTER frame

Commit 9d1d31ad94 ("x86: slightly reduce Meltdown band-aid overhead")
moved the STI past the PUSHF. While this isn't an active problem (as we
force EFLAGS.IF to 1 before exiting to guest context), let's not risk
internal confusion by finding a PV guest frame with interrupts
apparently off.

Signed-off-by: Jan Beulich <jbeul...@suse.com>
Acked-by: Andrew Cooper <andrew.coop...@citrix.com>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3782,18 +3782,14 @@ long do_mmu_update(
     {
         /*
          * Force other vCPU-s of the affected guest to pick up L4 entry
-         * changes (if any). Issue a flush IPI with empty operation mask to
-         * facilitate this (including ourselves waiting for the IPI to
-         * actually have arrived). Utilize the fact that FLUSH_VA_VALID is
-         * meaningless without FLUSH_CACHE, but will allow to pass the no-op
-         * check in flush_area_mask().
+         * changes (if any).
          */
         unsigned int cpu = smp_processor_id();
         cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
 
         cpumask_andnot(mask, pt_owner->domain_dirty_cpumask, cpumask_of(cpu));
         if ( !cpumask_empty(mask) )
-            flush_area_mask(mask, ZERO_BLOCK_PTR, FLUSH_VA_VALID);
+            flush_mask(mask, FLUSH_TLB_GLOBAL);
     }
 
     perfc_add(num_page_updates, i);
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -737,6 +737,7 @@ static int clone_mapping(const void *ptr
     }
 
     pl1e += l1_table_offset(linear);
+    flags &= ~_PAGE_GLOBAL;
 
     if ( l1e_get_flags(*pl1e) & _PAGE_PRESENT )
     {
@@ -1046,8 +1047,17 @@ void __init smp_prepare_cpus(unsigned in
     if ( rc )
         panic("Error %d setting up PV root page table\n", rc);
     if ( per_cpu(root_pgt, 0) )
+    {
         get_cpu_info()->pv_cr3 = __pa(per_cpu(root_pgt, 0));
 
+        /*
+         * All entry points which may need to switch page tables have to start
+         * with interrupts off. Re-write what pv_trap_init() has put there.
+         */
+        _set_gate(idt_table + LEGACY_SYSCALL_VECTOR, SYS_DESC_irq_gate, 3,
+                  &int80_direct_trap);
+    }
+
     set_nr_sockets();
 
     socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets);
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -202,7 +202,7 @@ ENTRY(compat_post_handle_exception)
 
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
-        sti
+        /* sti could live here when we don't switch page tables below. */
         CR4_PV32_RESTORE
         movq  8(%rsp),%rax /* Restore %rax. */
         movq  $FLAT_KERNEL_SS,8(%rsp)
@@ -222,11 +222,12 @@ ENTRY(cstar_enter)
         jz    .Lcstar_cr3_okay
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
         neg   %rcx
-        write_cr3 rcx, rdi, rsi
+        mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Lcstar_cr3_okay:
+        sti
 
-        GET_CURRENT(bx)
+        __GET_CURRENT(bx)
         movq  VCPU_domain(%rbx),%rcx
         cmpb  $0,DOMAIN_is_32bit_pv(%rcx)
         je    switch_to_kernel
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -150,7 +150,7 @@ UNLIKELY_END(exit_cr3)
  * %ss must be saved into the space left by the trampoline.
  */
 ENTRY(lstar_enter)
-        sti
+        /* sti could live here when we don't switch page tables below. */
         movq  8(%rsp),%rax /* Restore %rax. */
         movq  $FLAT_KERNEL_SS,8(%rsp)
         pushq %r11
@@ -169,9 +169,10 @@ ENTRY(lstar_enter)
         jz    .Llstar_cr3_okay
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
         neg   %rcx
-        write_cr3 rcx, rdi, rsi
+        mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Llstar_cr3_okay:
+        sti
 
         __GET_CURRENT(bx)
         testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
@@ -254,7 +255,7 @@ process_trap:
         jmp  test_all_events
 
 ENTRY(sysenter_entry)
-        sti
+        /* sti could live here when we don't switch page tables below. */
         pushq $FLAT_USER_SS
         pushq $0
         pushfq
@@ -270,14 +271,17 @@ GLOBAL(sysenter_eflags_saved)
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
+        /* PUSHF above has saved EFLAGS.IF clear (the caller had it set). */
+        orl   $X86_EFLAGS_IF, UREGS_eflags(%rsp)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
         neg   %rcx
         jz    .Lsyse_cr3_okay
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
         neg   %rcx
-        write_cr3 rcx, rdi, rsi
+        mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Lsyse_cr3_okay:
+        sti
 
         __GET_CURRENT(bx)
         cmpb  $0,VCPU_sysenter_disables_events(%rbx)
@@ -324,9 +328,10 @@ ENTRY(int80_direct_trap)
         jz    .Lint80_cr3_okay
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
         neg   %rcx
-        write_cr3 rcx, rdi, rsi
+        mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Lint80_cr3_okay:
+        sti
 
         cmpb  $0,untrusted_msi(%rip)
 UNLIKELY_START(ne, msi_check)
@@ -510,7 +515,7 @@ ENTRY(common_interrupt)
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         neg   %rcx
 .Lintr_cr3_load:
-        write_cr3 rcx, rdi, rsi
+        mov   %rcx, %cr3
         xor   %ecx, %ecx
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3, UREGS_cs(%rsp)
@@ -552,7 +557,7 @@ GLOBAL(handle_exception)
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         neg   %rcx
 .Lxcpt_cr3_load:
-        write_cr3 rcx, rdi, rsi
+        mov   %rcx, %cr3
         xor   %ecx, %ecx
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3, UREGS_cs(%rsp)
@@ -748,7 +753,7 @@ ENTRY(double_fault)
         jns   .Ldblf_cr3_load
         neg   %rbx
 .Ldblf_cr3_load:
-        write_cr3 rbx, rdi, rsi
+        mov   %rbx, %cr3
 .Ldblf_cr3_okay:
 
         movq  %rsp,%rdi
@@ -783,7 +788,7 @@ handle_ist_exception:
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         neg   %rcx
 .List_cr3_load:
-        write_cr3 rcx, rdi, rsi
+        mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
 .List_cr3_okay:
 
++++++ 5a9eb890-x86-remove-CR-reads-from-exit-to-guest-path.patch ++++++
# Commit 31bf55cb5fe3796cf6a4efbcfc0a9418bb1c783f
# Date 2018-03-06 16:49:36 +0100
# Author Jan Beulich <jbeul...@suse.com>
# Committer Jan Beulich <jbeul...@suse.com>
x86: remove CR reads from exit-to-guest path

CR3 is - during normal operation - only ever loaded from v->arch.cr3,
so there's no need to read the actual control register. For CR4 we can
generally use the cached value on all synchronous entry and exit paths.
Drop the write_cr3 macro, as the two use sites are probably easier to
follow without its use.

Signed-off-by: Jan Beulich <jbeul...@suse.com>
Tested-by: Juergen Gross <jgr...@suse.com>
Reviewed-by: Juergen Gross <jgr...@suse.com>
Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>
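
The assembly below open-codes the former write_cr3 macro, feeding it cached
CR3/CR4 values instead of reading the control registers.  As a rough,
user-space model of that sequence (plain variables stand in for the control
registers; purely illustrative, not the real exit path):

/* Toy model: toggle CR4.PGE around the CR3 write so globals are flushed too. */
#include <stdint.h>
#include <stdio.h>

#define X86_CR4_PGE (1UL << 7)

static uint64_t fake_cr3, fake_cr4 = X86_CR4_PGE;

static void write_cr3(uint64_t val)
{
    fake_cr3 = val;
    printf("cr3 <- %#llx (non-global TLB entries flushed)\n",
           (unsigned long long)val);
}

static void write_cr4(uint64_t val)
{
    if ( (fake_cr4 ^ val) & X86_CR4_PGE )
        puts("cr4.PGE toggled (global TLB entries flushed)");
    fake_cr4 = val;
}

/* guest_cr3/cached_cr4 stand in for v->arch.cr3 and the cached CR4 value. */
static void exit_to_guest(uint64_t guest_cr3, uint64_t cached_cr4)
{
    write_cr4(cached_cr4 & ~X86_CR4_PGE);    /* drop PGE before switching   */
    write_cr3(guest_cr3);                    /* load the shadow page tables */
    write_cr4(cached_cr4);                   /* restore PGE, no CR4 read    */
}

int main(void)
{
    exit_to_guest(0x123000, X86_CR4_PGE);
    return 0;
}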

--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -88,6 +88,7 @@ void __dummy__(void)
     OFFSET(VCPU_kernel_ss, struct vcpu, arch.pv_vcpu.kernel_ss);
     OFFSET(VCPU_iopl, struct vcpu, arch.pv_vcpu.iopl);
     OFFSET(VCPU_guest_context_flags, struct vcpu, arch.vgc_flags);
+    OFFSET(VCPU_cr3, struct vcpu, arch.cr3);
     OFFSET(VCPU_arch_msr, struct vcpu, arch.msr);
     OFFSET(VCPU_nmi_pending, struct vcpu, nmi_pending);
     OFFSET(VCPU_mce_pending, struct vcpu, mce_pending);
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -45,7 +45,7 @@ restore_all_guest:
         mov VCPUMSR_spec_ctrl_raw(%rdx), %r15d
 
         /* Copy guest mappings and switch to per-CPU root page table. */
-        mov   %cr3, %r9
+        mov   VCPU_cr3(%rbx), %r9
         GET_STACK_END(dx)
         mov   STACK_CPUINFO_FIELD(pv_cr3)(%rdx), %rdi
         movabs $PADDR_MASK & PAGE_MASK, %rsi
@@ -67,8 +67,13 @@ restore_all_guest:
         sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
                 ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
         rep movsq
+        mov   STACK_CPUINFO_FIELD(cr4)(%rdx), %rdi
         mov   %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
-        write_cr3 rax, rdi, rsi
+        mov   %rdi, %rsi
+        and   $~X86_CR4_PGE, %rdi
+        mov   %rdi, %cr4
+        mov   %rax, %cr3
+        mov   %rsi, %cr4
 .Lrag_keep_cr3:
 
         /* Restore stashed SPEC_CTRL value. */
@@ -124,7 +129,12 @@ restore_all_xen:
          * so "g" will have to do.
          */
 UNLIKELY_START(g, exit_cr3)
-        write_cr3 rax, rdi, rsi
+        mov   %cr4, %rdi
+        mov   %rdi, %rsi
+        and   $~X86_CR4_PGE, %rdi
+        mov   %rdi, %cr4
+        mov   %rax, %cr3
+        mov   %rsi, %cr4
 UNLIKELY_END(exit_cr3)
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -207,15 +207,6 @@ void ret_from_intr(void);
 #define ASM_STAC ASM_AC(STAC)
 #define ASM_CLAC ASM_AC(CLAC)
 
-.macro write_cr3 val:req, tmp1:req, tmp2:req
-        mov   %cr4, %\tmp1
-        mov   %\tmp1, %\tmp2
-        and   $~X86_CR4_PGE, %\tmp1
-        mov   %\tmp1, %cr4
-        mov   %\val, %cr3
-        mov   %\tmp2, %cr4
-.endm
-
 #define CR4_PV32_RESTORE                                           \
         667: ASM_NOP5;                                             \
         .pushsection .altinstr_replacement, "ax";                  \
++++++ 5aa2b6b9-cpufreq-ondemand-CPU-offlining-race.patch ++++++
# Commit 185413355fe331cbc926d48568838227234c9a20
# Date 2018-03-09 17:30:49 +0100
# Author Jan Beulich <jbeul...@suse.com>
# Committer Jan Beulich <jbeul...@suse.com>
cpufreq/ondemand: fix race while offlining CPU

Offlining a CPU involves stopping the cpufreq governor. The on-demand
governor will kill the timer before letting generic code proceed, but
since that generally isn't happening on the subject CPU,
cpufreq_dbs_timer_resume() may run in parallel. If that managed to
invoke the timer handler, that handler needs to run to completion before
dbs_timer_exit() may safely exit.

Make the "stoppable" field a tristate, changing it from +1 to -1 around
the timer function invocation, and make dbs_timer_exit() wait for it to
become non-negative (still writing zero if it's +1).

Also adjust coding style in cpufreq_dbs_timer_resume().

Reported-by: Martin Cerveny <mar...@c-home.cz>
Signed-off-by: Jan Beulich <jbeul...@suse.com>
Tested-by: Martin Cerveny <mar...@c-home.cz>
Reviewed-by: Wei Liu <wei.l...@citrix.com>
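
A minimal user-space sketch of that tristate handshake, using C11 atomics in
place of Xen's cmpxchg()/cpu_relax(); the values (1 = idle/stoppable,
-1 = handler running, 0 = stopped) follow the description above, but names and
structure are illustrative, not the real cpufreq driver.

#include <stdatomic.h>
#include <stdio.h>

static _Atomic int stoppable = 1;

static void sample_handler(void)
{
    puts("timer handler ran");
}

static void timer_resume(void (*handler)(void))
{
    int expected = 1;

    /* Run the handler only if we can move 1 -> -1; 0 means already stopped. */
    if ( !atomic_compare_exchange_strong(&stoppable, &expected, -1) )
        return;
    handler();
    expected = -1;
    atomic_compare_exchange_strong(&stoppable, &expected, 1);   /* -1 -> 1 */
}

static void timer_exit(void)
{
    int expected = 1;

    /* 1 -> 0 succeeds when idle; keep retrying while the handler (-1) runs. */
    while ( !atomic_compare_exchange_strong(&stoppable, &expected, 0) &&
            expected < 0 )
        expected = 1;                        /* handler still running; retry */

    puts("timer stopped; safe to kill the timer");
}

int main(void)
{
    timer_resume(sample_handler);
    timer_exit();
    return 0;
}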

--- a/xen/drivers/cpufreq/cpufreq_ondemand.c
+++ b/xen/drivers/cpufreq/cpufreq_ondemand.c
@@ -204,7 +204,14 @@ static void dbs_timer_init(struct cpu_db
 static void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
 {
     dbs_info->enable = 0;
-    dbs_info->stoppable = 0;
+
+    /*
+     * The timer function may be running (from cpufreq_dbs_timer_resume) -
+     * wait for it to complete.
+     */
+    while ( cmpxchg(&dbs_info->stoppable, 1, 0) < 0 )
+        cpu_relax();
+
     kill_timer(&per_cpu(dbs_timer, dbs_info->cpu));
 }
 
@@ -369,23 +376,22 @@ void cpufreq_dbs_timer_suspend(void)
 
 void cpufreq_dbs_timer_resume(void)
 {
-    int cpu;
-    struct timer* t;
-    s_time_t now;
-
-    cpu = smp_processor_id();
+    unsigned int cpu = smp_processor_id();
+    int8_t *stoppable = &per_cpu(cpu_dbs_info, cpu).stoppable;
 
-    if ( per_cpu(cpu_dbs_info,cpu).stoppable )
+    if ( *stoppable )
     {
-        now = NOW();
-        t = &per_cpu(dbs_timer, cpu);
-        if (t->expires <= now)
+        s_time_t now = NOW();
+        struct timer *t = &per_cpu(dbs_timer, cpu);
+
+        if ( t->expires <= now )
         {
+            if ( !cmpxchg(stoppable, 1, -1) )
+                return;
             t->function(t->data);
+            (void)cmpxchg(stoppable, -1, 1);
         }
         else
-        {
-            set_timer(t, align_timer(now , dbs_tuners_ins.sampling_rate));
-        }
+            set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate));
     }
 }
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -225,8 +225,8 @@ struct cpu_dbs_info_s {
     struct cpufreq_frequency_table *freq_table;
     int cpu;
     unsigned int enable:1;
-    unsigned int stoppable:1;
     unsigned int turbo_enabled:1;
+    int8_t stoppable;
 };
 
 int cpufreq_governor_dbs(struct cpufreq_policy *policy, unsigned int event);
++++++ 5aaa9878-x86-vlapic-clear-TMR-bit-for-edge-triggered-intr.patch ++++++
# Commit 12a50030a81a14a3c7be672ddfde707b961479ec
# Date 2018-03-15 16:59:52 +0100
# Author Liran Alon <liran.a...@oracle.com>
# Committer Jan Beulich <jbeul...@suse.com>
x86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR

According to Intel SDM section "Interrupt Acceptance for Fixed Interrupts":
"The trigger mode register (TMR) indicates the trigger mode of the
interrupt (see Figure 10-20). Upon acceptance of an interrupt
into the IRR, the corresponding TMR bit is cleared for
edge-triggered interrupts and set for level-triggered interrupts.
If a TMR bit is set when an EOI cycle for its corresponding
interrupt vector is generated, an EOI message is sent to
all I/O APICs."

Before this patch, the TMR bit was cleared on LAPIC EOI, which is not what
real hardware does. This was also confirmed in KVM upstream commit
a0c9a822bf37 ("KVM: dont clear TMR on EOI").

Behavior after this patch is aligned with both Intel SDM and KVM
implementation.

Signed-off-by: Liran Alon <liran.a...@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
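
A self-contained model of the behaviour described above (toy 64-vector
bitmaps and hypothetical helper names, not the real vlapic code): the TMR bit
is decided when the interrupt is accepted into the IRR and merely read, not
cleared, at EOI time.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t irr, tmr;                  /* toy 64-vector LAPIC state */

static void accept_irq(unsigned int vec, bool level)
{
    irr |= 1ULL << vec;
    if ( level )
        tmr |= 1ULL << vec;                /* level: EOI must reach the I/O APICs */
    else
        tmr &= ~(1ULL << vec);             /* edge: clear any stale TMR bit       */
}

static void handle_eoi(unsigned int vec)
{
    if ( tmr & (1ULL << vec) )             /* test only - no clearing at EOI      */
        printf("vector %u: EOI forwarded to I/O APICs\n", vec);
    else
        printf("vector %u: edge-triggered, no EOI broadcast\n", vec);
}

int main(void)
{
    accept_irq(40, true);   handle_eoi(40);
    accept_irq(41, false);  handle_eoi(41);
    return 0;
}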

--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -161,6 +161,8 @@ void vlapic_set_irq(struct vlapic *vlapi
 
     if ( trig )
         vlapic_set_vector(vec, &vlapic->regs->data[APIC_TMR]);
+    else
+        vlapic_clear_vector(vec, &vlapic->regs->data[APIC_TMR]);
 
     if ( hvm_funcs.update_eoi_exit_bitmap )
         hvm_funcs.update_eoi_exit_bitmap(target, vec, trig);
@@ -434,7 +436,7 @@ void vlapic_handle_EOI(struct vlapic *vl
 {
     struct domain *d = vlapic_domain(vlapic);
 
-    if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
+    if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
 
     hvm_dpci_msi_eoi(d, vector);
