Hello community, here is the log from the commit of package xen for openSUSE:Leap:15.2 checked in at 2020-04-21 19:06:36 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Leap:15.2/xen (Old) and /work/SRC/openSUSE:Leap:15.2/.xen.new.2738 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "xen" Tue Apr 21 19:06:36 2020 rev:85 rq:795132 version:4.13.0_12 Changes: -------- --- /work/SRC/openSUSE:Leap:15.2/xen/xen.changes 2020-03-31 07:21:35.758377870 +0200 +++ /work/SRC/openSUSE:Leap:15.2/.xen.new.2738/xen.changes 2020-04-21 19:08:03.992055245 +0200 @@ -1,0 +2,44 @@ +Tue Apr 14 11:06:08 MDT 2020 - carn...@suse.com + +- bsc#1169392 - VUL-0: CVE-2020-11742: xen: Bad continuation + handling in GNTTABOP_copy (XSA-318) + 5e95afb8-gnttab-fix-GNTTABOP_copy-continuation-handling.patch + +------------------------------------------------------------------- +Mon Apr 6 12:01:45 MDT 2020 - carn...@suse.com + +- bsc#1168140 - VUL-0: CVE-2020-11740, CVE-2020-11741: xen: XSA-313 + multiple xenoprof issues + 5e95ad61-xenoprof-clear-buffer-intended-to-be-shared-with-guests.patch + 5e95ad8f-xenoprof-limit-consumption-of-shared-buffer-data.patch +- bsc#1168142 - VUL-0: CVE-2020-11739: xen: XSA-314 - Missing + memory barriers in read-write unlock paths + 5e95ae77-Add-missing-memory-barrier-in-the-unlock-path-of-rwlock.patch +- bsc#1168143 - VUL-0: CVE-2020-11743: xen: XSA-316 - Bad error + path in GNTTABOP_map_grant + 5e95af5e-xen-gnttab-Fix-error-path-in-map_grant_ref.patch +- bsc#1167152 - L3: Xenstored Crashed during VM install Need Core + analyzed + 5e876b0f-tools-xenstore-fix-use-after-free-in-xenstored.patch +- bsc#1165206 - Xen 4.12 DomU hang / freeze / stall / NMI watchdog + bug soft lockup CPU #0 stuck under high load / upstream with + workaround. 
See also bsc#1134506 + 5e86f7b7-credit2-avoid-vCPUs-with-lower-creds-than-idle.patch + 5e86f7fd-credit2-fix-credit-too-few-resets.patch +- Drop for upstream solution (bsc#1165206) + 01-xen-credit2-avoid-vcpus-to.patch + default-to-credit1-scheduler.patch +- Upstream bug fixes (bsc#1027519) + 5e4ec20e-x86-virtualise-MSR_PLATFORM_ID-properly.patch + 5e5e7188-fix-error-path-in-cpupool_unassign_cpu_start.patch + 5e6f53dd-AMD-IOMMU-fix-off-by-one-get_paging_mode.patch + 5e7a371c-sched-fix-cpu-onlining-with-core-sched.patch + 5e7c90cf-sched-fix-cpu-offlining-with-core-sched.patch + 5e7cfb29-x86-ucode-AMD-fix-assert-in-compare_patch.patch + 5e7cfb29-x86-ucode-fix-error-paths-in-apply_microcode.patch + 5e7dd83b-libx86-CPUID-fix-not-just-leaf-7.patch + 5e7dfbf6-x86-ucode-AMD-potential-buffer-overrun-equiv-tab.patch + 5e846cce-x86-HVM-fix-AMD-ECS-handling-for-Fam10.patch + 5e84905c-x86-ucode-AMD-fix-more-potential-buffer-overruns.patch + +------------------------------------------------------------------- Old: ---- 01-xen-credit2-avoid-vcpus-to.patch default-to-credit1-scheduler.patch New: ---- 5e4ec20e-x86-virtualise-MSR_PLATFORM_ID-properly.patch 5e5e7188-fix-error-path-in-cpupool_unassign_cpu_start.patch 5e6f53dd-AMD-IOMMU-fix-off-by-one-get_paging_mode.patch 5e7a371c-sched-fix-cpu-onlining-with-core-sched.patch 5e7c90cf-sched-fix-cpu-offlining-with-core-sched.patch 5e7cfb29-x86-ucode-AMD-fix-assert-in-compare_patch.patch 5e7cfb29-x86-ucode-fix-error-paths-in-apply_microcode.patch 5e7dd83b-libx86-CPUID-fix-not-just-leaf-7.patch 5e7dfbf6-x86-ucode-AMD-potential-buffer-overrun-equiv-tab.patch 5e846cce-x86-HVM-fix-AMD-ECS-handling-for-Fam10.patch 5e84905c-x86-ucode-AMD-fix-more-potential-buffer-overruns.patch 5e86f7b7-credit2-avoid-vCPUs-with-lower-creds-than-idle.patch 5e86f7fd-credit2-fix-credit-too-few-resets.patch 5e876b0f-tools-xenstore-fix-use-after-free-in-xenstored.patch 5e95ad61-xenoprof-clear-buffer-intended-to-be-shared-with-guests.patch 
5e95ad8f-xenoprof-limit-consumption-of-shared-buffer-data.patch 5e95ae77-Add-missing-memory-barrier-in-the-unlock-path-of-rwlock.patch 5e95af5e-xen-gnttab-Fix-error-path-in-map_grant_ref.patch 5e95afb8-gnttab-fix-GNTTABOP_copy-continuation-handling.patch ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ xen.spec ++++++ --- /var/tmp/diff_new_pack.fJyATU/_old 2020-04-21 19:08:06.032059398 +0200 +++ /var/tmp/diff_new_pack.fJyATU/_new 2020-04-21 19:08:06.036059407 +0200 @@ -1,7 +1,7 @@ # # spec file for package xen # -# Copyright (c) 2020 SUSE LLC +# Copyright (c) 2020 SUSE LINUX GmbH, Nuernberg, Germany. # # All modifications and additions to the file contributed by third parties # remain the property of their copyright owners, unless otherwise agreed @@ -127,7 +127,7 @@ BuildRequires: pesign-obs-integration %endif -Version: 4.13.0_11 +Version: 4.13.0_12 Release: 0 Summary: Xen Virtualization: Hypervisor (aka VMM aka Microkernel) License: GPL-2.0-only @@ -183,6 +183,25 @@ Patch15: 5e4c00ef-VT-d-check-full-RMRR-for-E820-reserved.patch Patch16: 5e4d4f5b-sched-fix-get_cpu_idle_time-with-core-sched.patch Patch17: 5e4e614d-x86-spec-ctrl-no-xen-also-disables-branch-hardening.patch +Patch18: 5e4ec20e-x86-virtualise-MSR_PLATFORM_ID-properly.patch +Patch19: 5e5e7188-fix-error-path-in-cpupool_unassign_cpu_start.patch +Patch20: 5e6f53dd-AMD-IOMMU-fix-off-by-one-get_paging_mode.patch +Patch21: 5e7a371c-sched-fix-cpu-onlining-with-core-sched.patch +Patch22: 5e7c90cf-sched-fix-cpu-offlining-with-core-sched.patch +Patch23: 5e7cfb29-x86-ucode-AMD-fix-assert-in-compare_patch.patch +Patch24: 5e7cfb29-x86-ucode-fix-error-paths-in-apply_microcode.patch +Patch25: 5e7dd83b-libx86-CPUID-fix-not-just-leaf-7.patch +Patch26: 5e7dfbf6-x86-ucode-AMD-potential-buffer-overrun-equiv-tab.patch +Patch27: 5e846cce-x86-HVM-fix-AMD-ECS-handling-for-Fam10.patch +Patch28: 5e84905c-x86-ucode-AMD-fix-more-potential-buffer-overruns.patch 
+Patch29: 5e86f7b7-credit2-avoid-vCPUs-with-lower-creds-than-idle.patch +Patch30: 5e86f7fd-credit2-fix-credit-too-few-resets.patch +Patch31: 5e876b0f-tools-xenstore-fix-use-after-free-in-xenstored.patch +Patch32: 5e95ad61-xenoprof-clear-buffer-intended-to-be-shared-with-guests.patch +Patch33: 5e95ad8f-xenoprof-limit-consumption-of-shared-buffer-data.patch +Patch34: 5e95ae77-Add-missing-memory-barrier-in-the-unlock-path-of-rwlock.patch +Patch35: 5e95af5e-xen-gnttab-Fix-error-path-in-map_grant_ref.patch +Patch36: 5e95afb8-gnttab-fix-GNTTABOP_copy-continuation-handling.patch # Our platform specific patches Patch400: xen-destdir.patch Patch401: vif-bridge-no-iptables.patch @@ -194,13 +213,11 @@ Patch407: replace-obsolete-network-configuration-commands-in-s.patch Patch408: disable-building-pv-shim.patch Patch409: xenstore-launch.patch -Patch410: default-to-credit1-scheduler.patch # Needs to go upstream Patch420: suspend_evtchn_lock.patch Patch422: stubdom-have-iovec.patch Patch423: vif-route.patch Patch424: gcc10-fixes.patch -Patch425: 01-xen-credit2-avoid-vcpus-to.patch # Other bug fixes or features Patch451: xenconsole-no-multiple-connections.patch Patch452: hibernate.patch @@ -428,6 +445,25 @@ %patch15 -p1 %patch16 -p1 %patch17 -p1 +%patch18 -p1 +%patch19 -p1 +%patch20 -p1 +%patch21 -p1 +%patch22 -p1 +%patch23 -p1 +%patch24 -p1 +%patch25 -p1 +%patch26 -p1 +%patch27 -p1 +%patch28 -p1 +%patch29 -p1 +%patch30 -p1 +%patch31 -p1 +%patch32 -p1 +%patch33 -p1 +%patch34 -p1 +%patch35 -p1 +%patch36 -p1 # Our platform specific patches %patch400 -p1 %patch401 -p1 @@ -439,13 +475,11 @@ %patch407 -p1 %patch408 -p1 %patch409 -p1 -%patch410 -p1 # Needs to go upstream %patch420 -p1 %patch422 -p1 %patch423 -p1 %patch424 -p1 -%patch425 -p1 # Other bug fixes or features %patch451 -p1 %patch452 -p1 ++++++ 5e4ec20e-x86-virtualise-MSR_PLATFORM_ID-properly.patch ++++++ # Commit 691265f96097d4fe3e46ff4267451d49b30143e6 # Date 2020-02-20 17:29:50 +0000 # Author Andrew Cooper 
<andrew.coop...@citrix.com> # Committer Andrew Cooper <andrew.coop...@citrix.com> x86/msr: Virtualise MSR_PLATFORM_ID properly This is an Intel-only, read-only MSR related to microcode loading. Expose it in similar circumstances as the PATCHLEVEL MSR. This should have been alongside c/s 013896cb8b2 "x86/msr: Fix handling of MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV" Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -139,6 +139,13 @@ int guest_rdmsr(struct vcpu *v, uint32_t /* Not offered to guests. */ goto gp_fault; + case MSR_IA32_PLATFORM_ID: + if ( !(cp->x86_vendor & X86_VENDOR_INTEL) || + !(boot_cpu_data.x86_vendor & X86_VENDOR_INTEL) ) + goto gp_fault; + rdmsrl(MSR_IA32_PLATFORM_ID, *val); + break; + case MSR_AMD_PATCHLEVEL: BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL); /* @@ -271,6 +278,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t { uint64_t rsvd; + case MSR_IA32_PLATFORM_ID: case MSR_INTEL_CORE_THREAD_COUNT: case MSR_INTEL_PLATFORM_INFO: case MSR_ARCH_CAPABILITIES: ++++++ 5e5e7188-fix-error-path-in-cpupool_unassign_cpu_start.patch ++++++ # Commit 98ed1f43cc2c89efd38deed1035dba5b1ced5d45 # Date 2020-03-03 16:02:32 +0100 # Author Juergen Gross <jgr...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> sched: fix error path in cpupool_unassign_cpu_start() In case moving away all domains from the cpu to be removed is failing in cpupool_unassign_cpu_start() the error path is missing to release sched_res_rculock. The normal exit path is releasing domlist_read_lock instead (this is currently no problem as the reference to the specific rcu lock is not used by rcu_read_unlock()). While at it indent the present error label by one space. 
Reported-by: Igor Druzhinin <igor.druzhi...@citrix.com> Signed-off-by: Juergen Gross <jgr...@suse.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> Reviewed-by: Dario Faggioli <dfaggi...@suse.com> --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -465,7 +465,7 @@ static int cpupool_unassign_cpu_start(st } rcu_read_unlock(&domlist_read_lock); if ( ret ) - goto out; + goto out_rcu; } cpupool_moving_cpu = cpu; atomic_inc(&c->refcnt); @@ -473,8 +473,9 @@ static int cpupool_unassign_cpu_start(st cpumask_andnot(c->cpu_valid, c->cpu_valid, cpus); cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); - rcu_read_unlock(&domlist_read_lock); -out: + out_rcu: + rcu_read_unlock(&sched_res_rculock); + out: spin_unlock(&cpupool_lock); return ret; ++++++ 5e6f53dd-AMD-IOMMU-fix-off-by-one-get_paging_mode.patch ++++++ # Commit b75b3c62fe4afe381c6f74a07f614c0b39fe2f5d # Date 2020-03-16 11:24:29 +0100 # Author Jan Beulich <jbeul...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> AMD/IOMMU: fix off-by-one in amd_iommu_get_paging_mode() callers amd_iommu_get_paging_mode() expects a count, not a "maximum possible" value. Prior to b4f042236ae0 dropping the reference, the use of our mis-named "max_page" in amd_iommu_domain_init() may have led to such a misunderstanding. In an attempt to avoid such confusion in the future, rename the function's parameter and - while at it - convert it to an inline function. Also replace a literal 4 by an expression tying it to a wider use constant, just like amd_iommu_quarantine_init() does.
Fixes: ea38867831da ("x86 / iommu: set up a scratch page in the quarantine domain") Fixes: b4f042236ae0 ("AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables") Signed-off-by: Jan Beulich <jbeul...@suse.com> Acked-by: Andrew Cooper <andrew.coop...@citrix.com> --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -455,9 +455,9 @@ int amd_iommu_reserve_domain_unity_map(s int __init amd_iommu_quarantine_init(struct domain *d) { struct domain_iommu *hd = dom_iommu(d); - unsigned long max_gfn = - PFN_DOWN((1ul << DEFAULT_DOMAIN_ADDRESS_WIDTH) - 1); - unsigned int level = amd_iommu_get_paging_mode(max_gfn); + unsigned long end_gfn = + 1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT); + unsigned int level = amd_iommu_get_paging_mode(end_gfn); struct amd_iommu_pte *table; if ( hd->arch.root_table ) --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -231,22 +231,6 @@ static int __must_check allocate_domain_ return rc; } -int amd_iommu_get_paging_mode(unsigned long entries) -{ - int level = 1; - - BUG_ON( !entries ); - - while ( entries > PTE_PER_TABLE_SIZE ) - { - entries = PTE_PER_TABLE_ALIGN(entries) >> PTE_PER_TABLE_SHIFT; - if ( ++level > 6 ) - return -ENOMEM; - } - - return level; -} - static int amd_iommu_domain_init(struct domain *d) { struct domain_iommu *hd = dom_iommu(d); @@ -259,8 +243,10 @@ static int amd_iommu_domain_init(struct * physical address space we give it, but this isn't known yet so use 4 * unilaterally. */ - hd->arch.paging_mode = is_hvm_domain(d) - ? 4 : amd_iommu_get_paging_mode(get_upper_mfn_bound()); + hd->arch.paging_mode = amd_iommu_get_paging_mode( + is_hvm_domain(d) + ? 
1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT) + : get_upper_mfn_bound() + 1); return 0; } --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h @@ -54,7 +54,6 @@ int amd_iommu_init_late(void); int amd_iommu_update_ivrs_mapping_acpi(void); int iov_adjust_irq_affinities(void); -int amd_iommu_get_paging_mode(unsigned long entries); int amd_iommu_quarantine_init(struct domain *d); /* mapping functions */ @@ -177,6 +176,22 @@ static inline unsigned long region_to_pa return (PAGE_ALIGN(addr + size) - (addr & PAGE_MASK)) >> PAGE_SHIFT; } +static inline int amd_iommu_get_paging_mode(unsigned long max_frames) +{ + int level = 1; + + BUG_ON(!max_frames); + + while ( max_frames > PTE_PER_TABLE_SIZE ) + { + max_frames = PTE_PER_TABLE_ALIGN(max_frames) >> PTE_PER_TABLE_SHIFT; + if ( ++level > 6 ) + return -ENOMEM; + } + + return level; +} + static inline struct page_info* alloc_amd_iommu_pgtable(void) { struct page_info *pg; ++++++ 5e7a371c-sched-fix-cpu-onlining-with-core-sched.patch ++++++ # Commit 4c7d340f75abc64f131b0f9bffd6d66d72e43528 # Date 2020-03-24 17:36:44 +0100 # Author Juergen Gross <jgr...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> sched: fix onlining cpu with core scheduling active When onlining a cpu cpupool_cpu_add() checks whether all siblings of the new cpu are free in order to decide whether to add it to cpupool0. In case the added cpu is not the last sibling to be onlined this test is wrong as it only checks for all online siblings to be free. The test should include the check for the number of siblings having reached the scheduling granularity of cpupool0, too. 
Signed-off-by: Juergen Gross <jgr...@suse.com> Reviewed-by: Dario Faggioli <dfaggi...@suse.com> --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -615,7 +615,8 @@ static int cpupool_cpu_add(unsigned int get_sched_res(cpu)->cpupool = NULL; cpus = sched_get_opt_cpumask(cpupool0->gran, cpu); - if ( cpumask_subset(cpus, &cpupool_free_cpus) ) + if ( cpumask_subset(cpus, &cpupool_free_cpus) && + cpumask_weight(cpus) == cpupool_get_granularity(cpupool0) ) ret = cpupool_assign_cpu_locked(cpupool0, cpu); rcu_read_unlock(&sched_res_rculock); ++++++ 5e7c90cf-sched-fix-cpu-offlining-with-core-sched.patch ++++++ # Commit b6f5334aeaca133ad47ade06f4d22baf5984b55d # Date 2020-03-26 12:23:59 +0100 # Author Juergen Gross <jgr...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> sched: fix cpu offlining with core scheduling Offlining a cpu with core scheduling active can result in a hanging system. The reason is that the scheduling resource and unit of the cpu to be removed need to be split in order to remove the cpu from its cpupool and move it to the idle scheduler. In case one of the involved cpus happens to have received a sched slave event due to a vcpu formerly running on that cpu being woken up again, it can happen that this cpu will enter sched_wait_rendezvous_in() while its scheduling resource is just about to be split. It might wait forever for the other sibling to join, which will never happen due to the resources already being modified.
This can easily be avoided by: - resetting the rendezvous counters of the idle unit which is kept - checking for a new scheduling resource in sched_wait_rendezvous_in() after reacquiring the scheduling lock and resetting the counters in that case without scheduling another vcpu - moving schedule resource modifications (in schedule_cpu_rm()) and retrieving (schedule(), sched_slave() is fine already, others are not critical) into locked regions Reported-by: Igor Druzhinin <igor.druzhi...@citrix.com> Signed-off-by: Juergen Gross <jgr...@suse.com> Reviewed-by: Dario Faggioli <dfaggi...@suse.com> --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -2112,6 +2112,10 @@ void sched_context_switched(struct vcpu rcu_read_unlock(&sched_res_rculock); } +/* + * Switch to a new context or keep the current one running. + * On x86 it won't return, so it needs to drop the still held sched_res_rculock. + */ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, bool reset_idle_unit, s_time_t now) { @@ -2221,6 +2225,9 @@ static struct vcpu *sched_force_context_ * zero do_schedule() is called and the rendezvous counter for leaving * context_switch() is set. All other members will wait until the counter is * becoming zero, dropping the schedule lock in between. + * Either returns the new unit to run, or NULL if no context switch is + * required or (on Arm) has already been performed. If NULL is returned + * sched_res_rculock has been dropped. 
*/ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, spinlock_t **lock, int cpu, @@ -2228,7 +2235,8 @@ static struct sched_unit *sched_wait_ren { struct sched_unit *next; struct vcpu *v; - unsigned int gran = get_sched_res(cpu)->granularity; + struct sched_resource *sr = get_sched_res(cpu); + unsigned int gran = sr->granularity; if ( !--prev->rendezvous_in_cnt ) { @@ -2295,6 +2303,21 @@ static struct sched_unit *sched_wait_ren atomic_set(&prev->next_task->rendezvous_out_cnt, 0); prev->rendezvous_in_cnt = 0; } + + /* + * Check for scheduling resource switched. This happens when we are + * moved away from our cpupool and cpus are subject of the idle + * scheduler now. + */ + if ( unlikely(sr != get_sched_res(cpu)) ) + { + ASSERT(is_idle_unit(prev)); + atomic_set(&prev->next_task->rendezvous_out_cnt, 0); + prev->rendezvous_in_cnt = 0; + pcpu_schedule_unlock_irq(*lock, cpu); + rcu_read_unlock(&sched_res_rculock); + return NULL; + } } return prev->next_task; @@ -2380,11 +2403,11 @@ static void schedule(void) rcu_read_lock(&sched_res_rculock); + lock = pcpu_schedule_lock_irq(cpu); + sr = get_sched_res(cpu); gran = sr->granularity; - lock = pcpu_schedule_lock_irq(cpu); - if ( prev->rendezvous_in_cnt ) { /* @@ -2965,7 +2988,10 @@ int schedule_cpu_rm(unsigned int cpu) per_cpu(sched_res_idx, cpu_iter) = 0; if ( cpu_iter == cpu ) { - idle_vcpu[cpu_iter]->sched_unit->priv = NULL; + unit = idle_vcpu[cpu_iter]->sched_unit; + unit->priv = NULL; + atomic_set(&unit->next_task->rendezvous_out_cnt, 0); + unit->rendezvous_in_cnt = 0; } else { @@ -2996,6 +3022,8 @@ int schedule_cpu_rm(unsigned int cpu) } sr->scheduler = &sched_idle_ops; sr->sched_priv = NULL; + sr->granularity = 1; + sr->cpupool = NULL; smp_mb(); sr->schedule_lock = &sched_free_cpu_lock; @@ -3008,9 +3036,6 @@ int schedule_cpu_rm(unsigned int cpu) sched_free_udata(old_ops, vpriv_old); sched_free_pdata(old_ops, ppriv_old, cpu); - sr->granularity = 1; - sr->cpupool = NULL; - out: 
rcu_read_unlock(&sched_res_rculock); xfree(sr_new); ++++++ 5e7cfb29-x86-ucode-AMD-fix-assert-in-compare_patch.patch ++++++ # Commit 13ed5d49a4214dc3521d4af7bfcf13fbcf5bfd63 # Date 2020-03-26 18:57:45 +0000 # Author Andrew Cooper <andrew.coop...@citrix.com> # Committer Andrew Cooper <andrew.coop...@citrix.com> x86/ucode/amd: Fix assertion in compare_patch() This is clearly a typo. Fixes: 9da23943ccd "microcode: introduce a global cache of ucode patch" Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Wei Liu <w...@xen.org> --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -222,7 +222,7 @@ static enum microcode_match_result compa /* Both patches to compare are supposed to be applicable to local CPU. */ ASSERT(microcode_fits(new->mc_amd) != MIS_UCODE); - ASSERT(microcode_fits(new->mc_amd) != MIS_UCODE); + ASSERT(microcode_fits(old->mc_amd) != MIS_UCODE); return compare_header(new_header, old_header); } ++++++ 5e7cfb29-x86-ucode-fix-error-paths-in-apply_microcode.patch ++++++ # Commit d2a0a96cf76603b2e2b87c3ce80c3f9d098327d4 # Date 2020-03-26 18:57:45 +0000 # Author Andrew Cooper <andrew.coop...@citrix.com> # Committer Andrew Cooper <andrew.coop...@citrix.com> x86/ucode: Fix error paths in apply_microcode() In the unlikely case that patch application completes, but the resulting revision isn't expected, sig->rev doesn't get updated to match reality. It will get adjusted the next time collect_cpu_info() gets called, but in the meantime Xen might operate on a stale value. Nothing good will come of this. Rewrite the logic to always update the stashed revision, before worrying about whether the attempt was a success or failure. Take the opportunity to make the printk() messages as consistent as possible.
Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Wei Liu <w...@xen.org> --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -229,11 +229,11 @@ static enum microcode_match_result compa static int apply_microcode(const struct microcode_patch *patch) { - uint32_t rev; int hw_err; unsigned int cpu = smp_processor_id(); struct cpu_signature *sig = &per_cpu(cpu_sig, cpu); const struct microcode_header_amd *hdr; + uint32_t rev, old_rev = sig->rev; if ( !patch ) return -ENOENT; @@ -249,6 +249,7 @@ static int apply_microcode(const struct /* get patch id after patching */ rdmsrl(MSR_AMD_PATCHLEVEL, rev); + sig->rev = rev; /* * Some processors leave the ucode blob mapping as UC after the update. @@ -259,15 +260,14 @@ static int apply_microcode(const struct /* check current patch id and patch's id for match */ if ( hw_err || (rev != hdr->patch_id) ) { - printk(KERN_ERR "microcode: CPU%d update from revision " - "%#x to %#x failed\n", cpu, rev, hdr->patch_id); + printk(XENLOG_ERR + "microcode: CPU%u update rev %#x to %#x failed, result %#x\n", + cpu, old_rev, hdr->patch_id, rev); return -EIO; } - printk(KERN_WARNING "microcode: CPU%d updated from revision %#x to %#x\n", - cpu, sig->rev, hdr->patch_id); - - sig->rev = rev; + printk(XENLOG_WARNING "microcode: CPU%u updated from revision %#x to %#x\n", + cpu, old_rev, rev); return 0; } --- a/xen/arch/x86/microcode_intel.c +++ b/xen/arch/x86/microcode_intel.c @@ -285,10 +285,10 @@ static enum microcode_match_result compa static int apply_microcode(const struct microcode_patch *patch) { uint64_t msr_content; - unsigned int val[2]; - unsigned int cpu_num = raw_smp_processor_id(); + unsigned int cpu = smp_processor_id(); struct cpu_signature *sig = &this_cpu(cpu_sig); const struct microcode_intel *mc_intel; + uint32_t rev, old_rev = sig->rev; if ( !patch ) return -ENOENT; @@ -309,20 +309,20 @@ static int apply_microcode(const struct /* get the current revision from MSR 0x8B */ 
rdmsrl(MSR_IA32_UCODE_REV, msr_content); - val[1] = (uint32_t)(msr_content >> 32); + sig->rev = rev = msr_content >> 32; - if ( val[1] != mc_intel->hdr.rev ) + if ( rev != mc_intel->hdr.rev ) { - printk(KERN_ERR "microcode: CPU%d update from revision " - "%#x to %#x failed. Resulting revision is %#x.\n", cpu_num, - sig->rev, mc_intel->hdr.rev, val[1]); + printk(XENLOG_ERR + "microcode: CPU%u update rev %#x to %#x failed, result %#x\n", + cpu, old_rev, mc_intel->hdr.rev, rev); return -EIO; } - printk(KERN_INFO "microcode: CPU%d updated from revision " - "%#x to %#x, date = %04x-%02x-%02x\n", - cpu_num, sig->rev, val[1], mc_intel->hdr.year, + + printk(XENLOG_WARNING + "microcode: CPU%u updated from revision %#x to %#x, date = %04x-%02x-%02x\n", + cpu, old_rev, rev, mc_intel->hdr.year, mc_intel->hdr.month, mc_intel->hdr.day); - sig->rev = val[1]; return 0; } ++++++ 5e7dd83b-libx86-CPUID-fix-not-just-leaf-7.patch ++++++ # Commit eb0bad81fceb3e81df5f73441771b49b732edf56 # Date 2020-03-27 11:40:59 +0100 # Author Jan Beulich <jbeul...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> libx86/CPUID: fix (not just) leaf 7 processing For one, subleaves within the respective union shouldn't live in separate sub-structures. And then x86_cpuid_policy_fill_native() should, as it did originally, iterate over all subleaves here as well as over all main leaves. Switch to using a "<= MIN()"-based approach similar to that used in x86_cpuid_copy_to_buffer(). Also follow this for the extended main leaves then. Fixes: 1bd2b750537b ("libx86: Fix 32bit stubdom build of x86_cpuid_policy_fill_native()") Fixes: 97e4ebdcd765 ("x86/CPUID: support leaf 7 subleaf 1 / AVX512_BF16") Signed-off-by: Jan Beulich <jbeul...@suse.com> Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> --- a/xen/include/xen/lib/x86/cpuid.h +++ b/xen/include/xen/lib/x86/cpuid.h @@ -181,8 +181,7 @@ struct cpuid_policy uint32_t _7d0; struct { DECL_BITFIELD(7d0); }; }; - }; - struct { + /* Subleaf 1. 
*/ union { uint32_t _7a1; --- a/xen/lib/x86/cpuid.c +++ b/xen/lib/x86/cpuid.c @@ -71,8 +71,8 @@ void x86_cpuid_policy_fill_native(struct unsigned int i; cpuid_leaf(0, &p->basic.raw[0]); - for ( i = 1; i < min_t(unsigned int, ARRAY_SIZE(p->basic.raw), - p->basic.max_leaf); ++i ) + for ( i = 1; i <= MIN(p->basic.max_leaf, + ARRAY_SIZE(p->basic.raw) - 1); ++i ) { switch ( i ) { @@ -116,8 +116,8 @@ void x86_cpuid_policy_fill_native(struct { cpuid_count_leaf(7, 0, &p->feat.raw[0]); - for ( i = 1; i < min_t(unsigned int, ARRAY_SIZE(p->feat.raw), - p->feat.max_subleaf); ++i ) + for ( i = 1; i <= MIN(p->feat.max_subleaf, + ARRAY_SIZE(p->feat.raw) - 1); ++i ) cpuid_count_leaf(7, i, &p->feat.raw[i]); } @@ -172,8 +172,8 @@ void x86_cpuid_policy_fill_native(struct /* Extended leaves. */ cpuid_leaf(0x80000000, &p->extd.raw[0]); - for ( i = 1; i < min_t(unsigned int, ARRAY_SIZE(p->extd.raw), - p->extd.max_leaf + 1 - 0x80000000); ++i ) + for ( i = 1; i <= MIN(p->extd.max_leaf & 0xffffU, + ARRAY_SIZE(p->extd.raw) - 1); ++i ) cpuid_leaf(0x80000000 + i, &p->extd.raw[i]); x86_cpuid_policy_recalc_synth(p); ++++++ 5e7dfbf6-x86-ucode-AMD-potential-buffer-overrun-equiv-tab.patch ++++++ # Commit 1f97b6b9f1b5978659c5735954c37c130e7bb151 # Date 2020-03-27 13:13:26 +0000 # Author Andrew Cooper <andrew.coop...@citrix.com> # Committer Andrew Cooper <andrew.coop...@citrix.com> x86/ucode/amd: Fix potential buffer overrun with equiv table handling find_equiv_cpu_id() loops until it finds a 0 installed_cpu entry. Well-formed AMD microcode containers have this property. Extend the checking in install_equiv_cpu_table() to reject tables which don't have a sentinel at the end.
Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -321,6 +321,7 @@ static int install_equiv_cpu_table( size_t *offset) { const struct mpbhdr *mpbuf = data + *offset + 4; + const struct equiv_cpu_entry *eq; *offset += mpbuf->len + CONT_HDR_SIZE; /* add header length */ @@ -330,7 +331,9 @@ static int install_equiv_cpu_table( return -EINVAL; } - if ( mpbuf->len == 0 ) + if ( mpbuf->len == 0 || mpbuf->len % sizeof(*eq) || + (eq = (const void *)mpbuf->data, + eq[(mpbuf->len / sizeof(*eq)) - 1].installed_cpu) ) { printk(KERN_ERR "microcode: Wrong microcode equivalent cpu table length\n"); return -EINVAL; ++++++ 5e846cce-x86-HVM-fix-AMD-ECS-handling-for-Fam10.patch ++++++ # Commit 5d515b1c296ebad6889748ea1e49e063453216a3 # Date 2020-04-01 12:28:30 +0200 # Author Jan Beulich <jbeul...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> x86/HVM: fix AMD ECS handling for Fam10 The involved comparison was, very likely inadvertently, converted from >= to > when making changes unrelated to the actual family range. 
Fixes: 9841eb71ea87 ("x86/cpuid: Drop a guests cached x86 family and model information") Signed-off-by: Jan Beulich <jbeul...@suse.com> Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Paul Durrant <p...@xen.org> --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -1283,7 +1283,7 @@ struct hvm_ioreq_server *hvm_select_iore if ( CF8_ADDR_HI(cf8) && d->arch.cpuid->x86_vendor == X86_VENDOR_AMD && (x86_fam = get_cpu_family( - d->arch.cpuid->basic.raw_fms, NULL, NULL)) > 0x10 && + d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 && x86_fam < 0x17 ) { uint64_t msr_val; ++++++ 5e84905c-x86-ucode-AMD-fix-more-potential-buffer-overruns.patch ++++++ # Commit 718d1432000079ea7120f6cb770372afe707ce27 # Date 2020-04-01 14:00:12 +0100 # Author Andrew Cooper <andrew.coop...@citrix.com> # Committer Andrew Cooper <andrew.coop...@citrix.com> x86/ucode/amd: Fix more potential buffer overruns with microcode parsing cpu_request_microcode() doesn't know the buffer is at least 4 bytes long before inspecting UCODE_MAGIC. install_equiv_cpu_table() doesn't know the boundary of the buffer it is interpreting as an equivalency table. This case was clearly observed at one point in the past, given the subsequent overrun detection, but without comprehending that the damage was already done. Make the logic consistent with container_fast_forward() and pass size_left in to install_equiv_cpu_table(). 
Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -318,11 +318,20 @@ static int get_ucode_from_buffer_amd( static int install_equiv_cpu_table( struct microcode_amd *mc_amd, const void *data, + size_t size_left, size_t *offset) { - const struct mpbhdr *mpbuf = data + *offset + 4; + const struct mpbhdr *mpbuf; const struct equiv_cpu_entry *eq; + if ( size_left < (sizeof(*mpbuf) + 4) || + (mpbuf = data + *offset + 4, + size_left - sizeof(*mpbuf) - 4 < mpbuf->len) ) + { + printk(XENLOG_WARNING "microcode: No space for equivalent cpu table\n"); + return -EINVAL; + } + *offset += mpbuf->len + CONT_HDR_SIZE; /* add header length */ if ( mpbuf->type != UCODE_EQUIV_CPU_TABLE_TYPE ) @@ -436,7 +445,7 @@ static struct microcode_patch *cpu_reque current_cpu_id = cpuid_eax(0x00000001); - if ( *(const uint32_t *)buf != UCODE_MAGIC ) + if ( bufsize < 4 || *(const uint32_t *)buf != UCODE_MAGIC ) { printk(KERN_ERR "microcode: Wrong microcode patch file magic\n"); error = -EINVAL; @@ -466,24 +475,13 @@ static struct microcode_patch *cpu_reque */ while ( offset < bufsize ) { - error = install_equiv_cpu_table(mc_amd, buf, &offset); + error = install_equiv_cpu_table(mc_amd, buf, bufsize - offset, &offset); if ( error ) { printk(KERN_ERR "microcode: installing equivalent cpu table failed\n"); break; } - /* - * Could happen as we advance 'offset' early - * in install_equiv_cpu_table - */ - if ( offset > bufsize ) - { - printk(KERN_ERR "microcode: Microcode buffer overrun\n"); - error = -EINVAL; - break; - } - if ( find_equiv_cpu_id(mc_amd->equiv_cpu_table, current_cpu_id, &equiv_cpu_id) ) break; ++++++ 5e86f7b7-credit2-avoid-vCPUs-with-lower-creds-than-idle.patch ++++++ # Commit 36f3662f27dec32d76c0edb4c6b62b9628d6869d # Date 2020-04-03 10:45:43 +0200 # Author Dario Faggioli <dfaggi...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> credit2: avoid vCPUs to 
ever reach lower credits than idle There have been report of stalls of guest vCPUs, when Credit2 was used. It seemed like these vCPUs were not getting scheduled for very long time, even under light load conditions (e.g., during dom0 boot). Investigations led to the discovery that --although rarely-- it can happen that a vCPU manages to run for very long timeslices. In Credit2, this means that, when runtime accounting happens, the vCPU will lose a large quantity of credits. This in turn may lead to the vCPU having less credits than the idle vCPUs (-2^30). At this point, the scheduler will pick the idle vCPU, instead of the ready to run vCPU, for a few "epochs", which often times is enough for the guest kernel to think the vCPU is not responding and crashing. An example of this situation is shown here. In fact, we can see d0v1 sitting in the runqueue while all the CPUs are idle, as it has -1254238270 credits, which is smaller than -2^30 = −1073741824: (XEN) Runqueue 0: (XEN) ncpus = 28 (XEN) cpus = 0-27 (XEN) max_weight = 256 (XEN) pick_bias = 22 (XEN) instload = 1 (XEN) aveload = 293391 (~111%) (XEN) idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff (XEN) tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000 (XEN) fully idle cores: 00,00000000,00000000,00000000,00000000,00000000,0fffffff [...] (XEN) Runqueue 0: (XEN) CPU[00] runq=0, sibling=00,..., core=00,... (XEN) CPU[01] runq=0, sibling=00,..., core=00,... [...] (XEN) CPU[26] runq=0, sibling=00,..., core=00,... (XEN) CPU[27] runq=0, sibling=00,..., core=00,... (XEN) RUNQ: (XEN) 0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144 (~100%) We certainly don't want, under any circumstance, this to happen. Let's, therefore, define a minimum amount of credits a vCPU can have. During accounting, we make sure that, for however long the vCPU has run, it will never get to have less than such minimum amount of credits. Then, we set the credits of the idle vCPU to an even smaller value. 
NOTE: investigations have been done about _how_ it is possible for a vCPU to execute for so long that its credits become so low. While still not completely clear, there is evidence that: - it only happens very rarely, - it appears to be both machine and workload specific, - it does not look to be a Credit2-specific issue (e.g., it happens when running with Credit1 as well), or a scheduler issue. This patch makes Credit2 more robust to events like this, whatever the cause is, and should hence be backported (as far as possible). Reported-by: Glen <glenbar...@gmail.com> Reported-by: Tomas Mozes <hydrapo...@gmail.com> Signed-off-by: Dario Faggioli <dfaggi...@suse.com> Reviewed-by: George Dunlap <george.dun...@citrix.com> --- a/tools/xentrace/formats +++ b/tools/xentrace/formats @@ -55,7 +55,7 @@ 0x00022204 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:credit_add 0x00022205 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:tickle_check [ dom:vcpu = 0x%(1)08x, credit = %(2)d, score = %(3)d ] 0x00022206 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:tickle [ cpu = %(1)d ] -0x00022207 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:credit_reset [ dom:vcpu = 0x%(1)08x, cr_start = %(2)d, cr_end = %(3)d, mult = %(4)d ] +0x00022207 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:credit_reset [ dom:vcpu = 0x%(1)08x, cr_start = %(2)d, cr_end = %(3)d ] 0x00022208 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:sched_tasklet 0x00022209 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:update_load 0x0002220a CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:runq_assign [ dom:vcpu = 0x%(1)08x, rq_id = %(2)d ] --- a/tools/xentrace/xenalyze.c +++ b/tools/xentrace/xenalyze.c @@ -7718,13 +7718,12 @@ void sched_process(struct pcpu_info *p) struct { unsigned int vcpuid:16, domid:16; int credit_start, credit_end; - unsigned int multiplier; } *r = (typeof(r))ri->d; printf(" %s csched2:reset_credits d%uv%u, " - "credit_start = %d, credit_end = %d, mult = %u\n", + "credit_start = %d, credit_end = %d\n", ri->dump_header, r->domid, r->vcpuid,
- r->credit_start, r->credit_end, r->multiplier); + r->credit_start, r->credit_end); } break; case TRC_SCHED_CLASS_EVT(CSCHED2, 8): /* SCHED_TASKLET */ --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -228,11 +228,21 @@ */ #define CSCHED2_CREDIT_INIT MILLISECS(10) /* + * Minimum amount of credits VMs can have. Ideally, no VM would get + * close to this (unless a vCPU manages to execute for really long + * time uninterrupted). In case it happens, it makes no sense to + * track even deeper undershoots. + * + * NOTE: If making this smaller than -CSCHED2_CREDIT_INIT, adjust + * reset_credit() accordingly. + */ +#define CSCHED2_CREDIT_MIN (-CSCHED2_CREDIT_INIT) +/* * Amount of credit the idle units have. It never changes, as idle * units does not consume credits, and it must be lower than whatever * amount of credit 'regular' unit would end up with. */ -#define CSCHED2_IDLE_CREDIT (-(1U<<30)) +#define CSCHED2_IDLE_CREDIT (CSCHED2_CREDIT_MIN-1) /* * Carryover: How much "extra" credit may be carried over after * a reset. @@ -776,10 +786,15 @@ static int get_fallback_cpu(struct csche static void t2c_update(struct csched2_runqueue_data *rqd, s_time_t time, struct csched2_unit *svc) { - uint64_t val = time * rqd->max_weight + svc->residual; + int64_t val = time * rqd->max_weight + svc->residual; svc->residual = do_div(val, svc->weight); - svc->credit -= val; + /* Getting to lower credit than CSCHED2_CREDIT_MIN makes no sense. */ + val = svc->credit - val; + if ( unlikely(val < CSCHED2_CREDIT_MIN) ) + svc->credit = CSCHED2_CREDIT_MIN; + else + svc->credit = val; } static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct csched2_unit *svc) @@ -1620,28 +1635,25 @@ static void reset_credit(const struct sc { struct csched2_runqueue_data *rqd = c2rqd(ops, cpu); struct list_head *iter; - int m; + int reset = CSCHED2_CREDIT_INIT; /* * Under normal circumstances, snext->credit should never be less * than -CSCHED2_MIN_TIMER. 
However, under some circumstances, an * unit with low credits may be allowed to run long enough that - * its credits are actually less than -CSCHED2_CREDIT_INIT. + * its credits are actually much lower than that. * (Instances have been observed, for example, where an unit with * 200us of credit was allowed to run for 11ms, giving it -10.8ms * of credit. Thus it was still negative even after the reset.) * * If this is the case for snext, we simply want to keep moving - * everyone up until it is in the black again. This fair because - * none of the other units want to run at the moment. - * - * Rather than looping, however, we just calculate a multiplier, - * avoiding an integer division and multiplication in the common - * case. + * everyone up until it is in the black again. This means that, + * since CSCHED2_CREDIT_MIN is -CSCHED2_CREDIT_INIT, we need to + * actually add 2*CSCHED2_CREDIT_INIT. */ - m = 1; - if ( snext->credit < -CSCHED2_CREDIT_INIT ) - m += (-snext->credit) / CSCHED2_CREDIT_INIT; + ASSERT(snext->credit >= CSCHED2_CREDIT_MIN); + if ( unlikely(snext->credit == CSCHED2_CREDIT_MIN) ) + reset += CSCHED2_CREDIT_INIT; list_for_each( iter, &rqd->svc ) { @@ -1672,15 +1684,7 @@ static void reset_credit(const struct sc } start_credit = svc->credit; - - /* - * Add INIT * m, avoiding integer multiplication in the common case. 
- */ - if ( likely(m==1) ) - svc->credit += CSCHED2_CREDIT_INIT; - else - svc->credit += m * CSCHED2_CREDIT_INIT; - + svc->credit += reset; /* "Clip" credits to max carryover */ if ( svc->credit > CSCHED2_CREDIT_INIT + CSCHED2_CARRYOVER_MAX ) svc->credit = CSCHED2_CREDIT_INIT + CSCHED2_CARRYOVER_MAX; @@ -1692,19 +1696,18 @@ static void reset_credit(const struct sc struct { unsigned unit:16, dom:16; int credit_start, credit_end; - unsigned multiplier; } d; d.dom = svc->unit->domain->domain_id; d.unit = svc->unit->unit_id; d.credit_start = start_credit; d.credit_end = svc->credit; - d.multiplier = m; __trace_var(TRC_CSCHED2_CREDIT_RESET, 1, sizeof(d), (unsigned char *)&d); } } + ASSERT(snext->credit > 0); SCHED_STAT_CRANK(credit_reset); /* No need to resort runqueue, as everyone's order should be the same. */ ++++++ 5e86f7fd-credit2-fix-credit-too-few-resets.patch ++++++ # Commit dae7b62e976b28af9c8efa150618c25501bf1650 # Date 2020-04-03 10:46:53 +0200 # Author Dario Faggioli <dfaggi...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> credit2: fix credit reset happening too few times There is a bug in commit 5e4b4199667b9 ("xen: credit2: only reset credit on reset condition"). In fact, the aim of that commit was to make sure that we do not perform too many credit reset operations (which are not super cheap, and in an hot-path). But the check used to determine whether a reset is necessary was the wrong one. In fact, knowing just that some vCPUs have been skipped, while traversing the runqueue (in runq_candidate()), is not enough. We need to check explicitly whether the first vCPU in the runqueue has a negative amount of credit. Since a trace record is changed, this patch updates xentrace format file and xenalyze as well This should be backported. 
Signed-off-by: Dario Faggioli <dfaggi...@suse.com> Acked-by: George Dunlap <george.dun...@citrix.com> --- a/tools/xentrace/formats +++ b/tools/xentrace/formats @@ -67,7 +67,7 @@ 0x00022210 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:load_check [ lrq_id[16]:orq_id[16] = 0x%(1)08x, delta = %(2)d ] 0x00022211 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:load_balance [ l_bavgload = 0x%(2)08x%(1)08x, o_bavgload = 0x%(4)08x%(3)08x, lrq_id[16]:orq_id[16] = 0x%(5)08x ] 0x00022212 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:pick_cpu [ b_avgload = 0x%(2)08x%(1)08x, dom:vcpu = 0x%(3)08x, rq_id[16]:new_cpu[16] = %(4)d ] -0x00022213 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:runq_candidate [ dom:vcpu = 0x%(1)08x, credit = %(4)d, skipped_vcpus = %(3)d, tickled_cpu = %(2)d ] +0x00022213 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:runq_candidate [ dom:vcpu = 0x%(1)08x, credit = %(3)d, tickled_cpu = %(2)d ] 0x00022214 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:schedule [ rq:cpu = 0x%(1)08x, tasklet[8]:idle[8]:smt_idle[8]:tickled[8] = %(2)08x ] 0x00022215 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:ratelimit [ dom:vcpu = 0x%(1)08x, runtime = %(2)d ] 0x00022216 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:runq_cand_chk [ dom:vcpu = 0x%(1)08x ] --- a/tools/xentrace/xenalyze.c +++ b/tools/xentrace/xenalyze.c @@ -7855,14 +7855,12 @@ void sched_process(struct pcpu_info *p) if (opt.dump_all) { struct { unsigned vcpuid:16, domid:16; - unsigned tickled_cpu, skipped; + unsigned tickled_cpu; int credit; } *r = (typeof(r))ri->d; - printf(" %s csched2:runq_candidate d%uv%u, credit = %d, " - "%u vcpus skipped, ", - ri->dump_header, r->domid, r->vcpuid, - r->credit, r->skipped); + printf(" %s csched2:runq_candidate d%uv%u, credit = %d, ", + ri->dump_header, r->domid, r->vcpuid, r->credit); if (r->tickled_cpu == (unsigned)-1) printf("no cpu was tickled\n"); else --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -3267,8 +3267,7 @@ csched2_runtime(const struct scheduler * static struct 
csched2_unit * runq_candidate(struct csched2_runqueue_data *rqd, struct csched2_unit *scurr, - int cpu, s_time_t now, - unsigned int *skipped) + int cpu, s_time_t now) { struct list_head *iter, *temp; struct sched_resource *sr = get_sched_res(cpu); @@ -3276,8 +3275,6 @@ runq_candidate(struct csched2_runqueue_d struct csched2_private *prv = csched2_priv(sr->scheduler); bool yield = false, soft_aff_preempt = false; - *skipped = 0; - if ( unlikely(is_idle_unit(scurr->unit)) ) { snext = scurr; @@ -3371,12 +3368,9 @@ runq_candidate(struct csched2_runqueue_d (unsigned char *)&d); } - /* Only consider units that are allowed to run on this processor. */ + /* Only consider vcpus that are allowed to run on this processor. */ if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) ) - { - (*skipped)++; continue; - } /* * If an unit is meant to be picked up by another processor, and such @@ -3385,7 +3379,6 @@ runq_candidate(struct csched2_runqueue_d if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu && cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) ) { - (*skipped)++; SCHED_STAT_CRANK(deferred_to_tickled_cpu); continue; } @@ -3397,7 +3390,6 @@ runq_candidate(struct csched2_runqueue_d if ( sched_unit_master(svc->unit) != cpu && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit ) { - (*skipped)++; SCHED_STAT_CRANK(migrate_resisted); continue; } @@ -3421,14 +3413,13 @@ runq_candidate(struct csched2_runqueue_d { struct { unsigned unit:16, dom:16; - unsigned tickled_cpu, skipped; + unsigned tickled_cpu; int credit; } d; d.dom = snext->unit->domain->domain_id; d.unit = snext->unit->unit_id; d.credit = snext->credit; d.tickled_cpu = snext->tickled_cpu; - d.skipped = *skipped; __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1, sizeof(d), (unsigned char *)&d); @@ -3460,7 +3451,6 @@ static void csched2_schedule( struct csched2_runqueue_data *rqd; struct csched2_unit * const scurr = csched2_unit(currunit); struct csched2_unit *snext = NULL; - unsigned int skipped_units = 0; bool 
tickled; bool migrated = false; @@ -3538,7 +3528,7 @@ static void csched2_schedule( snext = csched2_unit(sched_idle_unit(sched_cpu)); } else - snext = runq_candidate(rqd, scurr, sched_cpu, now, &skipped_units); + snext = runq_candidate(rqd, scurr, sched_cpu, now); /* If switching from a non-idle runnable unit, put it * back on the runqueue. */ @@ -3550,6 +3540,8 @@ static void csched2_schedule( /* Accounting for non-idle tasks */ if ( !is_idle_unit(snext->unit) ) { + int top_credit; + /* If switching, remove this from the runqueue and mark it scheduled */ if ( snext != scurr ) { @@ -3577,11 +3569,15 @@ static void csched2_schedule( * 2) no other unit with higher credits wants to run. * * Here, where we want to check for reset, we need to make sure the - * proper unit is being used. In fact, runqueue_candidate() may have - * not returned the first unit in the runqueue, for various reasons + * proper unit is being used. In fact, runq_candidate() may have not + * returned the first unit in the runqueue, for various reasons * (e.g., affinity). Only trigger a reset when it does. */ - if ( skipped_units == 0 && snext->credit <= CSCHED2_CREDIT_RESET ) + if ( list_empty(&rqd->runq) ) + top_credit = snext->credit; + else + top_credit = max(snext->credit, runq_elem(rqd->runq.next)->credit); + if ( top_credit <= CSCHED2_CREDIT_RESET ) { reset_credit(ops, sched_cpu, now, snext); balance_load(ops, sched_cpu, now); ++++++ 5e876b0f-tools-xenstore-fix-use-after-free-in-xenstored.patch ++++++ Subject: tools/xenstore: fix a use after free problem in xenstored From: Juergen Gross jgr...@suse.com Fri Apr 3 13:03:40 2020 +0100 Date: Fri Apr 3 17:57:51 2020 +0100: Git: bb2a34fd740e9a26be9e2244f1a5b4cef439e5a8 Commit 562a1c0f7ef3fb ("tools/xenstore: dont unlink connection object twice") introduced a potential use after free problem in domain_cleanup(): after calling talloc_unlink() for domain->conn domain->conn is set to NULL. 
The problem is that domain is registered as talloc child of domain->conn, so it might be freed by the talloc_unlink() call. With Xenstore being single threaded there are normally no concurrent memory allocations running and freeing a virtual memory area normally doesn't result in that area no longer being accessible. A problem could occur only in case either a signal received results in some memory allocation done in the signal handler (SIGHUP is a primary candidate leading to reopening the log file), or in case the talloc framework would do some internal memory allocation during freeing of the memory (which would lead to clobbering of the freed domain structure). Fixes: 562a1c0f7ef3fb ("tools/xenstore: dont unlink connection object twice") Signed-off-by: Juergen Gross <jgr...@suse.com> Reviewed-by: Julien Grall <jgr...@amazon.com> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c index baddaba5df..5858185211 100644 --- a/tools/xenstore/xenstored_domain.c +++ b/tools/xenstore/xenstored_domain.c @@ -214,6 +214,7 @@ static void domain_cleanup(void) { xc_dominfo_t dominfo; struct domain *domain; + struct connection *conn; int notify = 0; again: @@ -230,8 +231,10 @@ static void domain_cleanup(void) continue; } if (domain->conn) { - talloc_unlink(talloc_autofree_context(), domain->conn); + /* domain is a talloc child of domain->conn. 
*/ + conn = domain->conn; domain->conn = NULL; + talloc_unlink(talloc_autofree_context(), conn); notify = 0; /* destroy_domain() fires the watch */ goto again; } } ++++++ 5e95ad61-xenoprof-clear-buffer-intended-to-be-shared-with-guests.patch ++++++ Subject: xenoprof: clear buffer intended to be shared with guests From: Jan Beulich jbeul...@suse.com Tue Apr 14 14:32:33 2020 +0200 Date: Tue Apr 14 14:32:33 2020 +0200: Git: 0763a7ebfcdad66cf9e5475a1301eefb29bae9ed alloc_xenheap_pages() making use of MEMF_no_scrub is fine for Xen internally used allocations, but buffers allocated to be shared with (unprivileged) guests need to be zapped of their prior content. This is part of XSA-313. Reported-by: Ilja Van Sprundel <ivansprun...@ioactive.com> Signed-off-by: Jan Beulich <jbeul...@suse.com> Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Wei Liu <w...@xen.org> diff --git a/xen/common/xenoprof.c b/xen/common/xenoprof.c index 4f3e799ebb..4d909fd5d6 100644 --- a/xen/common/xenoprof.c +++ b/xen/common/xenoprof.c @@ -253,6 +253,9 @@ static int alloc_xenoprof_struct( return -ENOMEM; } + for ( i = 0; i < npages; ++i ) + clear_page(d->xenoprof->rawbuf + i * PAGE_SIZE); + d->xenoprof->npages = npages; d->xenoprof->nbuf = nvcpu; d->xenoprof->bufsize = bufsize; ++++++ 5e95ad8f-xenoprof-limit-consumption-of-shared-buffer-data.patch ++++++ Subject: xenoprof: limit consumption of shared buffer data From: Jan Beulich jbeul...@suse.com Tue Apr 14 14:33:19 2020 +0200 Date: Tue Apr 14 14:33:19 2020 +0200: Git: 50ef9a3cb26e2f9383f6fdfbed361d8f174bae9f Since a shared buffer can be written to by the guest, we may only read the head and tail pointers from there (all other fields should only ever be written to). Furthermore, for any particular operation the two values must be read exactly once, with both checks and consumption happening with the thus read values.
(The backtrace related xenoprof_buf_space() use in xenoprof_log_event() is an exception: The values used there get re-checked by every subsequent xenoprof_add_sample().) Since that code needed touching, also fix the double increment of the lost samples count in case the backtrace related xenoprof_add_sample() invocation in xenoprof_log_event() fails. Where code is being touched anyway, add const as appropriate, but take the opportunity to entirely drop the now unused domain parameter of xenoprof_buf_space(). This is part of XSA-313. Reported-by: Ilja Van Sprundel <ivansprun...@ioactive.com> Signed-off-by: Jan Beulich <jbeul...@suse.com> Reviewed-by: George Dunlap <george.dun...@citrix.com> Reviewed-by: Wei Liu <w...@xen.org> diff --git a/xen/common/xenoprof.c b/xen/common/xenoprof.c index 4d909fd5d6..b04726cb49 100644 --- a/xen/common/xenoprof.c +++ b/xen/common/xenoprof.c @@ -479,25 +479,22 @@ static int add_passive_list(XEN_GUEST_HANDLE_PARAM(void) arg) /* Get space in the buffer */ -static int xenoprof_buf_space(struct domain *d, xenoprof_buf_t * buf, int size) +static int xenoprof_buf_space(int head, int tail, int size) { - int head, tail; - - head = xenoprof_buf(d, buf, event_head); - tail = xenoprof_buf(d, buf, event_tail); - return ((tail > head) ? 0 : size) + tail - head - 1; } /* Check for space and add a sample. Return 1 if successful, 0 otherwise. 
*/ -static int xenoprof_add_sample(struct domain *d, xenoprof_buf_t *buf, +static int xenoprof_add_sample(const struct domain *d, + const struct xenoprof_vcpu *v, uint64_t eip, int mode, int event) { + xenoprof_buf_t *buf = v->buffer; int head, tail, size; head = xenoprof_buf(d, buf, event_head); tail = xenoprof_buf(d, buf, event_tail); - size = xenoprof_buf(d, buf, event_size); + size = v->event_size; /* make sure indexes in shared buffer are sane */ if ( (head < 0) || (head >= size) || (tail < 0) || (tail >= size) ) @@ -506,7 +503,7 @@ static int xenoprof_add_sample(struct domain *d, xenoprof_buf_t *buf, return 0; } - if ( xenoprof_buf_space(d, buf, size) > 0 ) + if ( xenoprof_buf_space(head, tail, size) > 0 ) { xenoprof_buf(d, buf, event_log[head].eip) = eip; xenoprof_buf(d, buf, event_log[head].mode) = mode; @@ -530,7 +527,6 @@ static int xenoprof_add_sample(struct domain *d, xenoprof_buf_t *buf, int xenoprof_add_trace(struct vcpu *vcpu, uint64_t pc, int mode) { struct domain *d = vcpu->domain; - xenoprof_buf_t *buf = d->xenoprof->vcpu[vcpu->vcpu_id].buffer; /* Do not accidentally write an escape code due to a broken frame. */ if ( pc == XENOPROF_ESCAPE_CODE ) @@ -539,7 +535,8 @@ int xenoprof_add_trace(struct vcpu *vcpu, uint64_t pc, int mode) return 0; } - return xenoprof_add_sample(d, buf, pc, mode, 0); + return xenoprof_add_sample(d, &d->xenoprof->vcpu[vcpu->vcpu_id], + pc, mode, 0); } void xenoprof_log_event(struct vcpu *vcpu, const struct cpu_user_regs *regs, @@ -570,17 +567,22 @@ void xenoprof_log_event(struct vcpu *vcpu, const struct cpu_user_regs *regs, /* Provide backtrace if requested. 
*/ if ( backtrace_depth > 0 ) { - if ( (xenoprof_buf_space(d, buf, v->event_size) < 2) || - !xenoprof_add_sample(d, buf, XENOPROF_ESCAPE_CODE, mode, - XENOPROF_TRACE_BEGIN) ) + if ( xenoprof_buf_space(xenoprof_buf(d, buf, event_head), + xenoprof_buf(d, buf, event_tail), + v->event_size) < 2 ) { xenoprof_buf(d, buf, lost_samples)++; lost_samples++; return; } + + /* xenoprof_add_sample() will increment lost_samples on failure */ + if ( !xenoprof_add_sample(d, v, XENOPROF_ESCAPE_CODE, mode, + XENOPROF_TRACE_BEGIN) ) + return; } - if ( xenoprof_add_sample(d, buf, pc, mode, event) ) + if ( xenoprof_add_sample(d, v, pc, mode, event) ) { if ( is_active(vcpu->domain) ) active_samples++; diff --git a/xen/include/xen/xenoprof.h b/xen/include/xen/xenoprof.h index 8f045abce6..f1f9446bd5 100644 --- a/xen/include/xen/xenoprof.h +++ b/xen/include/xen/xenoprof.h @@ -61,12 +61,12 @@ struct xenoprof { #ifndef CONFIG_COMPAT #define XENOPROF_COMPAT(x) 0 -#define xenoprof_buf(d, b, field) ((b)->field) +#define xenoprof_buf(d, b, field) ACCESS_ONCE((b)->field) #else #define XENOPROF_COMPAT(x) ((x)->is_compat) -#define xenoprof_buf(d, b, field) (*(!(d)->xenoprof->is_compat ? \ - &(b)->native.field : \ - &(b)->compat.field)) +#define xenoprof_buf(d, b, field) ACCESS_ONCE(*(!(d)->xenoprof->is_compat \ + ? &(b)->native.field \ + : &(b)->compat.field)) #endif struct domain; ++++++ 5e95ae77-Add-missing-memory-barrier-in-the-unlock-path-of-rwlock.patch ++++++ Subject: xen/rwlock: Add missing memory barrier in the unlock path of rwlock From: Julien Grall jgr...@amazon.com Thu Feb 20 20:54:40 2020 +0000 Date: Tue Apr 14 14:37:11 2020 +0200: Git: 6890a04072e664c25447a297fe663b45ecfd6398 The rwlock unlock paths are using atomic_sub() to release the lock. However the implementation of atomic_sub() rightfully doesn't contain a memory barrier. On Arm, this means a processor is allowed to re-order the memory access with the preceeding access. 
In other words, the unlock may be seen by another processor before all the memory accesses within the "critical" section. The rwlock paths already contains barrier indirectly, but they are not very useful without the counterpart in the unlock paths. The memory barriers are not necessary on x86 because loads/stores are not re-ordered with lock instructions. So add arch_lock_release_barrier() in the unlock paths that will only add memory barrier on Arm. Take the opportunity to document each lock paths explaining why a barrier is not necessary. This is XSA-314. Signed-off-by: Julien Grall <jgr...@amazon.com> Acked-by: Jan Beulich <jbeul...@suse.com> --- a/xen/include/xen/rwlock.h +++ b/xen/include/xen/rwlock.h @@ -48,6 +48,10 @@ static inline int _read_trylock(rwlock_t if ( likely(!(cnts & _QW_WMASK)) ) { cnts = (u32)atomic_add_return(_QR_BIAS, &lock->cnts); + /* + * atomic_add_return() is a full barrier so no need for an + * arch_lock_acquire_barrier(). + */ if ( likely(!(cnts & _QW_WMASK)) ) return 1; atomic_sub(_QR_BIAS, &lock->cnts); @@ -64,11 +68,19 @@ static inline void _read_lock(rwlock_t * u32 cnts; cnts = atomic_add_return(_QR_BIAS, &lock->cnts); + /* + * atomic_add_return() is a full barrier so no need for an + * arch_lock_acquire_barrier(). + */ if ( likely(!(cnts & _QW_WMASK)) ) return; /* The slowpath will decrement the reader count, if necessary. */ queue_read_lock_slowpath(lock); + /* + * queue_read_lock_slowpath() is using spinlock and therefore is a + * full barrier. So no need for an arch_lock_acquire_barrier(). + */ } static inline void _read_lock_irq(rwlock_t *lock) @@ -92,6 +104,7 @@ static inline unsigned long _read_lock_i */ static inline void _read_unlock(rwlock_t *lock) { + arch_lock_release_barrier(); /* * Atomically decrement the reader count */ @@ -121,11 +134,20 @@ static inline int _rw_is_locked(rwlock_t */ static inline void _write_lock(rwlock_t *lock) { - /* Optimize for the unfair lock case where the fair flag is 0. 
*/ + /* + * Optimize for the unfair lock case where the fair flag is 0. + * + * atomic_cmpxchg() is a full barrier so no need for an + * arch_lock_acquire_barrier(). + */ if ( atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0 ) return; queue_write_lock_slowpath(lock); + /* + * queue_write_lock_slowpath() is using spinlock and therefore is a + * full barrier. So no need for an arch_lock_acquire_barrier(). + */ } static inline void _write_lock_irq(rwlock_t *lock) @@ -157,11 +179,16 @@ static inline int _write_trylock(rwlock_ if ( unlikely(cnts) ) return 0; + /* + * atomic_cmpxchg() is a full barrier so no need for an + * arch_lock_acquire_barrier(). + */ return likely(atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0); } static inline void _write_unlock(rwlock_t *lock) { + arch_lock_release_barrier(); /* * If the writer field is atomic, it can be cleared directly. * Otherwise, an atomic subtraction will be used to clear it. ++++++ 5e95af5e-xen-gnttab-Fix-error-path-in-map_grant_ref.patch ++++++ Subject: xen/gnttab: Fix error path in map_grant_ref() From: Ross Lagerwall ross.lagerw...@citrix.com Tue Apr 14 14:41:02 2020 +0200 Date: Tue Apr 14 14:41:02 2020 +0200: Git: da0c66c8f48042a0186799014af69db0303b1da5 Part of XSA-295 (c/s 863e74eb2cffb) inadvertently re-positioned the brackets, changing the logic. If the _set_status() call fails, the grant_map hypercall would fail with a status of 1 (rc != GNTST_okay) instead of the expected negative GNTST_* error. This error path can be taken due to bad guest state, and causes net/blk-back in Linux to crash. This is XSA-316. 
Signed-off-by: Ross Lagerwall <ross.lagerw...@citrix.com> Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Julien Grall <jgr...@amazon.com> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c index 9fd6e60416..4b5344dc21 100644 --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -1032,7 +1032,7 @@ map_grant_ref( { if ( (rc = _set_status(shah, status, rd, rgt->gt_version, act, op->flags & GNTMAP_readonly, 1, - ld->domain_id) != GNTST_okay) ) + ld->domain_id)) != GNTST_okay ) goto act_release_out; if ( !act->pin ) ++++++ 5e95afb8-gnttab-fix-GNTTABOP_copy-continuation-handling.patch ++++++ Subject: gnttab: fix GNTTABOP_copy continuation handling From: Jan Beulich jbeul...@suse.com Tue Apr 14 14:42:32 2020 +0200 Date: Tue Apr 14 14:42:32 2020 +0200: Git: d6f22d5d9e8d6848ec229083ac9fb044f0adea93 The XSA-226 fix was flawed - the backwards transformation on rc was done too early, causing a continuation to not get invoked when the need for preemption was determined at the very first iteration of the request. This in particular means that all of the status fields of the individual operations would be left untouched, i.e. set to whatever the caller may or may not have initialized them to. This is part of XSA-318. 
Reported-by: Pawel Wieczorkiewicz <wipa...@amazon.de> Signed-off-by: Jan Beulich <jbeul...@suse.com> Reviewed-by: Juergen Gross <jgr...@suse.com> Tested-by: Pawel Wieczorkiewicz <wipa...@amazon.de> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c index 4b5344dc21..96080b3dec 100644 --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -3583,8 +3583,7 @@ do_grant_table_op( rc = gnttab_copy(copy, count); if ( rc > 0 ) { - rc = count - rc; - guest_handle_add_offset(copy, rc); + guest_handle_add_offset(copy, count - rc); uop = guest_handle_cast(copy, void); } break; @@ -3651,6 +3650,9 @@ do_grant_table_op( out: if ( rc > 0 || opaque_out != 0 ) { + /* Adjust rc, see gnttab_copy() for why this is needed. */ + if ( cmd == GNTTABOP_copy ) + rc = count - rc; ASSERT(rc < count); ASSERT((opaque_out & GNTTABOP_CMD_MASK) == 0); rc = hypercall_create_continuation(__HYPERVISOR_grant_table_op, "ihi",