Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported

2021-04-21 Thread Chao Gao
On Wed, Apr 21, 2021 at 11:23:13AM +0200, Jan Beulich wrote:
>On 20.04.2021 18:17, Roger Pau Monné wrote:
>> On Tue, Apr 20, 2021 at 05:38:51PM +0200, Jan Beulich wrote:
>>> On 20.04.2021 17:08, Roger Pau Monné wrote:
>>>> On Thu, Apr 02, 2020 at 04:06:06AM +0800, Chao Gao wrote:
>>>>> --- a/xen/drivers/passthrough/vtd/qinval.c
>>>>> +++ b/xen/drivers/passthrough/vtd/qinval.c
>>>>> @@ -442,6 +442,23 @@ int enable_qinval(struct vtd_iommu *iommu)
>>>>>  return 0;
>>>>>  }
>>>>>  
>>>>> +static int vtd_flush_context_noop(struct vtd_iommu *iommu, uint16_t did,
>>>>> +  uint16_t source_id, uint8_t function_mask,
>>>>> +  uint64_t type, bool flush_non_present_entry)
>>>>> +{
>>>>> +dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush CONTEXT.\n");
>>>>> +return -EIO;
>>>>> +}
>>>>> +
>>>>> +static int vtd_flush_iotlb_noop(struct vtd_iommu *iommu, uint16_t did,
>>>>> +uint64_t addr, unsigned int size_order,
>>>>> +uint64_t type, bool flush_non_present_entry,
>>>>> +bool flush_dev_iotlb)
>>>>> +{
>>>>> +dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: Cannot flush IOTLB.\n");
>>>>> +return -EIO;
>>>>> +}
>>>>
>>>> I think I would add an ASSERT_UNREACHABLE() to both noop handlers
>>>> above, as I would expect trying to use them without the proper mode
>>>> being configured would point to an error elsewhere?
>>>
>>> If such an assertion triggered e.g. during S3 suspend/resume, it may
>>> lead to the box simply not doing anything useful, without there being
>>> any way to know what went wrong. If instead the system at least
>>> managed to resume, the log message could be observed.
>> 
>> Oh, OK then. I'm simply worried that people might ignore such one line
>> messages, maybe add a WARN?
>
>Hmm, yes, perhaps - would allow seeing right away where the call
>came from. Chao, I'd again be fine to flip the dprintk()-s to
>WARN()-s while committing. But of course only provided you and
>Kevin (as the maintainer) agree.

Sure, please go ahead.

Thanks
Chao
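
For reference, with the dprintk()s flipped to WARN()s as discussed, the noop
handlers would presumably end up looking like this (a sketch; WARN() is Xen's
existing macro that reports the warning location, so the bogus caller can be
identified from the log):

static int vtd_flush_context_noop(struct vtd_iommu *iommu, uint16_t did,
                                  uint16_t source_id, uint8_t function_mask,
                                  uint64_t type, bool flush_non_present_entry)
{
    /* No invalidation interface available; flag the caller loudly. */
    WARN();
    return -EIO;
}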



Re: [PATCH v2] VT-d: Don't assume register-based invalidation is always supported

2021-04-20 Thread Chao Gao
On Tue, Apr 20, 2021 at 01:38:26PM +0200, Jan Beulich wrote:
>On 01.04.2020 22:06, Chao Gao wrote:
>> According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation
>> isn't supported by Intel VT-d version 6 and beyond.
>> 
>> This hardware change impacts following two scenarios: admin can disable
>> queued invalidation via 'qinval' cmdline and use register-based interface;
>> VT-d switches to register-based invalidation when queued invalidation needs
>> to be disabled, for example, during disabling x2apic or during system
>> suspension or after enabling queued invalidation fails.
>> 
>> To deal with this hardware change, if register-based invalidation isn't
>> supported, queued invalidation cannot be disabled through Xen cmdline; and
>> if queued invalidation has to be disabled temporarily in some scenarios,
>> VT-d won't switch to register-based interface but use some dummy functions
>> to catch errors in case there is any invalidation request issued when queued
>> invalidation is disabled.
>> 
>> Signed-off-by: Chao Gao 
>
>In principle (with a minor nit further down)
>Reviewed-by: Jan Beulich 
>
>However, ...
>
>> ---
>> Changes in v2:
>>  - verify system suspension and resumption with this patch applied
>>  - don't fall back to register-based interface if enabling qinval failed.
>>see the change in init_vtd_hw().
>>  - remove unnecessary "queued_inval_supported" variable
>>  - constify the "struct vtd_iommu *" of has_register_based_invalidation()
>>  - coding-style changes
>
>... while this suggests this is v2 of a recently sent patch, the
>submission is dated a little over a year ago. This is confusing.
>It is additionally confusing that there were two copies of it in
>my inbox, despite mails coming from a list normally getting
>de-duplicated somewhere at our end (I believe).

You are right. I messed up the system time on my server somehow. Sorry for that.
If possible, please also update the date of this patch.

>
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>>  
>>  iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
>>  iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
>> +iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
>> +
>> +if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
>> +{
>> +printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
>> +   iommu->index);
>
>Here (and at least once more yet further down): We don't normally end
>log messages with a full stop. Easily addressable while committing, of
>course.

Okay. Please go ahead.

Thanks
Chao



[PATCH v2] VT-d: Don't assume register-based invalidation is always supported

2021-04-20 Thread Chao Gao
According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation
isn't supported by Intel VT-d version 6 and beyond.

This hardware change impacts the following two scenarios: an admin can disable
queued invalidation via the 'qinval' cmdline and use the register-based interface;
VT-d switches to register-based invalidation when queued invalidation needs
to be disabled, for example, when disabling x2apic, during system
suspension, or after enabling queued invalidation fails.

To deal with this hardware change, if register-based invalidation isn't
supported, queued invalidation cannot be disabled through Xen cmdline; and
if queued invalidation has to be disabled temporarily in some scenarios,
VT-d won't switch to the register-based interface but will use dummy functions
to catch errors in case any invalidation request is issued while queued
invalidation is disabled.

Signed-off-by: Chao Gao 
---
Changes in v2:
 - verify system suspension and resumption with this patch applied
 - don't fall back to register-based interface if enabling qinval failed.
   see the change in init_vtd_hw().
 - remove unnecessary "queued_inval_supported" variable
 - constify the "struct vtd_iommu *" of has_register_based_invalidation()
 - coding-style changes
---
 docs/misc/xen-command-line.pandoc|  4 +++-
 xen/drivers/passthrough/vtd/iommu.c  | 27 +--
 xen/drivers/passthrough/vtd/iommu.h  |  7 ++
 xen/drivers/passthrough/vtd/qinval.c | 33 ++--
 4 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index deef6d0b4c..4ff4a87844 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1442,7 +1442,9 @@ The following options are specific to Intel VT-d hardware:
 *   The `qinval` boolean controls the Queued Invalidation sub-feature, and is
 active by default on compatible hardware.  Queued Invalidation is a
 feature in second-generation IOMMUs and is a functional prerequisite for
-Interrupt Remapping.
+Interrupt Remapping. Note that Xen disregards this setting for Intel VT-d
+version 6 and greater as Register-based Invalidation isn't supported
+by them.
 
 *   The `igfx` boolean is active by default, and controls whether the IOMMU in
 front of an Intel Graphics Device is enabled or not.
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 6428c8fe3e..94d1372903 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
 
 iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
 iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
+iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
+
+if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
+{
+printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
+   iommu->index);
+iommu_qinval = true;
+}
 
 if ( iommu_verbose )
 {
@@ -2141,6 +2149,10 @@ static int __must_check init_vtd_hw(bool resume)
  */
 if ( enable_qinval(iommu) != 0 )
 {
+/* Ensure register-based invalidation is available */
+if ( !has_register_based_invalidation(iommu) )
+return -EIO;
+
 iommu->flush.context = vtd_flush_context_reg;
 iommu->flush.iotlb   = vtd_flush_iotlb_reg;
 }
@@ -2231,6 +2243,7 @@ static int __init vtd_setup(void)
 struct acpi_drhd_unit *drhd;
 struct vtd_iommu *iommu;
 int ret;
+bool reg_inval_supported = true;
 
    if ( list_empty(&acpi_drhd_units) )
 {
@@ -2252,8 +2265,8 @@ static int __init vtd_setup(void)
 }
 
 /* We enable the following features only if they are supported by all VT-d
- * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
- * Remapping, and Posted Interrupt
+ * engines: Snoop Control, DMA passthrough, Register-based Invalidation,
+ * Queued Invalidation, Interrupt Remapping, and Posted Interrupt.
  */
 for_each_drhd_unit ( drhd )
 {
@@ -2275,6 +2288,9 @@ static int __init vtd_setup(void)
 if ( iommu_qinval && !ecap_queued_inval(iommu->ecap) )
 iommu_qinval = 0;
 
+if ( !has_register_based_invalidation(iommu) )
+reg_inval_supported = false;
+
 if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
 iommu_intremap = iommu_intremap_off;
 
@@ -2301,6 +2317,13 @@ static int __init vtd_setup(void)
 
    softirq_tasklet_init(&vtd_fault_tasklet, do_iommu_page_fault, NULL);
 
+if ( !iommu_qinval && !reg_inval_supported )
+{
+dprintk(XENLOG_ERR VTDPREFIX, "No available invalidation interface.\n");

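The iommu.h hunk adding has_register_based_invalidation() is truncated above.
Given the VER_REG read added in iommu_alloc() and the spec statement that VT-d
version 6 dropped Register-based Invalidation, the helper is presumably along
these lines (a sketch; VER_MAJOR() is the existing macro extracting the major
version field of the version register):

static inline bool has_register_based_invalidation(const struct vtd_iommu *vtd)
{
    /* Register-based Invalidation was removed in VT-d version 6. */
    return VER_MAJOR(vtd->version) < 6;
}
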
Re: [RFC PATCH] VT-d: Don't assume register-based invalidation is always supported

2021-04-14 Thread Chao Gao
On Wed, Apr 14, 2021 at 12:07:02PM +0200, Jan Beulich wrote:
>On 14.04.2021 02:55, Chao Gao wrote:
>> According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation
>> isn't supported by Intel VT-d version 6 and beyond.
>> 
>> This hardware change impacts following two scenarios: admin can disable
>> queued invalidation via 'qinval' cmdline and use register-based interface;
>> VT-d switches to register-based invalidation when queued invalidation needs
>> to be disabled, for example, during disabling x2apic or during system
>> suspension.
>> 
>> To deal with this hardware change, if register-based invalidation isn't
>> supported, queued invalidation cannot be disabled through Xen cmdline; and
>> if queued invalidation has to be disabled temporarily in some scenarios,
>> VT-d won't switch to register-based interface but use some dummy functions
>> to catch errors in case there is any invalidation request issued when queued
>> invalidation is disabled.
>> 
>> Signed-off-by: Chao Gao 
>> ---
>> I only tested Xen boot with qinval/no-qinval. I also want to do system
>> suspension and resumption to see if any unexpected error. But I don't
>> know how to trigger them. Any recommendation?
>
>Iirc, if your distro doesn't support a proper interface for this, it's
>as simple as "echo mem >/sys/power/state".

Thanks. I will give it a try. And all your comments make a lot of sense.
Will fix all of them in the next version.

Chao



[RFC PATCH] VT-d: Don't assume register-based invalidation is always supported

2021-04-14 Thread Chao Gao
According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation
isn't supported by Intel VT-d version 6 and beyond.

This hardware change impacts the following two scenarios: an admin can disable
queued invalidation via the 'qinval' cmdline and use the register-based interface;
VT-d switches to register-based invalidation when queued invalidation needs
to be disabled, for example, when disabling x2apic or during system
suspension.

To deal with this hardware change, if register-based invalidation isn't
supported, queued invalidation cannot be disabled through Xen cmdline; and
if queued invalidation has to be disabled temporarily in some scenarios,
VT-d won't switch to the register-based interface but will use dummy functions
to catch errors in case any invalidation request is issued while queued
invalidation is disabled.

Signed-off-by: Chao Gao 
---
I only tested Xen boot with qinval/no-qinval. I also want to do system
suspension and resumption to see if there is any unexpected error. But I don't
know how to trigger them. Any recommendation?
---
 docs/misc/xen-command-line.pandoc|  4 ++-
 xen/drivers/passthrough/vtd/iommu.c  | 40 +---
 xen/drivers/passthrough/vtd/iommu.h  |  7 +
 xen/drivers/passthrough/vtd/qinval.c | 33 +--
 4 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index deef6d0b4c..4ff4a87844 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1442,7 +1442,9 @@ The following options are specific to Intel VT-d hardware:
 *   The `qinval` boolean controls the Queued Invalidation sub-feature, and is
 active by default on compatible hardware.  Queued Invalidation is a
 feature in second-generation IOMMUs and is a functional prerequisite for
-Interrupt Remapping.
+Interrupt Remapping. Note that Xen disregards this setting for Intel VT-d
+version 6 and greater as Register-based Invalidation isn't supported
+by them.
 
 *   The `igfx` boolean is active by default, and controls whether the IOMMU in
 front of an Intel Graphics Device is enabled or not.
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 6428c8fe3e..e738d04543 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1193,6 +1193,14 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
 
 iommu->cap = dmar_readq(iommu->reg, DMAR_CAP_REG);
 iommu->ecap = dmar_readq(iommu->reg, DMAR_ECAP_REG);
+iommu->version = dmar_readl(iommu->reg, DMAR_VER_REG);
+
+if ( !iommu_qinval && !has_register_based_invalidation(iommu) )
+{
+printk(XENLOG_WARNING VTDPREFIX "IOMMU %d: cannot disable Queued Invalidation.\n",
+   iommu->index);
+iommu_qinval = true;
+}
 
 if ( iommu_verbose )
 {
@@ -2231,6 +2239,8 @@ static int __init vtd_setup(void)
 struct acpi_drhd_unit *drhd;
 struct vtd_iommu *iommu;
 int ret;
+bool queued_inval_supported = true;
+bool reg_inval_supported = true;
 
    if ( list_empty(&acpi_drhd_units) )
 {
@@ -2252,8 +2262,8 @@ static int __init vtd_setup(void)
 }
 
 /* We enable the following features only if they are supported by all VT-d
- * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
- * Remapping, and Posted Interrupt
+ * engines: Snoop Control, DMA passthrough, Register-based Invalidation,
+ * Queued Invalidation, Interrupt Remapping, and Posted Interrupt.
  */
 for_each_drhd_unit ( drhd )
 {
@@ -2272,8 +2282,11 @@ static int __init vtd_setup(void)
 if ( iommu_hwdom_passthrough && !ecap_pass_thru(iommu->ecap) )
 iommu_hwdom_passthrough = false;
 
-if ( iommu_qinval && !ecap_queued_inval(iommu->ecap) )
-iommu_qinval = 0;
+if ( reg_inval_supported && !has_register_based_invalidation(iommu) )
+reg_inval_supported = false;
+
+if ( queued_inval_supported && !ecap_queued_inval(iommu->ecap) )
+queued_inval_supported = false;
 
 if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
 iommu_intremap = iommu_intremap_off;
@@ -2301,6 +2314,25 @@ static int __init vtd_setup(void)
 
    softirq_tasklet_init(&vtd_fault_tasklet, do_iommu_page_fault, NULL);
 
+if ( !queued_inval_supported && !reg_inval_supported )
+{
+dprintk(XENLOG_ERR VTDPREFIX, "No available invalidation interface.\n");
+ret = -ENODEV;
+goto error;
+}
+
+/*
+ * We cannot have !iommu_qinval && !reg_inval_supported here since
+ * iommu_qinval is set in iommu_alloc() if any iommu doesn't support
+ * Register-based Invalidation.
+ */
+if ( iommu_qinval && !queued_inval_supported 

Re: [Xen-devel] [BUG]Nested virtualization, Xen on KVM, Xen cannot boot up as a guest of KVM

2020-02-25 Thread Chao Gao
On Wed, Feb 26, 2020 at 02:21:25PM +0800, Chen, Farrah wrote:
>Description:
>
>Nested virtualization, take KVM host as L0, create guest on it, install Xen on
>guest, then guest cannot boot up from Xen and keep rebooting.
>
> 
>
>Reproduce steps:
>1. Enable KVM nested on host(L0)
>rmmod kvm_intel
>modprobe kvm_intel nested=y
>cat /sys/module/kvm_intel/parameters/nested
>Y
>
>2.Create L1 guest via qemu:
>qemu-system-x86_64 -accel kvm -cpu host -drive file=rhel8.img,if=none,id=virtio-disk0 -device virtio-blk-pci,drive=virtio-disk0 -m 7168 -smp 8 -monitor pty -cpu host -device virtio-net-pci,netdev=nic0,mac=00:16:3e:72:5e:0a -netdev tap,id=nic0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper,vhost=on -serial stdio
>
>3. Build and install Xen on L1 guest
>
>4. Reboot L1 and make it boot from Xen
>
> 
>
>Then L1 keep rebooting, full log attached.
>
>……
>
>(XEN) Running stub recovery selftests...
>
>(XEN) traps.c:1590: GPF (): 82d0bfffe041 [82d0bfffe041] ->
>82d08038e40c
>
>(XEN) traps.c:785: Trap 12: 82d0bfffe040 [82d0bfffe040] ->
>82d08038e40c
>
>(XEN) traps.c:1124: Trap 3: 82d0bfffe041 [82d0bfffe041] ->
>82d08038e40c
>
>(XEN) [ Xen-4.14-unstable  x86_64  debug=y   Tainted:  C   ]
>
>(XEN) CPU:0
>
>(XEN) RIP:e008:[] core2_vpmu_init+0xa5/0x221
>
>(XEN) RFLAGS: 00010202   CONTEXT: hypervisor
>
>(XEN) rax: 08300802   rbx:    rcx: 0345
>
>(XEN) rdx: 0004   rsi: 000a   rdi: 0063
>
>(XEN) rbp: 82d0804b7d68   rsp: 82d0804b7d58   r8:  0004
>
>(XEN) r9:  0008   r10: 82d0805effe0   r11: 0032
>
>(XEN) r12: 0002   r13: 0008   r14: 82d0805dd0c0
>
>(XEN) r15: 82d0805de300   cr0: 8005003b   cr4: 003526e0
>
>(XEN) cr3: bfca4000   cr2: 
>
>(XEN) fsb:    gsb:    gss: 
>
>(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
>
>(XEN) Xen code around  (core2_vpmu_init+0xa5/0x221):
>
(XEN)  00 06 00 b9 45 03 00 00 <0f> 32 48 89 c1 48 c1 e9 0d 83 e1 01 88 0d 32 00

The machine code above shows that the #GP is generated when Xen reads
MSR_IA32_PERF_CAPABILITIES. In a KVM guest without Xen, CPUID reports that
perfmon isn't supported:

# ./cpuid -1 |grep "perfmon and debug"
 PDCM: perfmon and debug = false

So it looks like core2_vpmu_init() lacks a check to ensure the MSR is supported.
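
A minimal sketch of the kind of guard that seems to be missing (cpu_has_pdcm is
an assumed feature-flag helper for the CPUID.1:ECX PDCM bit; the exact predicate
name is illustrative):

    uint64_t caps = 0;

    /* Only read MSR_IA32_PERF_CAPABILITIES (0x345, matching rcx in the
     * trace above) when CPUID advertises PDCM; otherwise the RDMSR #GPs. */
    if ( cpu_has_pdcm )
        rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
    full_width_write = (caps >> 13) & 1;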

Thanks
Chao


Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2020-01-17 Thread Chao Gao
On Fri, Jan 17, 2020 at 01:57:43PM -0500, Rich Persaud wrote:
>On Sep 26, 2019, at 06:17, Pasi Kärkkäinen  wrote:
>> 
>> Hello Stanislav,
>> 
>>> On Fri, Sep 13, 2019 at 11:28:20PM +0800, Chao Gao wrote:
>>>> On Fri, Sep 13, 2019 at 10:02:24AM +, Spassov, Stanislav wrote:
>>>> On Thu, Dec 13, 2018 at 07:54, Chao Gao wrote:
>>>>> On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote:
>>>>>>>>> On 13.12.18 at 04:46,  wrote:
>>>>>>> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
>>>>>>>>>>> On 12.12.18 at 16:18,  wrote:
>>>>>>>>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>>>>>>>>>>> On 12.12.18 at 08:06,  wrote:
>>>>>>>>>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>>>>>>>>>> On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>>>>>>>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>>>>>>>>>>> I find some pass-thru devices don't work any more across guest reboot.
>>>>>>>>>>>>>> Assigning it to another guest also meets the same issue. And the only
>>>>>>>>>>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>>>>>>>>>>> Someone reported this issue one year ago [1]. More detail also can be
>>>>>>>>>>>>>> found in [2].
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>>>>>>>>>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>>>>>>>>>>>>> to mask all MSI interrupts after it detected a potential security
>>>>>>>>>>>>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>>>>>>>>>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>>>>>>>>>>> MSI-X message control register.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>>>>>>>>>>>>> internal state of a device, we employ it to fix this issue rather than
>>>>>>>>>>>>>> introducing another dedicated sub-hypercall.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>>>>>>>>>>> the device's msix and pirq has been created. This limitation prevents
>>>>>>>>>>>>>> us calling this function when detaching a device from a guest during
>>>>>>>>>>>>>> guest shutdown. Thus it is called right before calling
>>>>>>>>>>>>>> PHYSDEVOPS_prepare_msix().
>>>>>>>>>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>>>>>>>>>>> () at the end of the hypercall name since it's not a function.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm also wondering why the release can't be done when the device is
>>>>>>>>>>>>> detached from the guest (or the guest has been shut down). This makes
>>>>>>>>>>>>> me worry about

Re: [Xen-devel] [PATCH v3 for 4.13] x86/microcode: refuse to load the same revision ucode

2019-11-26 Thread Chao Gao
On Tue, Nov 26, 2019 at 03:41:53PM +, Sergey Dyasli wrote:
>Currently if a user tries to live-load the same or older ucode revision
>than CPU already has, he will get a single message in Xen log like:
>
>(XEN) 128 cores are to update their microcode
>
>No actual ucode loading will happen and this situation can be quite
>confusing. Fix this by starting ucode update only when the provided
>ucode revision is higher than the currently cached one (if any).
>This is based on the property that if microcode_cache exists, all CPUs
>in the system should have at least that ucode revision.
>
>Additionally, print a user friendly message if no matching or newer
>ucode can be found in the provided blob. This also requires ignoring
>-ENODATA in AMD-side code, otherwise the message given to the user is:
>
>(XEN) Parsing microcode blob error -61
>
>Which actually means that a ucode blob was parsed fine, but no matching
>ucode was found.
>
>Signed-off-by: Sergey Dyasli 
>---
>v2 --> v3:
>- move ucode comparison to generic code
>- ignore -ENODATA in a different code section
>
>v1 --> v2:
>- compare provided ucode with the currently cached one
>
>CC: Jan Beulich 
>CC: Andrew Cooper 
>CC: Roger Pau Monné 
>CC: Chao Gao 
>CC: Juergen Gross 
>---
> xen/arch/x86/microcode.c | 19 +++
> xen/arch/x86/microcode_amd.c |  7 +++
> 2 files changed, 26 insertions(+)
>
>diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>index 65d1f41e7c..44efc2d9b3 100644
>--- a/xen/arch/x86/microcode.c
>+++ b/xen/arch/x86/microcode.c
>@@ -640,10 +640,29 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> 
> if ( !patch )
> {
>+printk(XENLOG_WARNING "microcode: couldn't find any matching ucode in "
>+  "the provided blob!\n");
> ret = -ENOENT;
> goto put;
> }
> 
>+/*
>+ * If microcode_cache exists, all CPUs in the system should have at least
>+ * that ucode revision.
>+ */
>+spin_lock(&microcode_mutex);
>+if ( microcode_cache &&
>+ microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
>+{
>+        spin_unlock(&microcode_mutex);
>+printk(XENLOG_WARNING "microcode: couldn't find any newer revision "
>+  "in the provided blob!\n");

The patch needs to be freed.
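
I.e. roughly (a sketch; microcode_free_patch() is the existing helper, and the
error path is assumed to mirror the -ENOENT one above, which the quoted hunk
cuts off):

    spin_lock(&microcode_mutex);
    if ( microcode_cache &&
         microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
    {
        spin_unlock(&microcode_mutex);
        microcode_free_patch(patch);   /* don't leak the parsed patch */
        printk(XENLOG_WARNING "microcode: couldn't find any newer revision "
               "in the provided blob!\n");
        ret = -ENOENT;
        goto put;
    }
    spin_unlock(&microcode_mutex);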

With it fixed,
Reviewed-by: Chao Gao 

Thanks
Chao


Re: [Xen-devel] [PATCH v2 for 4.13] x86/microcode: refuse to load the same revision ucode

2019-11-24 Thread Chao Gao
On Fri, Nov 22, 2019 at 04:47:23PM +, Sergey Dyasli wrote:
>Currently if a user tries to live-load the same or older ucode revision
>than CPU already has, he will get a single message in Xen log like:
>
>(XEN) 128 cores are to update their microcode
>
>No actual ucode loading will happen and this situation can be quite
>confusing. Fix this by starting ucode update only when the provided
>ucode revision is higher than the currently cached one (if any).
>This is based on the property that if microcode_cache exists, all CPUs
>in the system should have at least that ucode revision.
>
>Additionally, print a user friendly message if no newer ucode can be
>found in the provided blob. This also requires ignoring -ENODATA in
>AMD-side code, otherwise the message given to the user is:
>
>(XEN) Parsing microcode blob error -61
>
>Which actually means that a ucode blob was parsed fine, but no matching
>ucode was found.
>
>Signed-off-by: Sergey Dyasli 

Reviewed-by: Chao Gao 

I wonder whether it is better to put the comparison ...

>---
>v1 --> v2:
>- compare provided ucode with the currently cached one
>
>CC: Jan Beulich 
>CC: Andrew Cooper 
>CC: Roger Pau Monné 
>CC: Chao Gao 
>CC: Juergen Gross 
>---
> xen/arch/x86/microcode.c| 12 ++--
> xen/arch/x86/microcode_amd.c| 14 ++
> xen/arch/x86/microcode_intel.c  | 12 +---
> xen/include/asm-x86/microcode.h |  3 ++-
> 4 files changed, 31 insertions(+), 10 deletions(-)
>
>diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>index 65d1f41e7c..dcd2c3ff77 100644
>--- a/xen/arch/x86/microcode.c
>+++ b/xen/arch/x86/microcode.c
>@@ -266,10 +266,16 @@ static const struct microcode_patch *nmi_patch = ZERO_BLOCK_PTR;
>  */
> static struct microcode_patch *parse_blob(const char *buf, size_t len)
> {
>+struct microcode_patch *ret = NULL;
>+
> if ( likely(!microcode_ops->collect_cpu_info(&this_cpu(cpu_sig))) )
>-return microcode_ops->cpu_request_microcode(buf, len);
>+{
>+spin_lock(&microcode_mutex);
>+ret = microcode_ops->cpu_request_microcode(buf, len, microcode_cache);
>+spin_unlock(&microcode_mutex);
>+}
> 
>-return NULL;
>+return ret;
> }
> 
> void microcode_free_patch(struct microcode_patch *microcode_patch)
>@@ -641,6 +647,8 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> if ( !patch )
> {
> ret = -ENOENT;
>+printk(XENLOG_WARNING "microcode: couldn't find any newer revision in "
>+  "the provided blob!\n");
> goto put;
> }

... after this if(). Then you needn't modify any vendor-specific code.

Thanks
Chao


Re: [Xen-devel] [PATCH v1 for 4.13] x86/microcode: refuse to load the same revision ucode

2019-11-22 Thread Chao Gao
On Fri, Nov 22, 2019 at 12:19:41PM +0100, Jan Beulich wrote:
>On 22.11.2019 11:52, Sergey Dyasli wrote:
>> Currently if a user tries to live-load the same ucode revision that CPU
>> already has, he will get a single message in Xen log like:
>> 
>> (XEN) 128 cores are to update their microcode
>> 
>> No actual ucode loading will happen and this situation can be quite
>> confusing. Fix this by starting ucode update only when a newer ucode
>> revision has been provided. This is based on an assumption that all CPUs
>> in the system have the same ucode revision. If that's not the case,
>> the system must be considered unstable.
>
>Unstable or not, I did specifically convince Chao to handle such
>systems, bringing them into better shape. I can only repeat that
>I actually have a system where on each socket firmware loads ucode
>only on the first core. I don't see why boot time loading and late
>loading should differ in behavior for such a system.

Yes. 

I tried to load an older ucode and also got the same message. So I think
an optimization can be done: we can assume that if there is a
microcode_cache, all CPUs have a revision equal to or newer than the
cache's. If the patch to be loaded has an equal or older revision, we can
refuse to load it and avoid the heavy stop_machine().
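
Expressed as code, the idea is roughly (a sketch; compare_patch() and
NEW_UCODE are the interfaces used in Sergey's v2/v3 above, and the error code
here is only illustrative):

    /* Skip the heavy stop_machine() path when the provided patch is not
     * newer than the cached one all CPUs are assumed to already have. */
    spin_lock(&microcode_mutex);
    if ( microcode_cache &&
         microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
    {
        spin_unlock(&microcode_mutex);
        return -EEXIST;
    }
    spin_unlock(&microcode_mutex);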

Thanks
Chao


Re: [Xen-devel] [PATCH v1 2/2] microcode: reject late ucode loading if any core is parked

2019-11-21 Thread Chao Gao
On Thu, Nov 21, 2019 at 11:21:13AM +0100, Jan Beulich wrote:
>On 21.11.2019 00:05, Chao Gao wrote:
>> If a core with all of its thread being parked, late ucode loading
>> which currently only loads ucode on online threads would lead to
>> differing ucode revisions in the system. In general, keeping ucode
>> revision consistent would be less error-prone. To this end, if there
>> is a parked thread doesn't have an online sibling thread, late ucode
>> loading is rejected.
>
>I'm confused. I thought we had agreed that the long term solution
>would be to briefly bring online a thread of cores with all their
>threads parked.

I don't remember that we reached such an agreement. But if we did,
I am really sorry for forgetting it.

Actually, I think Dom0 has the information (cpu topology and each cpu's
offline/online status) to decide if there is a parked core or not.
IMO, rejecting late loading in this case is just a defensive check. Dom0
is able to correct the situation by bringing up some cpus.

>What you do here was meant to be a temporary step
>only for 4.13, for which it is too late now (unless Jürgen
>indicates otherwise).
>
>> --- a/xen/arch/x86/microcode.c
>> +++ b/xen/arch/x86/microcode.c
>> @@ -584,6 +584,51 @@ static int do_microcode_update(void *patch)
>>  return ret;
>>  }
>>  
>> +static unsigned int unique_core_id(unsigned int cpu, unsigned int socket_shift)
>> +{
>> +unsigned int core_id = cpu_to_cu(cpu);
>> +
>> +if ( core_id == INVALID_CUID )
>> +core_id = cpu_to_core(cpu);
>> +
>> +return (cpu_to_socket(cpu) << socket_shift) + core_id;
>> +}
>> +
>> +static int has_parked_core(void)
>> +{
>> +int ret;
>> +unsigned int cpu, max_bits, core_width;
>> +unsigned int max_sockets = 1, max_cores = 1;
>> +unsigned long *bitmap;
>> +
>> +if ( !park_offline_cpus )
>> +return 0;
>> +
>> +for_each_parked_cpu(cpu)
>> +{
>> +/* Note that cpu_to_socket() get an ID starting from 0. */
>> +max_sockets = max(max_sockets, cpu_to_socket(cpu) + 1);
>> +max_cores = max(max_cores, cpu_data[cpu].x86_max_cores);
>> +}
>> +
>> +core_width = fls(max_cores);
>> +max_bits = max_sockets << core_width;
>
>Isn't this off by one? If max_cores is 1, you don't need to shift
>max_sockets (or the cpu_to_socket() value in unique_core_id()) at
>all, for example.
>
>With this in mind, instead of the park_offline_cpus check at the
>top of the function, wouldn't it make sense to check here whether
>max_sockets and max_cores are both still 1, in which case at
>least one thread of the only core of the only socket in the system
>is obviously still online (the one we're running on)?

Agree. Will follow your suggestion.
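
For reference, a sketch of the adjusted computation (assuming fls(0) == 0
semantics, as in Xen's fls()):

    /* Per the suggestion above: if both counts are still 1, at least one
     * thread of the only core (the one running this code) is online. */
    if ( max_sockets == 1 && max_cores == 1 )
        return 0;

    /* Core IDs range over [0, max_cores - 1], so the socket shift only
     * needs fls(max_cores - 1) bits; with max_cores == 1, no shift at all. */
    core_width = fls(max_cores - 1);
    max_bits = max_sockets << core_width;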

Thanks
Chao


Re: [Xen-devel] [PATCH v1 1/2] x86/cpu: maintain a parked CPU bitmap

2019-11-21 Thread Chao Gao
On Thu, Nov 21, 2019 at 11:02:10AM +0100, Jan Beulich wrote:
>On 21.11.2019 10:47, Julien Grall wrote:
>> On 20/11/2019 23:05, Chao Gao wrote:
>>> --- a/xen/arch/arm/smpboot.c
>>> +++ b/xen/arch/arm/smpboot.c
>>> @@ -39,6 +39,7 @@
>>>   cpumask_t cpu_online_map;
>>>   cpumask_t cpu_present_map;
>>>   cpumask_t cpu_possible_map;
>>> +cpumask_var_t cpu_parked_map;
>> 
>> You define cpu_parked_map but AFAIK it will never get allocated. The 
>> risk here is any access to that variable will result to a fault.
>> 
>> Looking at the changes below, it looks like access in common code will 
>> be protected by park_offline_cpus. This is always false on Arm, so the 
>> compiler should remove any access to cpu_parked_map.
>> 
>> With that in mind, I think it would be best to only provide a prototype 
>> for cpu_parked_map and so the linker can warn if someone used it.
>
>+1

Will do. I added this because I am not sure all compilers would omit
such access.

>
>In fact I wonder whether the maintenance of the map should live in
>common code in the first place. While clearing the respective bit
>in cpu_up() looks correct (and could be done without any if()),

But when park_offline_cpus is false, the map isn't allocated. I don't
think it is safe to access the map in this case.

>I'm not convinced the setting of the bit in cpu_down() is going to
>be correct in all cases.

Do you mean that in some cases cpu_down() really offlines a CPU even
though park_offline_cpus is set? And in this case, setting the bit isn't
correct.

If yes, one thing that confuses me is that cpu_down() calls
cpu_notifier_call_chain() several times unconditionally, and the registered
callbacks take actions depending on the value of park_offline_cpus.
I expected the callbacks to do more checking to avoid parking a CPU
in some cases.

Thanks
Chao


[Xen-devel] [PATCH v1 1/2] x86/cpu: maintain a parked CPU bitmap

2019-11-20 Thread Chao Gao
It helps to distinguish parked CPUs from those that are really offlined or
hot-added. We need to know the parked CPUs in order to do a special
check against them to ensure that all CPUs, except those that are really
offlined or hot-added, have the same ucode version.

Signed-off-by: Chao Gao 
---
Note that changes on ARM side are untested.
---
 xen/arch/arm/smpboot.c| 1 +
 xen/arch/x86/cpu/common.c | 4 
 xen/arch/x86/smpboot.c| 1 +
 xen/common/cpu.c  | 4 
 xen/include/asm-arm/smp.h | 1 +
 xen/include/asm-x86/smp.h | 1 +
 xen/include/xen/cpumask.h | 1 +
 7 files changed, 13 insertions(+)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 00b64c3..1b57ba4 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -39,6 +39,7 @@
 cpumask_t cpu_online_map;
 cpumask_t cpu_present_map;
 cpumask_t cpu_possible_map;
+cpumask_var_t cpu_parked_map;
 
 struct cpuinfo_arm cpu_data[NR_CPUS];
 
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 6c6bd63..fbb961d 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -337,7 +337,11 @@ void __init early_cpu_init(void)
}
 
if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
+   {
park_offline_cpus = opt_mce;
+		if (park_offline_cpus && !zalloc_cpumask_var(&cpu_parked_map))
+   panic("No memory for CPU parked map\n");
+   }
 
initialize_cpu_data(0);
 }
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index fa691b6..f66de8d 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -60,6 +60,7 @@ cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
 bool __read_mostly park_offline_cpus;
+cpumask_var_t cpu_parked_map;
 
 unsigned int __read_mostly nr_sockets;
 cpumask_t **__read_mostly socket_cpumask;
diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 66c855c..0090a19 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -117,6 +117,8 @@ int cpu_down(unsigned int cpu)
 cpu_notifier_call_chain(cpu, CPU_DEAD, NULL, true);
 
 send_global_virq(VIRQ_PCPU_STATE);
+if ( park_offline_cpus )
+cpumask_set_cpu(cpu, cpu_parked_map);
 cpu_hotplug_done();
 return 0;
 
@@ -154,6 +156,8 @@ int cpu_up(unsigned int cpu)
 cpu_notifier_call_chain(cpu, CPU_ONLINE, NULL, true);
 
 send_global_virq(VIRQ_PCPU_STATE);
+if ( park_offline_cpus )
+cpumask_clear_cpu(cpu, cpu_parked_map);
 
 cpu_hotplug_done();
 return 0;
diff --git a/xen/include/asm-arm/smp.h b/xen/include/asm-arm/smp.h
index fdbcefa..4b392fa 100644
--- a/xen/include/asm-arm/smp.h
+++ b/xen/include/asm-arm/smp.h
@@ -19,6 +19,7 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
  * would otherwise prefer them to be off?
  */
 #define park_offline_cpus false
+extern cpumask_var_t cpu_parked_map;
 
 extern void noreturn stop_cpu(void);
 
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index dbeed2f..886737d 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -31,6 +31,7 @@ DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
  * would otherwise prefer them to be off?
  */
 extern bool park_offline_cpus;
+extern cpumask_var_t cpu_parked_map;
 
 void smp_send_nmi_allbutself(void);
 
diff --git a/xen/include/xen/cpumask.h b/xen/include/xen/cpumask.h
index 256b60b..543cec5 100644
--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -457,6 +457,7 @@ extern cpumask_t cpu_present_map;
 #define for_each_possible_cpu(cpu) for_each_cpu(cpu, &cpu_possible_map)
 #define for_each_online_cpu(cpu)   for_each_cpu(cpu, &cpu_online_map)
 #define for_each_present_cpu(cpu)  for_each_cpu(cpu, &cpu_present_map)
+#define for_each_parked_cpu(cpu)   for_each_cpu(cpu, cpu_parked_map)
 
 /* Copy to/from cpumap provided by control tools. */
 struct xenctl_bitmap;
-- 
1.8.3.1



[Xen-devel] [PATCH v1 2/2] microcode: reject late ucode loading if any core is parked

2019-11-20 Thread Chao Gao
If all threads of a core are parked, late ucode loading, which
currently only loads ucode on online threads, would lead to
differing ucode revisions in the system. In general, keeping the ucode
revision consistent is less error-prone. To this end, if there
is a parked thread that doesn't have an online sibling thread, late ucode
loading is rejected.

Two threads are on the same core or compute unit iff they have
the same phys_proc_id and cpu_core_id/compute_unit_id. Based on
these, a unique core ID is generated for each thread, and a bitmap
is used to reduce the number of comparisons.

Signed-off-by: Chao Gao 
---
Changes:
 - traverse the new parked cpu bitmap to find a parked core. This avoids
 accessing the uninitialized cpu_data of a hot-added CPU.
 - use bitmap_empty() rather than find_first_bit() to check whether a
 bitmap is empty.
---
 xen/arch/x86/microcode.c| 63 +
 xen/include/asm-x86/processor.h |  1 +
 2 files changed, 64 insertions(+)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 65d1f41..dcc8e4b 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -584,6 +584,51 @@ static int do_microcode_update(void *patch)
 return ret;
 }
 
+static unsigned int unique_core_id(unsigned int cpu, unsigned int socket_shift)
+{
+unsigned int core_id = cpu_to_cu(cpu);
+
+if ( core_id == INVALID_CUID )
+core_id = cpu_to_core(cpu);
+
+return (cpu_to_socket(cpu) << socket_shift) + core_id;
+}
+
+static int has_parked_core(void)
+{
+int ret;
+unsigned int cpu, max_bits, core_width;
+unsigned int max_sockets = 1, max_cores = 1;
+unsigned long *bitmap;
+
+if ( !park_offline_cpus )
+return 0;
+
+for_each_parked_cpu(cpu)
+{
+/* Note that cpu_to_socket() gets an ID starting from 0. */
+max_sockets = max(max_sockets, cpu_to_socket(cpu) + 1);
+max_cores = max(max_cores, cpu_data[cpu].x86_max_cores);
+}
+
+core_width = fls(max_cores);
+max_bits = max_sockets << core_width;
+bitmap = xzalloc_array(unsigned long, BITS_TO_LONGS(max_bits));
+if ( !bitmap )
+return -ENOMEM;
+
+for_each_parked_cpu(cpu)
+__set_bit(unique_core_id(cpu, core_width), bitmap);
+
+for_each_online_cpu(cpu)
+__clear_bit(unique_core_id(cpu, core_width), bitmap);
+
+ret = !bitmap_empty(bitmap, max_bits);
+xfree(bitmap);
+
+return ret;
+}
+
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 {
 int ret;
@@ -629,6 +674,24 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 return -EPERM;
 }
 
+/*
+ * If there is a core with all of its threads parked, late loading may
+ * cause differing ucode revisions in the system. Refuse this operation.
+ */
+ret = has_parked_core();
+if ( ret )
+{
+if ( ret > 0 )
+{
+printk(XENLOG_WARNING
+   "Aborted: found a parked core (parked CPU bitmap: %*pbl)\n",
+   CPUMASK_PR(cpu_parked_map));
+ret = -EPERM;
+}
+xfree(buffer);
+goto put;
+}
+
 patch = parse_blob(buffer, len);
 xfree(buffer);
 if ( IS_ERR(patch) )
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 557f9b6..f8a9e93 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -171,6 +171,7 @@ extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 
 #define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
 #define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
+#define cpu_to_cu(_cpu) (cpu_data[_cpu].compute_unit_id)
 
 unsigned int apicid_to_socket(unsigned int);
 
-- 
1.8.3.1



Re: [Xen-devel] [PATCH 2/2] libxl_pci: Fix guest shutdown with PCI PT attached

2019-10-15 Thread Chao Gao
On Tue, Oct 15, 2019 at 06:59:37PM +0200, Sander Eikelenboom wrote:
>On 14/10/2019 17:03, Chao Gao wrote:
>> On Thu, Oct 10, 2019 at 06:13:43PM +0200, Sander Eikelenboom wrote:
>>> On 01/10/2019 12:35, Anthony PERARD wrote:
>>>> Rewrite of the commit message:
>>>>
>>>> Before the problematic commit, libxl used to ignore error when
>>>> destroying (force == true) a passthrough device, especially error that
>>>> happens when dealing with the DM.
>>>>
>>>> Since fae4880c45fe, if the DM failed to detach the pci device within
>>>> the allowed time, the timed out error raised skip part of
>>>> pci_remove_*, but also raise the error up to the caller of
>>>> libxl__device_pci_destroy_all, libxl__destroy_domid, and thus the
>>>> destruction of the domain fails.
>>>>
>>>> In this patch, if the DM didn't confirmed that the device is removed,
>>>> we will print a warning and keep going if force=true.  The patch
>>>> reorder the functions so that pci_remove_timeout() calls
>>>> pci_remove_detatched() like it's done when DM calls are successful.
>>>>
>>>> We also clean the QMP states and associated timeouts earlier, as soon
>>>> as they are not needed anymore.
>>>>
>>>> Reported-by: Sander Eikelenboom 
>>>> Fixes: fae4880c45fe015e567afa223f78bf17a6d98e1b
>>>> Signed-off-by: Anthony PERARD 
>>>>
>>>
>>> Hi Anthony / Chao,
>>>
>>> I have to come back to this, a bit because perhaps there is an underlying 
>>> issue.
>>> While it earlier occurred to me that the VM to which I passed through most 
>>> pci-devices 
>>> (8 to be exact) became very slow to shutdown, but I  didn't investigate it 
>>> further.
>>>
>>> But after you commit messages from this patch it kept nagging, so today I 
>>> did some testing
>>> and bisecting.
>>>
>>> The difference in tear-down time at least from what the IOMMU code logs is 
>>> quite large:
>>>
>>> xen-4.12.0
>>> Setup:  7.452 s
>>> Tear-down:  7.626 s
>>>
>>> xen-unstable-ee7170822f1fc209f33feb47b268bab35541351d
>>> Setup:  7.468 s
>>> Tear-down: 50.239 s
>>>
>>> Bisection turned up:
>>> commit c4b1ef0f89aa6a74faa4618ce3efed1de246ec40
>>> Author: Chao Gao 
>>> Date:   Fri Jul 19 10:24:08 2019 +0100
>>> libxl_qmp: wait for completion of device removal
>>>
>>> Which makes me wonder if there is something going wrong in Qemu ?
> 
>> Hi Sander,
>Hi Chao,
>
>> 
>> Thanks for your testing and the bisection.
>> 
>> I tried on my machine, the destruction time of a guest with 8 pass-thru
>> devices increased from 4s to 12s after applied the commit above.
>
>To what patch are you referring Anthony's or 
>c4b1ef0f89aa6a74faa4618ce3efed1de246ec40 ?

The latter.

>
>> In my understanding, I guess you might get the error message "timed out
>> waiting for DM to remove...". There might be some issues on your assigned
>> devices' drivers. You can first unbind the devices with their drivers in
>> VM and then tear down the VM, and check whether the VM teardown gets
>> much faster.
>
>I get that error message when I test with Anthony's patch applied, the 
>destruction time with that patch is low.
>
>How ever my point was if that patch is correct in the sense that there seems 
>to be an underlying issue 
>which causes it to take so long. That issue was uncovered by 
>c4b1ef0f89aa6a74faa4618ce3efed1de246ec40, so I'm not
>saying that commit is wrong in any sense, it just uncovered another issue that 
>was already present,
>but hard to detect as we just didn't wait at destruction time (and thus the 
>same effect as a timeout).

Actually, it is introduced by c4b1ef0f89, though it did fix another
issue.

>
>One or the other way that was just a minor issue until 
>fae4880c45fe015e567afa223f78bf17a6d98e1b, where the long
>destruction time now caused the domain destruction to stall, which was then 
>fixed by Antony's patch, but that uses
>a timeout which kinds of circumvents the issue, instead of finding out where 
>is comes from and solve it there (
>if that is possible of course).
>
>And I wonder if Anthony's patch doesn't interfere with the case you made 
>c4b1ef0f89aa6a74faa4618ce3efed1de246ec40 for, 
>if you get the timeout error message as well, then that is kind of not waiting 
>for the destruction to finish, isn't it ?
>
>Chao, 
>could you perhaps test for me Xen with as latest commit 
>ee7170822f1fc209f33feb47b268bab35541351d ?
>That is before Anthony's patch series, but after your 
>c4b1ef0f89aa6a74faa4618ce3efed1de246ec40.

That's actually what I did. VM teardown with 8 pass-thru devices on my
side takes 12s, whereas it only took 4s without my patch.

Thanks
Chao


Re: [Xen-devel] [PATCH 2/2] libxl_pci: Fix guest shutdown with PCI PT attached

2019-10-14 Thread Chao Gao
On Thu, Oct 10, 2019 at 06:13:43PM +0200, Sander Eikelenboom wrote:
>On 01/10/2019 12:35, Anthony PERARD wrote:
>> Rewrite of the commit message:
>> 
>> Before the problematic commit, libxl used to ignore error when
>> destroying (force == true) a passthrough device, especially error that
>> happens when dealing with the DM.
>> 
>> Since fae4880c45fe, if the DM failed to detach the pci device within
>> the allowed time, the timed out error raised skip part of
>> pci_remove_*, but also raise the error up to the caller of
>> libxl__device_pci_destroy_all, libxl__destroy_domid, and thus the
>> destruction of the domain fails.
>> 
>> In this patch, if the DM didn't confirmed that the device is removed,
>> we will print a warning and keep going if force=true.  The patch
>> reorder the functions so that pci_remove_timeout() calls
>> pci_remove_detatched() like it's done when DM calls are successful.
>> 
>> We also clean the QMP states and associated timeouts earlier, as soon
>> as they are not needed anymore.
>> 
>> Reported-by: Sander Eikelenboom 
>> Fixes: fae4880c45fe015e567afa223f78bf17a6d98e1b
>> Signed-off-by: Anthony PERARD 
>> 
>
>Hi Anthony / Chao,
>
>I have to come back to this, a bit because perhaps there is an underlying 
>issue.
>While it earlier occurred to me that the VM to which I passed through most 
>pci-devices 
>(8 to be exact) became very slow to shutdown, but I  didn't investigate it 
>further.
>
>But after you commit messages from this patch it kept nagging, so today I did 
>some testing
>and bisecting.
>
>The difference in tear-down time at least from what the IOMMU code logs is 
>quite large:
>
>xen-4.12.0
>   Setup:  7.452 s
>   Tear-down:  7.626 s
>
>xen-unstable-ee7170822f1fc209f33feb47b268bab35541351d
>   Setup:  7.468 s
>   Tear-down: 50.239 s
>
>Bisection turned up:
>   commit c4b1ef0f89aa6a74faa4618ce3efed1de246ec40
>   Author: Chao Gao 
>   Date:   Fri Jul 19 10:24:08 2019 +0100
>   libxl_qmp: wait for completion of device removal
>
>Which makes me wonder if there is something going wrong in Qemu ?

Hi Sander,

Thanks for your testing and the bisection.

I tried on my machine; the destruction time of a guest with 8 pass-thru
devices increased from 4s to 12s after applying the commit above. In my
understanding, I guess you might get the error message "timed out
waiting for DM to remove...". There might be some issue with your assigned
devices' drivers. You can first unbind the devices from their drivers in
the VM and then tear down the VM, and check whether the VM teardown gets
much faster.

Anthony & Wei,

The commit above basically serializes and synchronizes detaching
assigned devices and thus increases VM teardown time significantly if
there are multiple assigned devices. The commit aimed to avoid qemu's
access to PCI configuration space coinciding with the device reset
initiated by xl (which is not desired and is exactly the case which
triggers the assertion in Xen [1]). I personally insist that xl should
wait for DM's completion of device detaching. Otherwise, besides Xen
panic (which can be fixed in another way), in theory, such a sudden
device reset that the guest is unaware of might cause a disaster (e.g.
data loss for a storage device).

[1]: https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg03287.html

But considering fast creation and teardown is an important benefit of
virtualization, I am not sure how to deal with the situation. Anyway,
you can make the decision. To fix the regression on VM teardown, we can
revert the commit by removing the timeout logic.

What's your opinion?

Thanks
Chao

>
>--
>Sander
>
>
>
>xen-4.12.0 setup:
>   (XEN) [2019-10-10 09:54:14.846] AMD-Vi: Disable: device id = 0x900, 
> domain = 0, paging mode = 3
>   (XEN) [2019-10-10 09:54:14.846] AMD-Vi: Setup I/O page table: device id 
> = 0x900, type = 0x1, root table = 0x4aa847000, domain = 1, paging mode = 3
>   (XEN) [2019-10-10 09:54:14.846] AMD-Vi: Re-assign :09:00.0 from 
> dom0 to dom1
>   ...
>   (XEN) [2019-10-10 09:54:22.298] AMD-Vi: Disable: device id = 0x907, 
> domain = 0, paging mode = 3
>   (XEN) [2019-10-10 09:54:22.298] AMD-Vi: Setup I/O page table: device id 
> = 0x907, type = 0x1, root table = 0x4aa847000, domain = 1, paging mode = 3
>   (XEN) [2019-10-10 09:54:22.298] AMD-Vi: Re-assign :09:00.7 from 
> dom0 to dom1
>
>
>xen-4.12.0 tear-down:
>   (XEN) [2019-10-10 10:01:11.971] AMD-Vi: Disable: device id = 0x900, domain = 1, paging mode = 3
>   (XEN) [2019-10-10 10:01:11.971] AMD-Vi: Setup I/O page table: device id = 0x900, type = 0x1, root table = 0x53572c000, domain = 

Re: [Xen-devel] [PATCH v2] pci: clear {host/guest}_maskall field on assign

2019-10-09 Thread Chao Gao
On Wed, Oct 09, 2019 at 10:33:21AM +0200, Roger Pau Monne wrote:
>The current implementation of host_maskall makes it sticky across
>assign and deassign calls, which means that once a guest forces Xen to
>set host_maskall the maskall bit is not going to be cleared until a
>call to PHYSDEVOP_prepare_msix is performed. Such call however
>shouldn't be part of the normal flow when doing PCI passthrough, and
>hence the flag needs to be cleared when assigning in order to prevent
>host_maskall being carried over from previous assignations.
>
>Note that the entry maskbit is reset when the msix capability is
>initialized, and the guest_maskall field is also cleared so that the
>hardware value matches Xen's internal state (hardware maskall =
>host_maskall | guest_maskall).
>
>Also note that doing the reset of host_maskall there would allow the
>guest to reset such field by enabling and disabling MSIX, which is not
>intended.
>
>Signed-off-by: Roger Pau Monné 
>---
>Cc: Chao Gao 
>Cc: "Spassov, Stanislav" 
>Cc: Pasi Kärkkäinen 
>---
>Chao, Stanislav, can you please check if this patch fixes your
>issues?

Tested-by: Chao Gao 

I got the assertion failure below when starting xencommons with the
newest staging:

Setting domain 0 name, domid and JSON config...
xen-init-dom0: _libxl_types.c:2163: libxl_domain_build_info_init_type: Assertion `p->type == LIBXL_DOMAIN_TYPE_INVALID' failed.
/etc/init.d/xencommons: line 54:  2006 Aborted (core dumped) ${LIBEXEC_BIN}/xen-init-dom0 ${XEN_DOM0_UUID}

It should be unrelated to this patch. So I applied this patch on
cd93953538aac and it works.
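
For reference, the v2 change described in the commit message presumably
amounts to something like this in assign_device() (a sketch extrapolated from
the v1 diff quoted later in this digest; the actual v2 patch may additionally
sync the hardware maskall bit with the cleared state):

    if ( pdev->msix )
    {
        msixtbl_init(d);
        /* Don't carry maskall state over from a previous assignment. */
        pdev->msix->host_maskall = false;
        pdev->msix->guest_maskall = false;
    }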

Thanks
Chao


Re: [Xen-devel] [PATCH] pci: clear host_maskall field on assign

2019-10-08 Thread Chao Gao
On Mon, Oct 07, 2019 at 09:38:48AM +0200, Jan Beulich wrote:
>On 05.10.2019 01:58, Chao Gao wrote:
>> On Wed, Oct 02, 2019 at 12:49:35PM +0200, Roger Pau Monne wrote:
>>> The current implementation of host_maskall makes it sticky across
>>> assign and deassign calls, which means that once a guest forces Xen to
>>> set host_maskall the maskall bit is not going to be cleared until a
>>> call to PHYSDEVOP_prepare_msix is performed. Such call however
>>> shouldn't be part of the normal flow when doing PCI passthrough, and
>>> hence the flag needs to be cleared when assigning in order to prevent
>>> host_maskall being carried over from previous assignations.
>>>
>>> Note that other mask fields, like guest_masked or the entry maskbit
>>> are already reset when the msix capability is initialized. Also note
>>> that doing the reset of host_maskall there would allow the guest to
>>> reset such field by enabling and disabling MSIX, which is not
>>> intended.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>> ---
>>> Cc: Chao Gao 
>>> Cc: "Spassov, Stanislav" 
>>> Cc: Pasi Kärkkäinen 
>>> ---
>>> Chao, Stanislav, can you please check if this patch fixes your
>>> issues?
>> 
>> I am glad to. For your testing, you can just kill qemu and destroy the
>> guest. Then maskall bit of a pass-thru device will be set. And in this
>> case, try to recreate the guest and check whether the maskall bit is
>> cleared in guest.
>> 
>> The solution is similar to my v1 [1]. One question IMO (IIRC, it is why
>> I changed to another approach) is: why not do such reset at deivce
>> deassignment such that dom0 can use a clean device. Otherwise, the
>> device won't work after being unbound from pciback. But I am not so
>> sure, I can check it next Tuesday.
>
>I too did think about this, but aiui pciback needs to issue
>PHYSDEVOP_release_msix anyway, and Dom0 would then re-setup MSI-X
>"from scratch", i.e. we'd clear the flag anyway in
>msix_capability_init() due to msix->used_entries being zero at
>the first (of possibly several) invocation(s).

Yes. I just checked it on my machine and found you are right.

Thanks
Chao


Re: [Xen-devel] [PATCH] pci: clear host_maskall field on assign

2019-10-04 Thread Chao Gao
On Wed, Oct 02, 2019 at 12:49:35PM +0200, Roger Pau Monne wrote:
>The current implementation of host_maskall makes it sticky across
>assign and deassign calls, which means that once a guest forces Xen to
>set host_maskall the maskall bit is not going to be cleared until a
>call to PHYSDEVOP_prepare_msix is performed. Such call however
>shouldn't be part of the normal flow when doing PCI passthrough, and
>hence the flag needs to be cleared when assigning in order to prevent
>host_maskall being carried over from previous assignations.
>
>Note that other mask fields, like guest_masked or the entry maskbit
>are already reset when the msix capability is initialized. Also note
>that doing the reset of host_maskall there would allow the guest to
>reset such field by enabling and disabling MSIX, which is not
>intended.
>
>Signed-off-by: Roger Pau Monné 
>---
>Cc: Chao Gao 
>Cc: "Spassov, Stanislav" 
>Cc: Pasi Kärkkäinen 
>---
>Chao, Stanislav, can you please check if this patch fixes your
>issues?

I am glad to. For your testing, you can just kill qemu and destroy the
guest. Then maskall bit of a pass-thru device will be set. And in this
case, try to recreate the guest and check whether the maskall bit is
cleared in guest.

The solution is similar to my v1 [1]. One question IMO (IIRC, it is why
I changed to another approach) is: why not do such a reset at device
deassignment, such that dom0 can use a clean device. Otherwise, the
device won't work after being unbound from pciback. But I am not so
sure; I can check it next Tuesday.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg00863.html

Thanks
Chao

>---
> xen/drivers/passthrough/pci.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>index 7deef2f12b..b4f1ac2dd9 100644
>--- a/xen/drivers/passthrough/pci.c
>+++ b/xen/drivers/passthrough/pci.c
>@@ -1504,7 +1504,10 @@ static int assign_device(struct domain *d, u16 seg, u8 
>bus, u8 devfn, u32 flag)
> }
> 
> if ( pdev->msix )
>+{
> msixtbl_init(d);
>+pdev->msix->host_maskall = false;
>+}

It is similar to my v1 patch here.
[1]: https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg00863.html


Re: [Xen-devel] [PATCH for Xen 4.13] x86/msi: Don't panic if msix capability is missing

2019-09-30 Thread Chao Gao
On Mon, Sep 30, 2019 at 11:18:05AM +0200, Jan Beulich wrote:
>On 29.09.2019 23:24, Chao Gao wrote:
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -1265,7 +1265,13 @@ int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg,
>>  pos = entry ? entry->msi_attrib.pos
>>  : pci_find_cap_offset(seg, bus, slot, func,
>>PCI_CAP_ID_MSIX);
>> -ASSERT(pos);
>> +if ( unlikely(!pos) )
>> +{
>> +printk_once(XENLOG_WARNING
>> +"%04x:%02x:%02x.%u MSI-X capability is missing\n",
>> +seg, bus, slot, func);
>> +return -EAGAIN;
>> +}
>
>Besides agreeing with Roger's comments, whose access do we
>intercept here at the time you observe the operation above
>producing a zero "pos"? If it's Dom0, then surely there's a bug
>in Dom0 doing the access in the first place when a reset hasn't
>completed yet?
>If it's a DomU, then is the reset happening
>behind _its_ back as well (which is not going to end well)?

Looks like it is Dom0. Xen should defend against Dom0 bugs, right?

Here is the call trace:
(XEN) memory_map:remove: dom1 gfn=f mfn=de000 nr=2000
(XEN) memory_map:remove: dom1 gfn=f4051 mfn=e0001 nr=3
(XEN) Assertion 'pos' failed at msi.c:1311
(XEN) ---[ Xen-4.13-unstable  x86_64  debug=y   Tainted:  C   ]---
(XEN) CPU:38
(XEN) RIP:e008:[] pci_msi_conf_write_intercept+0xd7/0x216
(XEN) RFLAGS: 00010246   CONTEXT: hypervisor (d0v1)
(XEN) rax:    rbx: 83087a446c50   rcx: 
(XEN) rdx: 830863c57fff   rsi: 0293   rdi: 82d080498ee0
(XEN) rbp: 830863c579e0   rsp: 830863c579b0   r8:  
(XEN) r9:  830863692ae0   r10:    r11: 
(XEN) r12: 00b2   r13: 830863c57a64   r14: 
(XEN) r15: 0089   cr0: 80050033   cr4: 003426e0
(XEN) cr3: 000812052000   cr2: 557d51fbc000
(XEN) fsb: 7f05f2caa400   gsb: 888194a4   gss: 
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen code around  
(pci_msi_conf_write_intercept+0xd7/0x216):
(XEN)  00 e8 d0 26 fd ff eb 85 <0f> 0b ba 05 00 00 00 be ff ff ff ff 48 89 df e8
(XEN) Xen stack trace from rsp=830863c579b0:
(XEN)000257be 8900  0002
(XEN)830863c57a64 00b2 830863c57a18 82d080297d99
(XEN)8308636bb000 830863c57a64 00b2 0002
(XEN)8900 830863c57a50 82d08037d40b 0cfe
(XEN)0002 0002 8308636bb000 830863c57a64
(XEN)830863c57a90 82d08037d5af 7fff8022854f 830863c57e30
(XEN)0002 0cfe 83086369c000 8308636bb000
(XEN)830863c57ad0 82d08037db65 7fff 0cfe
(XEN)830863c57e30 82d0803fb7c0  
(XEN)830863c57de8 82d0802bf35d 0004 
(XEN)82d080387800 00ef00ef 830863c57bc0 82d7
(XEN)82d0 00ef 8305a473ae70 
(XEN)830863c57b20 82cff000 0282 830863c57b60
(XEN)82d08023c27d 8305a473ae60 830863c57ba0 82d080248596
(XEN)00020040 8305a473ae60 0086 830863c57ba0
(XEN)82d08023c27d 0286 830863c57bb8 0040
(XEN)830863c57bc8 82d08026c747  
(XEN)   
(XEN)  830863c57da0 
(XEN)0003   80868086
(XEN) Xen call trace:
(XEN)[] pci_msi_conf_write_intercept+0xd7/0x216
(XEN)[] pci_conf_write_intercept+0x68/0x72
(XEN)[] emul-priv-op.c#pci_cfg_ok+0xb5/0x146
(XEN)[] emul-priv-op.c#guest_io_write+0x113/0x20b
(XEN)[] emul-priv-op.c#write_io+0xda/0xe4
(XEN)[] x86_emulate+0x11cf7/0x3169d
(XEN)[] x86_emulate_wrapper+0x26/0x5f
(XEN)[] pv_emulate_privileged_op+0x150/0x271
(XEN)[] do_general_protection+0x20b/0x257
(XEN)[] x86_64/entry.S#handle_exception_saved+0x68/0x94

Thanks
Chao


Re: [Xen-devel] [PATCH for Xen 4.13] x86/msi: Don't panic if msix capability is missing

2019-09-30 Thread Chao Gao
On Mon, Sep 30, 2019 at 11:09:58AM +0200, Roger Pau Monné wrote:
>On Mon, Sep 30, 2019 at 05:24:31AM +0800, Chao Gao wrote:
>> Currently, Xen isn't aware of device reset (initiated by dom0). Xen may
>> access the device while device cannot respond to config requests
>> normally (e.g.  after device reset, device may respond to config
>> requests with CRS completions to indicate it needs more time to
>> complete a reset, refer to pci_dev_wait() in linux kernel for more
>> detail). Here, don't assume msix capability is always visible and
>> return -EAGAIN to the caller.
>> 
>> Signed-off-by: Chao Gao 
>> ---
>> I didn't find a way to trigger the assertion in normal usages.
>> It is found by an internal test: echo 1 to /sys/bus/pci//reset
>> when the device is being used by a guest. Although the test is a
>> little insane, it is better to avoid crashing Xen even for this case.
>
>The hardware domain doing such things behind Xen's back is quite
>likely to end badly, either hitting an ASSERT somewhere or with a
>malfunctioning device. Xen should be signaled of when such reset is
>happening, so it can also tear down the internal state of the
>device.
>
>Xen could trap accesses to the FLR bit in order to detect device
>resets, but that's only a way of performing a device reset, other
>methods are likely more complicated to detect, and hence this would
>only be a partial solution.
>
>Have you considered whether it's feasible to signal Xen that a device
>reset is happening, so it can torn down the internal device state?

I think it is feasible. But I am not sure whether it is necessary.
As you said to me before, after detaching the device from a domain,
the internal device state in Xen should have been reset. That's why
the hardware domain or another domU can use the device again. So Xen
has provided hypercalls to tear down the internal state. (IMO, the
internal state includes interrupt binding and mapping, and MMIO mapping.
But I am not sure if I miss something.)

The question then becomes: should Xen tolerate the hardware domain's
misbehavior (resetting a device without tearing down internal state)
or just panic?

>
>> ---
>>  xen/arch/x86/msi.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
>> index 76d4034..e2f3c6c 100644
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -1265,7 +1265,13 @@ int pci_msi_conf_write_intercept(struct pci_dev 
>> *pdev, unsigned int reg,
>>  pos = entry ? entry->msi_attrib.pos
>>  : pci_find_cap_offset(seg, bus, slot, func,
>>PCI_CAP_ID_MSIX);
>> -ASSERT(pos);
>
>I think at least a comment should be added here describing why a
>capability might suddenly disappear.

Will do.

>
>> +if ( unlikely(!pos) )
>> +{
>> +printk_once(XENLOG_WARNING
>
>I'm not sure if printk_once is the best option, the message would be
>printed only once, and for the first device that hits this. Ideally I
>think it should be printed at least once for each device that hits
>this condition.
>
>Alternatively you can turn this into a gprintk which would be good
>enough IMO.

Will do.

Thanks
Chao


Re: [Xen-devel] [PATCH v11 7/7] microcode: reject late ucode loading if any core is parked

2019-09-30 Thread Chao Gao
On Fri, Sep 27, 2019 at 01:19:16PM +0200, Jan Beulich wrote:
>On 26.09.2019 15:53, Chao Gao wrote:
>> If all threads of a core are parked, late ucode loading, which
>> currently only loads ucode on online threads, would lead to
>> differing ucode revisions in the system. In general, keeping the ucode
>> revision consistent would be less error-prone. To this end, if there
>> is a parked thread that doesn't have an online sibling thread, late
>> ucode loading is rejected.
>> 
>> Two threads are on the same core or compute unit iff they have
>> the same phys_proc_id and cpu_core_id/compute_unit_id. Based on
>> phys_proc_id and cpu_core_id/compute_unit_id, a unique core ID
>> is generated for each thread, and a bitmap is used to reduce the
>> number of comparisons.
>> 
>> Signed-off-by: Chao Gao 
>> ---
>> Alternatively, we could mask the thread ID off the APIC ID and use it
>> as the unique core ID. That would require introducing a new field in
>> cpuinfo_x86 to record the thread-ID mask, so I didn't take this route.
>
>It feels a little odd that you introduce a "custom" ID, but it
>should be fine without going this alternative route. (You
>wouldn't need a new field though, I think, as we've got the
>x86_num_siblings one already.)
>
>What I continue to be unconvinced of is for the chosen approach
>to be better than briefly unparking a thread on each core, as
>previously suggested.

It isn't so easy to go the same way as set_cx_pminfo().

1. The NMI handler on parked threads is changed to a nop. To load ucode in
the NMI handler, we would have to switch back to the normal NMI handler in
default_idle(). But that conflicts with what the comment in play_dead()
implies: it is not safe to call the normal NMI handler after
cpu_exit_clear().

2. As a precondition of unparking a thread on each core, we need to find
exactly the parked cores and wake up one thread of each of them.
Then, in theory, what this patch does is only part of unparking a thread
on each core.

I don't mean they are hard to address, but we need to take care of them.
Given that, IMO, this patch is much more straightforward.

>
>> --- a/xen/arch/x86/microcode.c
>> +++ b/xen/arch/x86/microcode.c
>> @@ -573,6 +573,64 @@ static int do_microcode_update(void *patch)
>>  return ret;
>>  }
>>  
>> +static unsigned int unique_core_id(unsigned int cpu, unsigned int 
>> socket_shift)
>> +{
>> +unsigned int core_id = cpu_to_cu(cpu);
>> +
>> +if ( core_id == INVALID_CUID )
>> +core_id = cpu_to_core(cpu);
>> +
>> +return (cpu_to_socket(cpu) << socket_shift) + core_id;
>> +}
>> +
>> +static int has_parked_core(void)
>> +{
>> +int ret = 0;
>
>I don't think you need the initializer here.
>
>> +if ( park_offline_cpus )
>
>if ( !park_offline_cpus )
>return 0;
>
>would allow one level less of indentation of the main part of
>the function body.
>
>> +{
>> +unsigned int cpu, max_bits, core_width;
>> +unsigned int max_sockets = 1, max_cores = 1;
>> +struct cpuinfo_x86 *c = cpu_data;
>> +unsigned long *bitmap;
>
+
>> +for_each_present_cpu(cpu)
>> +{
>> +if ( x86_cpu_to_apicid[cpu] == BAD_APICID )
>> +continue;
>> +
>> +/* Note that cpu_to_socket() gets an ID starting from 0. */
>> +if ( cpu_to_socket(cpu) + 1 > max_sockets )
>
>Instead of "+ 1", why not >= ?
>
>> +max_sockets = cpu_to_socket(cpu) + 1;
>> +
>> +if ( c[cpu].x86_max_cores > max_cores )
>> +max_cores = c[cpu].x86_max_cores;
>
>What guarantees .x86_max_cores to be valid? Onlining a hot-added
>CPU is a two step process afaict, XENPF_cpu_hotadd followed by
>XENPF_cpu_online. In between the CPU would be marked present
>(and cpu_add() would also have filled x86_cpu_to_apicid[cpu]),
>but cpu_data[cpu] wouldn't have been filled yet afaict. This
>also makes the results of the cpu_to_*() unreliable that you use
>in unique_core_id().

Indeed. I agree.

>
>However, if we assume sufficient similarity between CPU
>packages (as you've done elsewhere in this series iirc), this

Yes.

>may not be an actual problem. But it wants mentioning in a code
>comment, I think. Plus at the very least you depend on the used
>cpu_data[] fields to not contain unduly large values (and hence
>you e.g. depend on cpu_data[] not gaining an initializer,
>setting the three fields of interest to their INVALID_* values,
>as currently done by identify_cpu()).

Can we skip those threads whose socket ID is invalid and initialize
the three fields in cpu_add()?
Or maintain a bitmap for parked threads to help distinguish them from
real offlined threads, and go through parked threads here?

Thanks
Chao


[Xen-devel] [PATCH for Xen 4.13] x86/msi: Don't panic if msix capability is missing

2019-09-29 Thread Chao Gao
Currently, Xen isn't aware of device reset (initiated by dom0). Xen may
access the device while device cannot respond to config requests
normally (e.g.  after device reset, device may respond to config
requests with CRS completions to indicate it needs more time to
complete a reset, refer to pci_dev_wait() in linux kernel for more
detail). Here, don't assume msix capability is always visible and
return -EAGAIN to the caller.

Signed-off-by: Chao Gao 
---
I didn't find a way to trigger the assertion in normal usages.
It is found by an internal test: echo 1 to /sys/bus/pci//reset
when the device is being used by a guest. Although the test is a
little insane, it is better to avoid crashing Xen even for this case.
---
 xen/arch/x86/msi.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 76d4034..e2f3c6c 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1265,7 +1265,13 @@ int pci_msi_conf_write_intercept(struct pci_dev *pdev, 
unsigned int reg,
 pos = entry ? entry->msi_attrib.pos
 : pci_find_cap_offset(seg, bus, slot, func,
   PCI_CAP_ID_MSIX);
-ASSERT(pos);
+if ( unlikely(!pos) )
+{
+printk_once(XENLOG_WARNING
+"%04x:%02x:%02x.%u MSI-X capability is missing\n",
+seg, bus, slot, func);
+return -EAGAIN;
+}
 
 if ( reg >= pos && reg < msix_pba_offset_reg(pos) + 4 )
 {
-- 
1.8.3.1
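
The CRS behaviour the description refers to can be modelled with a
self-contained C program (the device side is simulated; all names are
invented, and 0x0001 is the conventional vendor-ID value a function
reports while a reset is still in progress):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CRS_RETRY_VENDOR_ID 0x0001u   /* reported while reset is pending */

static unsigned int reads_until_ready = 3;   /* simulated device state */

static uint16_t read_vendor_id(void)
{
    if ( reads_until_ready )
    {
        reads_until_ready--;
        return CRS_RETRY_VENDOR_ID;   /* "retry later" completion */
    }
    return 0x8086;                    /* device responds normally again */
}

/* Poll the vendor ID until the device stops returning CRS completions;
 * a caller that cannot wait would instead back off (cf. -EAGAIN above). */
static bool wait_for_device(unsigned int max_polls)
{
    while ( max_polls-- )
        if ( read_vendor_id() != CRS_RETRY_VENDOR_ID )
            return true;
    return false;
}

int main(void)
{
    printf("device ready: %s\n", wait_for_device(10) ? "yes" : "no");
    return 0;
}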



[Xen-devel] [PATCH v12] microcode: rendezvous CPUs in NMI handler and load ucode

2019-09-27 Thread Chao Gao
When one core is loading ucode, handling NMIs on sibling threads or
on other cores in the system might be problematic. By rendezvousing
all CPUs in the NMI handler, NMI acceptance during ucode loading is
prevented.

Basically, some work previously done in stop_machine context is
moved to NMI handler. Primary threads call in and load ucode in
NMI handler. Secondary threads wait for the completion of ucode
loading on all CPU cores. An option is introduced to disable this
behavior.

The control thread doesn't rendezvous in the NMI handler via self_nmi()
(in case unknown_nmi_error() would be triggered). The side effect is that
the control thread might be handling an NMI while other threads are loading
ucode. If a ucode update changes something shared by a whole socket, the
control thread may be accessing things that are being updated by the
ucode loading on other cores, which is not safe. Update ucode on the
control thread first to mitigate this issue.
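
Reduced to a single-threaded sketch (invented code; the series drives
these transitions across CPUs, largely from the NMI handler), the
ordering is:

#include <stdio.h>

enum loading_state { LOADING_PREPARE, LOADING_CALLIN, LOADING_ENTER, LOADING_EXIT };

int main(void)
{
    enum loading_state state = LOADING_PREPARE;

    state = LOADING_CALLIN;  /* primaries and secondaries call in via NMI */
    puts("control thread applies the update first");
    state = LOADING_ENTER;   /* then primary threads load ucode */
    puts("primary threads apply the update");
    state = LOADING_EXIT;    /* secondaries stop spinning in the NMI handler */
    puts("all threads leave the rendezvous");
    return state == LOADING_EXIT ? 0 : 1;
}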

Signed-off-by: Sergey Dyasli 
Signed-off-by: Chao Gao 
---
Note:
I plan to finish the remaining patches (like handling parked CPUs,
BDF90 and WBINVD; IMO, not as important as this one) in the RCs.
So this v12 only has one patch.

Changes in v12:
 - take care that self NMI may not arrive synchronously.
 - explain why control thread loads ucode first in patch description.
 - use parse_boolean to parse "scan" field in "ucode" option. The change
 is compatible with the old style.
 - staticify loading_err
 - drop primary_nmi_work()

Changes in v11:
 - Extend existing 'nmi' option rather than use a new one.
 - use per-cpu variable to store error code of xxx_nmi_work()
 - rename secondary_thread_work to secondary_nmi_work.
 - initialize nmi_patch to ZERO_BLOCK_PTR and make it static.
 - constify nmi_cpu
 - explain why control thread loads ucode first in patch description

Changes in v10:
 - rewrite based on Sergey's idea and patch
 - add Sergey's SOB.
 - add an option to disable ucode loading in NMI handler
 - don't send IPI NMI to the control thread to avoid unknown_nmi_error()
 in do_nmi().
 - add an assertion to make sure the cpu chosen to handle platform NMI
 won't send self NMI. Otherwise, there is a risk that we encounter
 unknown_nmi_error() and system crashes.

Changes in v9:
 - control threads send NMI to all other threads. Slave threads will
 stay in the NMI handling to prevent NMI acceptance during ucode
 loading. Note that self-nmi is invalid according to SDM.
 - s/rep_nop/cpu_relax
 - remove debug message in microcode_nmi_callback(). Printing debug
 message would take long times and control thread may timeout.
 - rebase and fix conflicts

Changes in v8:
 - new
---
 docs/misc/xen-command-line.pandoc |   6 +-
 xen/arch/x86/microcode.c  | 174 +++---
 xen/arch/x86/traps.c  |   6 +-
 xen/include/asm-x86/nmi.h |   3 +
 4 files changed, 156 insertions(+), 33 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index fc64429..f5410b3 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2053,7 +2053,7 @@ pages) must also be specified via the tbuf_size parameter.
 > `= unstable | skewed | stable:socket`
 
 ### ucode (x86)
-> `= [ <integer> | scan]`
+> `= List of [ <integer> | scan=<bool>, nmi=<bool> ]`
 
 Specify how and where to find CPU microcode update blob.
 
@@ -2074,6 +2074,10 @@ microcode in the cpio name space must be:
   - on Intel: kernel/x86/microcode/GenuineIntel.bin
   - on AMD  : kernel/x86/microcode/AuthenticAMD.bin
 
+'nmi' determines whether late loading is performed in the NMI handler or
+just in stop_machine context. In the NMI handler, even NMIs are blocked,
+which is considered safer. The default value is `true`.
+
 ### unrestricted_guest (Intel)
> `= <boolean>`
 
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index b882ac8..3c0f72e 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -36,8 +36,10 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -95,6 +97,9 @@ static struct ucode_mod_blob __initdata ucode_blob;
  */
 static bool_t __initdata ucode_scan;
 
+/* By default, ucode loading is done in NMI handler */
+static bool ucode_in_nmi = true;
+
 /* Protected by microcode_mutex */
 static struct microcode_patch *microcode_cache;
 
@@ -105,23 +110,40 @@ void __init microcode_set_module(unsigned int idx)
 }
 
 /*
- * The format is '[<integer>|scan]'. Both options are optional.
- * If the EFI has forced which of the multiboot payloads is to be used,
- * no parsing will be attempted.
+ * The format is '[<integer>|scan=<bool>, nmi=<bool>]'. Both options are
+ * optional. If the EFI has forced which of the multiboot payloads is to be
+ * used, only nmi= is parsed.
  */
 static int __init parse_ucode(const char *s)
 {
-const char *q = NULL;
+const char *ss;
+int val, rc = 0;
 
-if ( ucode_mod_forced ) /* Forced by EFI */
-   return 0;
+do {
+ss = strchr(s, ',');
+  
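
The option format being introduced here ('2', 'scan', 'nmi=0', or a
combination such as 'scan,nmi=0') can be exercised with a small
standalone model; parse_ucode_model() and its globals are invented
stand-ins, not the Xen parser:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static bool ucode_scan, ucode_in_nmi = true;
static long ucode_mod_idx;

static int parse_ucode_model(const char *s)
{
    int rc = 0;

    do {
        const char *ss = strchr(s, ',');
        size_t n = ss ? (size_t)(ss - s) : strlen(s);

        if ( !strncmp(s, "nmi=", 4) )
            ucode_in_nmi = s[4] != '0';           /* crude bool parse */
        else if ( n == 4 && !strncmp(s, "scan", 4) )
            ucode_scan = true;
        else
        {
            char *q;

            ucode_mod_idx = strtol(s, &q, 0);
            if ( q != s + n )
                rc = -1;                          /* junk in this element */
        }
        s = ss ? ss + 1 : NULL;
    } while ( s );

    return rc;
}

int main(void)
{
    int rc = parse_ucode_model("scan,nmi=0");

    printf("rc=%d scan=%d nmi=%d\n", rc, ucode_scan, ucode_in_nmi);
    return 0;
}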

Re: [Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode

2019-09-27 Thread Chao Gao
On Fri, Sep 27, 2019 at 03:55:00PM +0200, Jan Beulich wrote:
>On 27.09.2019 15:53, Chao Gao wrote:
>> On Fri, Sep 27, 2019 at 12:19:22PM +0200, Jan Beulich wrote:
>>> On 26.09.2019 15:53, Chao Gao wrote:
>>>> @@ -420,14 +498,23 @@ static int control_thread_fn(const struct 
>>>> microcode_patch *patch)
>>>>  return ret;
>>>>  }
>>>>  
>>>> -/* Let primary threads load the given ucode update */
>>>> -set_state(LOADING_ENTER);
>>>> -
>>>> +/* Control thread loads ucode first while others are in NMI handler. 
>>>> */
>>>>  ret = microcode_ops->apply_microcode(patch);
>>>>  if ( !ret )
>>>>  atomic_inc(&cpu_updated);
>>>>  atomic_inc(&cpu_out);
>>>>  
>>>> +if ( ret == -EIO )
>>>> +{
>>>> +printk(XENLOG_ERR
>>>> +   "Late loading aborted: CPU%u failed to update ucode\n", 
>>>> cpu);
>>>> +set_state(LOADING_EXIT);
>>>> +return ret;
>>>> +}
>>>> +
>>>> +/* Let primary threads load the given ucode update */
>>>> +set_state(LOADING_ENTER);
>>>
>>> While the description goes to some lengths to explain this ordering of
>>> updates, I still don't really see the point: How is it better for the
>>> control CPU to have updated its ucode early and then hit an NMI before
>>> the other CPUs have even started updating, than the other way around
>>> in the opposite case?
>> 
>> We want to be conservative here. If an ucode is to update something
>> shared by a whole socket, for the latter case, control thread may
>> be accessing things that are being updating by the ucode loading on
>> other cores. It is not safe, just like sibling thread isn't expected
>> to access features exposed by the old ucode when primary thread is
>> loading ucode.
>
>Ah yes, considering a socket-wide effect didn't occur to me (although
>it should have). So if you mention this aspect in the description, I
>think I'm going to be fine with the change in this regard. Yet (as so
>often) this raises another question: What about "secondary" sockets?
>Shouldn't we entertain a similar two-step approach there then?

No. The two-step approach is needed because the control thread cannot call
self_nmi() (in case of triggering unknown_nmi_error()) and what is done
in the main NMI handler isn't well controlled. All cores on other
sockets will rendezvous in the NMI handler, which means every core's
behavior on other sockets is well controlled.

Thanks
Chao


Re: [Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode

2019-09-27 Thread Chao Gao
On Fri, Sep 27, 2019 at 12:19:22PM +0200, Jan Beulich wrote:
>On 26.09.2019 15:53, Chao Gao wrote:
>> @@ -105,23 +110,42 @@ void __init microcode_set_module(unsigned int idx)
>>  }
>>  
>>  /*
>> - * The format is '[<integer>|scan]'. Both options are optional.
>> + * The format is '[<integer>|scan, nmi=<bool>]'. Both options are optional.
>>   * If the EFI has forced which of the multiboot payloads is to be used,
>> - * no parsing will be attempted.
>> + * only nmi= is parsed.
>>   */
>>  static int __init parse_ucode(const char *s)
>>  {
>> -const char *q = NULL;
>> +const char *ss;
>> +int val, rc = 0;
>>  
>> -if ( ucode_mod_forced ) /* Forced by EFI */
>> -   return 0;
>> +do {
>> +ss = strchr(s, ',');
>> +if ( !ss )
>> +ss = strchr(s, '\0');
>>  
>> -if ( !strncmp(s, "scan", 4) )
>> -ucode_scan = 1;
>> -else
>> -ucode_mod_idx = simple_strtol(s, &q, 0);
>> +if ( (val = parse_boolean("nmi", s, ss)) >= 0 )
>> +ucode_in_nmi = val;
>> +else if ( !ucode_mod_forced ) /* Not forced by EFI */
>> +{
>> +const char *q = NULL;
>> +
>> +if ( !strncmp(s, "scan", 4) )
>> +{
>> +ucode_scan = true;
>
>I guess it would have resulted in more consistent code if you had
>used parse_boolean() here, too.
>
>> @@ -222,6 +246,8 @@ const struct microcode_ops *microcode_ops;
>>  static DEFINE_SPINLOCK(microcode_mutex);
>>  
>>  DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
>> +/* Store error code of the work done in NMI handler */
>> +DEFINE_PER_CPU(int, loading_err);
>
>static
>
>> @@ -356,42 +383,88 @@ static void set_state(unsigned int state)
>>  smp_wmb();
>>  }
>>  
>> -static int secondary_thread_fn(void)
>> +static int secondary_nmi_work(void)
>>  {
>> -unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
>> +cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>>  
>> -if ( !wait_for_state(LOADING_CALLIN) )
>> -return -EBUSY;
>> +return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY;
>> +}
>> +
>> +static int primary_thread_work(const struct microcode_patch *patch)
>> +{
>> +int ret;
>>  
>>  cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>>  
>> -if ( !wait_for_state(LOADING_EXIT) )
>> +if ( !wait_for_state(LOADING_ENTER) )
>>  return -EBUSY;
>>  
>> -/* Copy update revision from the primary thread. */
>> -this_cpu(cpu_sig).rev = per_cpu(cpu_sig, primary).rev;
>> +ret = microcode_ops->apply_microcode(patch);
>> +if ( !ret )
>> +atomic_inc(&cpu_updated);
>> +atomic_inc(&cpu_out);
>>  
>> -return 0;
>> +return ret;
>>  }
>>  
>> -static int primary_thread_fn(const struct microcode_patch *patch)
>> +static int primary_nmi_work(const struct microcode_patch *patch)
>> +{
>> +return primary_thread_work(patch);
>> +}
>
>Why this wrapper? The function signatures are identical. I guess
>you want to emphasize the environment the function is to be used
>in, so perhaps fine despite the redundancy. At least there's no
>address taken of this function, so the compiler can eliminate it.
>
>> +static int secondary_thread_fn(void)
>> +{
>>  if ( !wait_for_state(LOADING_CALLIN) )
>>  return -EBUSY;
>>  
>> -cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>> +self_nmi();
>>  
>> -if ( !wait_for_state(LOADING_ENTER) )
>> +/* Copy update revision from the primary thread. */
>> +this_cpu(cpu_sig).rev =
>> +per_cpu(cpu_sig, cpumask_first(this_cpu(cpu_sibling_mask))).rev;
>
>_alternative_instructions() takes specific care to avoid relying on
>the NMI potentially not arriving synchronously (in which case you'd
>potentially copy a not-yet-updated CPU signature above). I think the
>same care wants applying here, which I guess would be another
>
>wait_for_state(LOADING_EXIT);
>
>> +return this_cpu(loading_err);
>> +}
>> +
>> +static int primary_thread_fn(const struct microcode_patch *patch)
>> +{
>> +if ( !wait_for_state(LOADING_CALLIN) )
>>  return -EBUSY;
>>  
>> -ret = microcode_ops->apply_microcode(patch);
>> -if ( !ret )
>> -atomic_inc(&cpu_updated);
>> -atomic_inc(&cpu_out);
>> +if ( ucode_in_nmi )

[Xen-devel] [PATCH v11 5/7] microcode: remove microcode_update_lock

2019-09-26 Thread Chao Gao
microcode_update_lock is to prevent logical threads of the same core from
updating microcode at the same time. But due to using a global lock, it
also prevented parallel microcode updating on different cores.

Remove this lock in order to update microcode in parallel. It is safe
because we have already ensured serialization of sibling threads at the
caller side.
1. For late microcode update, do_microcode_update() ensures that only one
  sibling thread of a core can update microcode.
2. For microcode update during system startup or CPU-hotplug,
  microcode_mutex() guarantees update serialization of logical threads.
3. get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and
  late microcode update.

Note that printk in apply_microcode() and svm_host_osvw_init() (for AMD
only) are still processed sequentially.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v7:
 - reworked. Remove complex lock logics introduced in v5 and v6. The microcode
 patch to be applied is passed as an argument without any global variable. Thus
 no lock is added to serialize potential readers/writers. Callers of
 apply_microcode() will guarantee the correctness: the patch poninted by the
 arguments won't be changed by others.

Changes in v6:
 - introduce early_ucode_update_lock to serialize early ucode update.

Changes in v5:
 - newly add
---
 xen/arch/x86/microcode_amd.c   | 8 +---
 xen/arch/x86/microcode_intel.c | 8 +---
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 9a8f179..1e52f7f 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -74,9 +74,6 @@ struct mpbhdr {
 uint8_t data[];
 };
 
-/* serialize access to the physical write */
-static DEFINE_SPINLOCK(microcode_update_lock);
-
 /* See comment in start_update() for cases when this routine fails */
 static int collect_cpu_info(struct cpu_signature *csig)
 {
@@ -232,7 +229,6 @@ static enum microcode_match_result compare_patch(
 
 static int apply_microcode(const struct microcode_patch *patch)
 {
-unsigned long flags;
 uint32_t rev;
 int hw_err;
 unsigned int cpu = smp_processor_id();
@@ -247,15 +243,13 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 
 hdr = patch->mc_amd->mpb;
 
-spin_lock_irqsave(&microcode_update_lock, flags);
+BUG_ON(local_irq_is_enabled());
 
 hw_err = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)hdr);
 
 /* get patch id after patching */
 rdmsrl(MSR_AMD_PATCHLEVEL, rev);
 
-spin_unlock_irqrestore(&microcode_update_lock, flags);
-
 /*
  * Some processors leave the ucode blob mapping as UC after the update.
  * Flush the mapping to regain normal cacheability.
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index c083e17..9ededcc 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -93,9 +93,6 @@ struct extended_sigtable {
 
 #define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
 
-/* serialize access to the physical write to MSR 0x79 */
-static DEFINE_SPINLOCK(microcode_update_lock);
-
 static int collect_cpu_info(struct cpu_signature *csig)
 {
 unsigned int cpu_num = smp_processor_id();
@@ -287,7 +284,6 @@ static enum microcode_match_result compare_patch(
 
 static int apply_microcode(const struct microcode_patch *patch)
 {
-unsigned long flags;
 uint64_t msr_content;
 unsigned int val[2];
 unsigned int cpu_num = raw_smp_processor_id();
@@ -302,8 +298,7 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 
 mc_intel = patch->mc_intel;
 
-/* serialize access to the physical write to MSR 0x79 */
-spin_lock_irqsave(&microcode_update_lock, flags);
+BUG_ON(local_irq_is_enabled());
 
 /* write microcode via MSR 0x79 */
 wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits);
@@ -316,7 +311,6 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 rdmsrl(MSR_IA32_UCODE_REV, msr_content);
 val[1] = (uint32_t)(msr_content >> 32);
 
-spin_unlock_irqrestore(&microcode_update_lock, flags);
 if ( val[1] != mc_intel->hdr.rev )
 {
 printk(KERN_ERR "microcode: CPU%d update from revision "
-- 
1.8.3.1



[Xen-devel] [PATCH v11 7/7] microcode: reject late ucode loading if any core is parked

2019-09-26 Thread Chao Gao
If all threads of a core are parked, late ucode loading, which
currently only loads ucode on online threads, would lead to
differing ucode revisions in the system. In general, keeping the ucode
revision consistent would be less error-prone. To this end, if there
is a parked thread that doesn't have an online sibling thread, late
ucode loading is rejected.

Two threads are on the same core or compute unit iff they have
the same phys_proc_id and cpu_core_id/compute_unit_id. Based on
phys_proc_id and cpu_core_id/compute_unit_id, a unique core ID
is generated for each thread, and a bitmap is used to reduce the
number of comparisons.
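
The bitmap trick can be sketched as a standalone C program (topology
numbers are made up; this is not the code in the patch below): every
parked thread sets the bit of its unique core ID, every online thread
clears it, and any bit left set identifies a core with all threads
parked:

#include <stdio.h>

#define CORE_WIDTH 2    /* fls(max_cores): enough bits for the core ID */

static unsigned int unique_id(unsigned int socket, unsigned int core)
{
    return (socket << CORE_WIDTH) + core;
}

int main(void)
{
    unsigned long bitmap = 0;

    /* Parked threads: both siblings of socket0/core1, one of socket1/core0. */
    bitmap |= 1UL << unique_id(0, 1);
    bitmap |= 1UL << unique_id(0, 1);   /* sibling sets the same bit */
    bitmap |= 1UL << unique_id(1, 0);

    /* Online threads clear their bits: socket1/core0 has an online sibling. */
    bitmap &= ~(1UL << unique_id(1, 0));

    /* socket0/core1 has no online sibling => late loading is rejected. */
    printf("fully-parked core present: %s\n", bitmap ? "yes" : "no");
    return 0;
}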

Signed-off-by: Chao Gao 
---
Alternatively, we could mask the thread ID off the APIC ID and use it
as the unique core ID. That would require introducing a new field in
cpuinfo_x86 to record the thread-ID mask, so I didn't take this route.
---
 xen/arch/x86/microcode.c| 75 +
 xen/include/asm-x86/processor.h |  1 +
 2 files changed, 76 insertions(+)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index b9fa8bb..b70eb16 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -573,6 +573,64 @@ static int do_microcode_update(void *patch)
 return ret;
 }
 
+static unsigned int unique_core_id(unsigned int cpu, unsigned int socket_shift)
+{
+unsigned int core_id = cpu_to_cu(cpu);
+
+if ( core_id == INVALID_CUID )
+core_id = cpu_to_core(cpu);
+
+return (cpu_to_socket(cpu) << socket_shift) + core_id;
+}
+
+static int has_parked_core(void)
+{
+int ret = 0;
+
+if ( park_offline_cpus )
+{
+unsigned int cpu, max_bits, core_width;
+unsigned int max_sockets = 1, max_cores = 1;
+struct cpuinfo_x86 *c = cpu_data;
+unsigned long *bitmap;
+
+for_each_present_cpu(cpu)
+{
+if ( x86_cpu_to_apicid[cpu] == BAD_APICID )
+continue;
+
+/* Note that cpu_to_socket() gets an ID starting from 0. */
+if ( cpu_to_socket(cpu) + 1 > max_sockets )
+max_sockets = cpu_to_socket(cpu) + 1;
+
+if ( c[cpu].x86_max_cores > max_cores )
+max_cores = c[cpu].x86_max_cores;
+}
+
+core_width = fls(max_cores);
+max_bits = max_sockets << core_width;
+bitmap = xzalloc_array(unsigned long, BITS_TO_LONGS(max_bits));
+if ( !bitmap )
+return -ENOMEM;
+
+for_each_present_cpu(cpu)
+{
+if ( cpu_online(cpu) || x86_cpu_to_apicid[cpu] == BAD_APICID )
+continue;
+
+__set_bit(unique_core_id(cpu, core_width), bitmap);
+}
+
+for_each_online_cpu(cpu)
+__clear_bit(unique_core_id(cpu, core_width), bitmap);
+
+ret = (find_first_bit(bitmap, max_bits) < max_bits);
+xfree(bitmap);
+}
+
+return ret;
+}
+
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 {
 int ret;
@@ -611,6 +669,23 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) 
buf, unsigned long len)
  */
 ASSERT(cpumask_first(&cpu_online_map) == nmi_cpu);
 
+/*
+ * If there is a core with all of its threads parked, late loading may
+ * cause differing ucode revisions in the system. Refuse this operation.
+ */
+ret = has_parked_core();
+if ( ret )
+{
+if ( ret > 0 )
+{
+printk(XENLOG_WARNING
+   "Ucode loading aborted: found a parked core\n");
+ret = -EPERM;
+}
+xfree(buffer);
+goto put;
+}
+
 patch = parse_blob(buffer, len);
 xfree(buffer);
 if ( IS_ERR(patch) )
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index c92956f..753deec 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -171,6 +171,7 @@ extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 
*c);
 
 #define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
 #define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
+#define cpu_to_cu(_cpu) (cpu_data[_cpu].compute_unit_id)
 
 unsigned int apicid_to_socket(unsigned int);
 
-- 
1.8.3.1



[Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode

2019-09-26 Thread Chao Gao
When one core is loading ucode, handling NMIs on sibling threads or
on other cores in the system might be problematic. By rendezvousing
all CPUs in the NMI handler, NMI acceptance during ucode loading is
prevented.

Basically, some work previously done in stop_machine context is
moved to NMI handler. Primary threads call in and load ucode in
NMI handler. Secondary threads wait for the completion of ucode
loading on all CPU cores. An option is introduced to disable this
behavior.

The control thread doesn't rendezvous in the NMI handler via self_nmi()
(in case unknown_nmi_error() would be triggered). The side effect is that
the control thread might be handling an NMI and interacting with the old
ucode in an uncontrolled way while other threads are loading ucode.
Update ucode on the control thread first to mitigate this issue.

Signed-off-by: Sergey Dyasli 
Signed-off-by: Chao Gao 
---
Changes in v11:
 - Extend existing 'nmi' option rather than use a new one.
 - use per-cpu variable to store error code of xxx_nmi_work()
 - rename secondary_thread_work to secondary_nmi_work.
 - initialize nmi_patch to ZERO_BLOCK_PTR and make it static.
 - constify nmi_cpu
 - explain why control thread loads ucode first in patch description

Changes in v10:
 - rewrite based on Sergey's idea and patch
 - add Sergey's SOB.
 - add an option to disable ucode loading in NMI handler
 - don't send IPI NMI to the control thread to avoid unknown_nmi_error()
 in do_nmi().
 - add an assertion to make sure the cpu chosen to handle platform NMI
 won't send self NMI. Otherwise, there is a risk that we encounter
 unknown_nmi_error() and system crashes.

Changes in v9:
 - control threads send NMI to all other threads. Slave threads will
 stay in the NMI handling to prevent NMI acceptance during ucode
 loading. Note that self-nmi is invalid according to SDM.
 - s/rep_nop/cpu_relax
 - remove debug message in microcode_nmi_callback(). Printing debug
 message would take long times and control thread may timeout.
 - rebase and fix conflicts

Changes in v8:
 - new
---
 docs/misc/xen-command-line.pandoc |   6 +-
 xen/arch/x86/microcode.c  | 156 ++
 xen/arch/x86/traps.c  |   6 +-
 xen/include/asm-x86/nmi.h |   3 +
 4 files changed, 138 insertions(+), 33 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 832797e..8beb285 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2036,7 +2036,7 @@ pages) must also be specified via the tbuf_size parameter.
 > `= unstable | skewed | stable:socket`
 
 ### ucode (x86)
-> `= [ <integer> | scan]`
+> `= List of [ <integer> | scan, nmi=<bool> ]`
 
 Specify how and where to find CPU microcode update blob.
 
@@ -2057,6 +2057,10 @@ microcode in the cpio name space must be:
   - on Intel: kernel/x86/microcode/GenuineIntel.bin
   - on AMD  : kernel/x86/microcode/AuthenticAMD.bin
 
+'nmi' determines whether late loading is performed in the NMI handler or
+just in stop_machine context. In the NMI handler, even NMIs are blocked,
+which is considered safer. The default value is `true`.
+
 ### unrestricted_guest (Intel)
> `= <boolean>`
 
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 6c23879..b9fa8bb 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -36,8 +36,10 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -95,6 +97,9 @@ static struct ucode_mod_blob __initdata ucode_blob;
  */
 static bool_t __initdata ucode_scan;
 
+/* By default, ucode loading is done in NMI handler */
+static bool ucode_in_nmi = true;
+
 /* Protected by microcode_mutex */
 static struct microcode_patch *microcode_cache;
 
@@ -105,23 +110,42 @@ void __init microcode_set_module(unsigned int idx)
 }
 
 /*
- * The format is '[<integer>|scan]'. Both options are optional.
+ * The format is '[<integer>|scan, nmi=<bool>]'. Both options are optional.
  * If the EFI has forced which of the multiboot payloads is to be used,
- * no parsing will be attempted.
+ * only nmi= is parsed.
  */
 static int __init parse_ucode(const char *s)
 {
-const char *q = NULL;
+const char *ss;
+int val, rc = 0;
 
-if ( ucode_mod_forced ) /* Forced by EFI */
-   return 0;
+do {
+ss = strchr(s, ',');
+if ( !ss )
+ss = strchr(s, '\0');
 
-if ( !strncmp(s, "scan", 4) )
-ucode_scan = 1;
-else
-ucode_mod_idx = simple_strtol(s, &q, 0);
+if ( (val = parse_boolean("nmi", s, ss)) >= 0 )
+ucode_in_nmi = val;
+else if ( !ucode_mod_forced ) /* Not forced by EFI */
+{
+const char *q = NULL;
+
+if ( !strncmp(s, "scan", 4) )
+{
+ucode_scan = true;
+q = s + 4;
+}
+else
+ucode_mod_idx = simple_strtol(s, &q, 0);
+
+if ( q != ss )
+rc = -EINVAL;
+

[Xen-devel] [PATCH v11 2/7] microcode: unify ucode loading during system bootup and resuming

2019-09-26 Thread Chao Gao
During system bootup and resuming, CPUs just load the cached ucode.
So one unified function, microcode_update_one(), is introduced. It
takes a boolean to indicate whether ->start_update should be called.
Since early_microcode_update_cpu() is only called on the BSP (APs call
the unified function), start_update is always true, so this parameter
is removed.

There is a functional change: ->start_update is called on the BSP and
->end_update_percpu is called during system resuming. They were not
invoked by the previous microcode_resume_cpu().

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v10:
 - call ->start_update for system resume from suspension

Changes in v9:
 - return -EOPNOTSUPP rather than 0 if microcode_ops is NULL in
   microcode_update_one()
 - rebase and fix conflicts.

Changes in v8:
 - split out from the previous patch
---
 xen/arch/x86/acpi/power.c   |  2 +-
 xen/arch/x86/microcode.c| 91 +++--
 xen/arch/x86/smpboot.c  |  5 +--
 xen/include/asm-x86/processor.h |  4 +-
 4 files changed, 45 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 269b140..01e6aec 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -278,7 +278,7 @@ static int enter_state(u32 state)
 
 console_end_sync();
 
-microcode_resume_cpu();
+microcode_update_one(true);
 
 if ( !recheck_cpu_features(0) )
 panic("Missing previously available feature(s)\n");
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 3ea2a6e..9c0e5c4 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -203,24 +203,6 @@ static struct microcode_patch *parse_blob(const char *buf, 
size_t len)
 return NULL;
 }
 
-int microcode_resume_cpu(void)
-{
-int err;
-struct cpu_signature *sig = &this_cpu(cpu_sig);
-
-if ( !microcode_ops )
-return 0;
-
-spin_lock(&microcode_mutex);
-
-err = microcode_ops->collect_cpu_info(sig);
-if ( likely(!err) )
-err = microcode_ops->apply_microcode(microcode_cache);
-spin_unlock(&microcode_mutex);
-
-return err;
-}
-
 void microcode_free_patch(struct microcode_patch *microcode_patch)
 {
 microcode_ops->free_patch(microcode_patch->mc);
@@ -391,11 +373,38 @@ static int __init microcode_init(void)
 }
 __initcall(microcode_init);
 
-int __init early_microcode_update_cpu(bool start_update)
+/* Load a cached update to current cpu */
+int microcode_update_one(bool start_update)
+{
+int err;
+
+if ( !microcode_ops )
+return -EOPNOTSUPP;
+
+microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
+
+if ( start_update && microcode_ops->start_update )
+{
+err = microcode_ops->start_update();
+if ( err )
+return err;
+}
+
+err = microcode_update_cpu(NULL);
+
+if ( microcode_ops->end_update_percpu )
+microcode_ops->end_update_percpu();
+
+return err;
+}
+
+/* BSP calls this function to parse ucode blob and then apply an update. */
+int __init early_microcode_update_cpu(void)
 {
 int rc = 0;
 void *data = NULL;
 size_t len;
+struct microcode_patch *patch;
 
 if ( !microcode_ops )
 return -ENOSYS;
@@ -411,44 +420,26 @@ int __init early_microcode_update_cpu(bool start_update)
 data = bootstrap_map(_mod);
 }
 
-microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
-
 if ( !data )
 return -ENOMEM;
 
-if ( start_update )
+patch = parse_blob(data, len);
+if ( IS_ERR(patch) )
 {
-struct microcode_patch *patch;
-
-patch = parse_blob(data, len);
-if ( IS_ERR(patch) )
-{
-printk(XENLOG_WARNING "Parsing microcode blob error %ld\n",
-   PTR_ERR(patch));
-return PTR_ERR(patch);
-}
-
-if ( !patch )
-return -ENOENT;
-
-spin_lock(&microcode_mutex);
-rc = microcode_update_cache(patch);
-spin_unlock(&microcode_mutex);
-ASSERT(rc);
-
-if ( microcode_ops->start_update )
-rc = microcode_ops->start_update();
-
-if ( rc )
-return rc;
+printk(XENLOG_WARNING "Parsing microcode blob error %ld\n",
+   PTR_ERR(patch));
+return PTR_ERR(patch);
 }
 
-rc = microcode_update_cpu(NULL);
+if ( !patch )
+return -ENOENT;
 
-if ( microcode_ops->end_update_percpu )
-microcode_ops->end_update_percpu();
+spin_lock(&microcode_mutex);
+rc = microcode_update_cache(patch);
+spin_unlock(&microcode_mutex);
+ASSERT(rc);
 
-return rc;
+return microcode_update_one(true);
 }
 
 int __init early_microcode_init(void)
@@ -468,7 +459,7 @@ int __init early_microcode_init(void)
 microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
 if ( ucode_mod.mod_end || ucode_blob.size )
-rc = early_microcode_update_cpu(true);

[Xen-devel] [PATCH v11 4/7] x86/microcode: Synchronize late microcode loading

2019-09-26 Thread Chao Gao
This patch ports microcode improvement patches from linux kernel.

Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.

Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.

Signed-off-by: Chao Gao 
Tested-by: Chao Gao 
[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
Cc: Kevin Tian 
Cc: Jun Nakajima 
Cc: Ashok Raj 
Cc: Borislav Petkov 
Cc: Thomas Gleixner 
Cc: Andrew Cooper 
Cc: Jan Beulich 
---
Changes in v11:
 - Use the sample code of wait_for_state() provided by Jan
 - make wait_cpu_call{in,out} take unsigned int to avoid type casting
 - do assignment in while clause in control_thread_fn() to eliminate
 duplication.

Changes in v10:
 - introduce wait_for_state() and set_state() helper functions
 - make wait_for_condition() return bool and take const void *
 - disable/enable watchdog in control thread
 - rename "master" and "slave" thread to "primary" and "secondary"

Changes in v9:
 - log __buildin_return_address(0) when timeout
 - divide CPUs into three logical sets and they will call different
 functions during ucode loading. The 'control thread' is chosen to
 coordinate ucode loading on all CPUs. Since only control thread would
 set 'loading_state', we can get rid of 'cmpxchg' stuff in v8.
 - s/rep_nop/cpu_relax
 - each thread updates its revision number itself
 - add XENLOG_ERR prefix for each line of multi-line log messages

Changes in v8:
 - to support blocking #NMI handling during loading ucode
   * introduce a flag, 'loading_state', to mark the start or end of
 ucode loading.
   * use a bitmap for cpu callin since if cpu may stay in #NMI handling,
 there are two places for a cpu to call in. bitmap won't be counted
 twice.
   * don't wait for all CPUs callout, just wait for CPUs that perform the
 update. We have to do this because some threads may be stuck in NMI
 handling (where cannot reach the rendezvous).
 - emit a warning if the system stays in stop_machine context for more
 than 1s
 - comment that rdtsc is fine while loading an update
 - use cmpxchg() to avoid panic being called on multiple CPUs
 - Propagate revision number to other threads
 - refine comments and prompt messages

Changes in v7:
 - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
 - reword the comment above microcode_update_cpu() to clearly state that
 one thread per core should do the update.
---
 xen/arch/x86/microcode.c | 297 ++-
 1 file changed, 267 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 9c0e5c4..6c23879 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -30,18 +30,52 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+/*
+ * Before performing a late microcode update on any thread, we
+ * rendezvous all cpus in stop_machine context. The timeout for
+ * waiting for cpu rendezvous is 30ms. It is the timeout used by
+ * live patching
+ */
+#define MICROCODE_CALLIN_TIMEOUT_US 30000
+
+/*
+ * Timeout for each thread to complete update is set to 1s. It is a
+ * conservative choice considering all possible interference.
+ */
+#define MICROCODE_UPDATE_TIMEOUT_US 1000000
+
 static module_t __initdata ucode_mod;
 static signed int __initdata ucode_mod_idx;
 static bool_t __initdata ucode_mod_forced;
+static unsigned int nr_cores;
+
+/*
+ * These states help to coordinate CPUs during loading an update.
+ *
+ * The semantics of each state is as follow:
+ *  - LOADING_PREPARE: initial state of 'loading_state'.
+ *  - LOADING_CALLIN: CPUs are allowed to callin.
+ *  - LOADING_ENTER: all CPUs have called in. Initiate ucode loading.
+ *  - LOADING_EXIT: ucode loading is done or aborted.
+ */
+static enum {
+LOADING_PREPARE,
+LOADING_CALLIN,
+LOADING_ENTER,
+LOADING_EXIT,
+} loading_state;
 
 /*
  * If we scan the initramfs.cpio for the early microcode code
@@ -190,6 +224,16 @@ static DEFINE_SPINLOCK(microcode_mutex);
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
 /*
+ * Count the CPUs that have entered, exited the rendezvous and succeeded in
+ * microcode update during late microcode update respectively.
+ *
+ * Note that a bitmap is used for callin to allow cpu to set a bit multiple
+ * times. It is required to do busy-loop in #NMI handling.
+ */
+static cpumask_t cpu_callin_map;
+static atomic_t cpu_out, cpu_updated;
+
+/*
  * Return a patch that covers current CPU. If there are multiple patches,
  * return the one with the 

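The wait_for_state() idea paired with the timeouts above can be sketched
as a self-contained model (C11 atomics; invented names, a sketch rather
than the patch's implementation): spin on the shared state with a bound,
so a stuck CPU cannot hang the rendezvous forever.

#include <stdatomic.h>
#include <stdbool.h>

static _Atomic int loading_state;

static bool wait_for_state(int want, unsigned int timeout_loops)
{
    while ( atomic_load(&loading_state) != want )
        if ( !timeout_loops-- )
            return false;   /* timed out; the caller aborts the update */
    return true;
}

int main(void)
{
    atomic_store(&loading_state, 2);        /* e.g. LOADING_ENTER */
    return wait_for_state(2, 1000) ? 0 : 1; /* succeeds immediately here */
}
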
[Xen-devel] [PATCH v11 0/7] improve late microcode loading

2019-09-26 Thread Chao Gao
Changes in v11:
 - reject late ucode loading if any core is parked
 - correct the usage of microcode_mutex in microcode_update_cpu()
 - extend 'ucode' boot option to enable/disable ucode loading in NMI
 - drop the last two patches of v10 (about BDF90 and wbinvd; I haven't
 got answers to the opens yet).
 - other minor changes are described in each patch's change log

Regarding changes on the AMD side, I didn't do any testing for them due
to lack of hardware.

The intention of this series is to make late microcode loading
more reliable by rendezvousing all CPUs in stop_machine context.
This idea comes from Ashok. I am porting his Linux patch to Xen
(see patch 4 for more details).

This series includes below changes:
 1. Patch 1-3: cleanup and preparation for synchronizing ucode loading
 2. Patch 4: synchronize late microcode loading
 3. Patch 5: support parallel microcodes update on different cores
 4. Patch 6: rendezvous CPUs in NMI handler and load ucode
 5. Patch 7: reject late ucode loading if any core is parked

Currently, late microcode loading does a lot of things including
parsing the microcode blob, checking the signature/revision and performing
the update. Putting all of them into stop_machine context is a bad idea
because of complexity (one issue I observed is that memory allocation
triggered an assertion in stop_machine context). To simplify the
load process, parsing microcode is moved out of the load process.
The remaining parts of the load process are put into stop_machine context.

Previous change log:
Major changes in version 10:
 - add back the patch to call wbinvd() conditionally
 - add a patch to disable late loading due to BDF90
 - rendezvous CPUs in NMI handler and load ucode. But provide an option
 to disable this behavior.
 - avoid the call of self_nmi() on the control thread because it may
 trigger the unknown_nmi_error() in do_nmi().
 - ensure ->start_update is called during system resuming from
 suspension

Changes in version 9:
 - add Jan's Reviewed-by
 - rendezvous threads in NMI handler to disable NMI. Note that NMI can
 be served as usual on threads that are chosen to initiate ucode loading
 on each core.
 - avoid unnecessary memory allocation or copy when creating a microcode
 patch (patch 12)
 - rework patch 1 to avoid microcode_update_match() being used to
 compare two arbitrary updates.
 - call .end_update in early loading path.

Changes in version 8:
 - block #NMI handling during microcode loading (Patch 16)
 - Don't assume that all CPUs in the system have loaded the same ucode.
 So when parsing a blob, we attempt to save a patch as long as it matches
 the current cpu signature, regardless of the revision of the patch.
 And also for loading, we only require that the patch to be loaded isn't
 older than the cached one.
 - store an update after the first successful loading on a CPU
 - remove the patch that calls wbinvd() unconditionally before microcode
 loading. It is under internal discussion.
 - divide two big patches into several patches to improve readability.

Changes in version 7:
 - cache one microcode update rather than a list of them. Assuming that all CPUs
 (including those that will be plugged in later) in the system have the same
 signature, an update that matches one CPU should match the others. Thus, one
 update is enough for microcode updating during CPU hot-plug and resuming.
 - To handle load failure, microcode update is cached after it is applied to
 avoid a broken update overriding a validated one. Unvalidated microcode updates
 are passed by arguments rather than another global variable, where this series
 slightly differs from Roger's suggestion in:
 https://lists.xen.org/archives/html/xen-devel/2019-03/msg00776.html
 - incorporate Sergey's patch (patch 10) to fix a bug: we maintain a variable
 to reflect current microcode revision. But in some cases, this variable isn't
 initialized during system boot time, which results in falsely reporting that
 the processor is susceptible to some known vulnerabilities.
 - fix issues reported by Sergey:
 https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00901.html
 - Responses to Sergey/Roger/Wei/Ashok's other comments.

Major changes in version 6:
 - run wbinvd before updating microcode (patch 10)
 - add an userspace tool for late microcode update (patch 1)
 - scale time to wait by the number of remaining CPUs to respond 
 - remove 'cpu' parameters from some related callbacks and functins
 - save an ucode patch only if its supported CPU is allowed to mix with
   current cpu.

Changes in version 5:
 - support parallel microcode updates for all cores (see patch 8)
 - Address Roger's comments on the last version.

Chao Gao (7):
  microcode: split out apply_microcode() from cpu_request_microcode()
  microcode: unify ucode loading during system bootup and resuming
  microcode: reduce memory allocation and copy when creating a patch
  x86/microcode: Synchronize late microcode loading
  microcode: remove microcode_update_lock
  microcode: rendezvous C

[Xen-devel] [PATCH v11 3/7] microcode: reduce memory allocation and copy when creating a patch

2019-09-26 Thread Chao Gao
To create a microcode patch from a vendor-specific update,
allocate_microcode_patch() copied everything from the update,
which is not efficient. Essentially, we just need to go through
the ucodes in the blob, find the one with the newest revision and
install it into the microcode_patch. In the process, buffers
like mc_amd, equiv_cpu_table (on the AMD side), and mc (on the Intel
side) can be reused. microcode_patch is now allocated only after
it is certain that there is a matching ucode.

Signed-off-by: Chao Gao 
Reviewed-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Changes in v11:
 - correct parameter type issues of get_next_ucode_from_buffer

Changes in v10:
 - avoid unnecessary type casting
   * introduce compare_header on AMD side
   * specify the type of the first parameter of get_next_ucode_from_buffer()
 on Intel side

Changes in v9:
 - new
---
 xen/arch/x86/microcode_amd.c   | 112 +
 xen/arch/x86/microcode_intel.c |  67 +---
 2 files changed, 69 insertions(+), 110 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 0199308..9a8f179 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -194,36 +194,6 @@ static bool match_cpu(const struct microcode_patch *patch)
 return patch && (microcode_fits(patch->mc_amd) == NEW_UCODE);
 }
 
-static struct microcode_patch *alloc_microcode_patch(
-const struct microcode_amd *mc_amd)
-{
-struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch);
-struct microcode_amd *cache = xmalloc(struct microcode_amd);
-void *mpb = xmalloc_bytes(mc_amd->mpb_size);
-struct equiv_cpu_entry *equiv_cpu_table =
-xmalloc_bytes(mc_amd->equiv_cpu_table_size);
-
-if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table )
-{
-xfree(microcode_patch);
-xfree(cache);
-xfree(mpb);
-xfree(equiv_cpu_table);
-return ERR_PTR(-ENOMEM);
-}
-
-memcpy(mpb, mc_amd->mpb, mc_amd->mpb_size);
-cache->mpb = mpb;
-cache->mpb_size = mc_amd->mpb_size;
-memcpy(equiv_cpu_table, mc_amd->equiv_cpu_table,
-   mc_amd->equiv_cpu_table_size);
-cache->equiv_cpu_table = equiv_cpu_table;
-cache->equiv_cpu_table_size = mc_amd->equiv_cpu_table_size;
-microcode_patch->mc_amd = cache;
-
-return microcode_patch;
-}
-
 static void free_patch(void *mc)
 {
 struct microcode_amd *mc_amd = mc;
@@ -236,6 +206,17 @@ static void free_patch(void *mc)
 }
 }
 
+static enum microcode_match_result compare_header(
+const struct microcode_header_amd *new_header,
+const struct microcode_header_amd *old_header)
+{
+if ( new_header->processor_rev_id == old_header->processor_rev_id )
+return (new_header->patch_id > old_header->patch_id) ? NEW_UCODE
+ : OLD_UCODE;
+
+return MIS_UCODE;
+}
+
 static enum microcode_match_result compare_patch(
 const struct microcode_patch *new, const struct microcode_patch *old)
 {
@@ -246,11 +227,7 @@ static enum microcode_match_result compare_patch(
 ASSERT(microcode_fits(new->mc_amd) != MIS_UCODE);
 ASSERT(microcode_fits(new->mc_amd) != MIS_UCODE);
 
-if ( new_header->processor_rev_id == old_header->processor_rev_id )
-return (new_header->patch_id > old_header->patch_id) ?
-NEW_UCODE : OLD_UCODE;
-
-return MIS_UCODE;
+return compare_header(new_header, old_header);
 }
 
 static int apply_microcode(const struct microcode_patch *patch)
@@ -328,18 +305,10 @@ static int get_ucode_from_buffer_amd(
 return -EINVAL;
 }
 
-if ( mc_amd->mpb_size < mpbuf->len )
-{
-if ( mc_amd->mpb )
-{
-xfree(mc_amd->mpb);
-mc_amd->mpb_size = 0;
-}
-mc_amd->mpb = xmalloc_bytes(mpbuf->len);
-if ( mc_amd->mpb == NULL )
-return -ENOMEM;
-mc_amd->mpb_size = mpbuf->len;
-}
+mc_amd->mpb = xmalloc_bytes(mpbuf->len);
+if ( !mc_amd->mpb )
+return -ENOMEM;
+mc_amd->mpb_size = mpbuf->len;
 memcpy(mc_amd->mpb, mpbuf->data, mpbuf->len);
 
 pr_debug("microcode: CPU%d size %zu, block size %u offset %zu equivID %#x 
rev %#x\n",
@@ -459,8 +428,9 @@ static struct microcode_patch *cpu_request_microcode(const 
void *buf,
  size_t bufsize)
 {
 struct microcode_amd *mc_amd;
+struct microcode_header_amd *saved = NULL;
 struct microcode_patch *patch = NULL;
-size_t offset = 0;
+size_t offset = 0, saved_size = 0;
 int error = 0;
 unsigned int current_cpu_id;
 unsigned int equiv_cpu_id;
@@ -550,29 +520,22 @@ static struct microcode_patch 
*cpu_request_microcode(const void *bu

[Xen-devel] [PATCH v11 1/7] microcode: split out apply_microcode() from cpu_request_microcode()

2019-09-26 Thread Chao Gao
During late microcode loading, apply_microcode() is invoked in
cpu_request_microcode(). To make late microcode update more reliable,
we want to put apply_microcode() into stop_machine context. So
we split it out from cpu_request_microcode(). In general, for both
early loading on the BSP and late loading, cpu_request_microcode() is
called first to get the matching microcode update contained in
the blob, and then apply_microcode() is invoked explicitly on each
cpu in common code.

Given that all CPUs are supposed to have the same signature, parsing
microcode only needs to be done once. So cpu_request_microcode() is
also moved out of microcode_update_cpu().

In some cases (e.g. a broken bios), the system may have multiple
revisions of microcode update. So we would try to load a microcode
update as long as it covers current cpu. And if a cpu loads this patch
successfully, the patch would be stored into the patch cache.

Note that calling ->apply_microcode() itself doesn't require any
lock being held. But the parameter passed to it may be protected
by some locks. E.g. microcode_update_cpu() acquires microcode_mutex
to avoid microcode_cache being updated by others.

Signed-off-by: Chao Gao 
---
Changes in v11:
 - drop Roger's RB.
 - acquire microcode_mutex before checking whether microcode_cache is
 NULL
 - ignore -EINVAL, which indicates an equal/newer ucode is already loaded.
 - free 'buffer' earlier to avoid goto clauses in microcode_update()

Changes in v10:
 - make microcode_update_cache static
 - raise an error if loading ucode failed with -EIO
 - ensure end_update_percpu() is called following a successful call of
 start_update()

Changes in v9:
 - remove the calling of ->compare_patch in microcode_update_cpu().
 - drop "microcode_" prefix for static function - microcode_parse_blob().
 - rebase and fix conflict

Changes in v8:
 - divide the original patch into three patches to improve readability
 - load an update on each cpu as long as the update covers current cpu
 - store an update after the first successful loading on a CPU
 - Make sure the current CPU (especially pf value) is covered
 by updates.

changes in v7:
 - to handle load failure, unvalidated patches won't be cached. They
 are passed as function arguments. So if an update fails, we don't need
 any cleanup of the microcode cache.
---
 xen/arch/x86/microcode.c| 173 +++-
 xen/arch/x86/microcode_amd.c|  38 -
 xen/arch/x86/microcode_intel.c  |  66 +++
 xen/include/asm-x86/microcode.h |   5 +-
 4 files changed, 172 insertions(+), 110 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index b44e4d7..3ea2a6e 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -189,12 +189,19 @@ static DEFINE_SPINLOCK(microcode_mutex);
 
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
-struct microcode_info {
-unsigned int cpu;
-uint32_t buffer_size;
-int error;
-char buffer[1];
-};
+/*
+ * Return a patch that covers current CPU. If there are multiple patches,
+ * return the one with the highest revision number. Return error If no
+ * patch is found and an error occurs during the parsing process. Otherwise
+ * return NULL.
+ */
+static struct microcode_patch *parse_blob(const char *buf, size_t len)
+{
+if ( likely(!microcode_ops->collect_cpu_info(&this_cpu(cpu_sig))) )
+return microcode_ops->cpu_request_microcode(buf, len);
+
+return NULL;
+}
 
 int microcode_resume_cpu(void)
 {
@@ -220,15 +227,8 @@ void microcode_free_patch(struct microcode_patch *microcode_patch)
 xfree(microcode_patch);
 }
 
-const struct microcode_patch *microcode_get_cache(void)
-{
-ASSERT(spin_is_locked(&microcode_mutex));
-
-return microcode_cache;
-}
-
 /* Return true if cache gets updated. Otherwise, return false */
-bool microcode_update_cache(struct microcode_patch *patch)
+static bool microcode_update_cache(struct microcode_patch *patch)
 {
ASSERT(spin_is_locked(&microcode_mutex));
 
@@ -249,49 +249,82 @@ bool microcode_update_cache(struct microcode_patch *patch)
 return true;
 }
 
-static int microcode_update_cpu(const void *buf, size_t size)
+/*
+ * Load a microcode update to current CPU.
+ *
+ * If no patch is provided, the cached patch will be loaded. Microcode update
+ * during APs bringup and CPU resuming falls into this case.
+ */
+static int microcode_update_cpu(const struct microcode_patch *patch)
 {
-int err;
-unsigned int cpu = smp_processor_id();
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
+int err = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
-spin_lock(&microcode_mutex);
+if ( unlikely(err) )
+return err;
 
-err = microcode_ops->collect_cpu_info(sig);
-if ( likely(!err) )
-err = microcode_ops->cpu_request_microcode(buf, size);
+spin_lock(&microcode_mutex);
+if ( patch )
+err = microcode_ops->apply_microcode(patch);
+else if ( microcode_cache )
+

Re: [Xen-devel] [PATCH RFC] pass-through: sync pir to irr after msix vector been updated

2019-09-23 Thread Chao Gao
On Wed, Sep 18, 2019 at 02:16:13PM -0700, Joe Jin wrote:
>On 9/16/19 11:48 PM, Jan Beulich wrote:
>> On 17.09.2019 00:20, Joe Jin wrote:
>>> On 9/16/19 1:01 AM, Jan Beulich wrote:
 On 13.09.2019 18:38, Joe Jin wrote:
> On 9/13/19 12:14 AM, Jan Beulich wrote:
>> On 12.09.2019 20:03, Joe Jin wrote:
>>> --- a/xen/drivers/passthrough/io.c
>>> +++ b/xen/drivers/passthrough/io.c
>>> @@ -412,6 +412,9 @@ int pt_irq_create_bind(
>>>  pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
>>>  pirq_dpci->gmsi.gflags = gflags;
>>>  }
>>> +
>>> +if ( hvm_funcs.sync_pir_to_irr )
>>> +hvm_funcs.sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
>>
>> If the need for this change can be properly explained, then it
>> still wants converting to alternative_vcall() - the the other
>> caller of this hook. Or perhaps even better move vlapic.c's
>> wrapper (suitably renamed) into hvm.h, and use it here.
>
> Yes I agree, I'm not 100% sure, so I set it to RFC.

 And btw, please also attach a brief comment here, to clarify
 why the syncing is needed precisely at this point.

>> Additionally, the code setting pirq_dpci->gmsi.dest_vcpu_id
>> (right after your code insertion) allows for the field to be
>> invalid, which I think you need to guard against.
>
> I think you means multiple destination, then it's -1?

 The reason for why it might be -1 are irrelevant here, I think.
 You need to handle the case both to avoid an out-of-bounds
 array access and to make sure an IRR bit wouldn't still get
 propagated too late in some special case.
>>>
>>> Add following checks?
>>> if ( dest_vcpu_id >= 0 && dest_vcpu_id < d->max_vcpus &&
>>>  d->vcpu[dest_vcpu_id]->runstate.state <= RUNSTATE_blocked )
>> 
>> Just the >= part should suffice; without an explanation I don't
>> see why you want the runstate check (which after all is racy
>> anyway afaict).
>> 
 Also - what about the respective other path in the function,
 dealing with PT_IRQ_TYPE_PCI and PT_IRQ_TYPE_MSI_TRANSLATE? It
 seems to me that there's the same chance of deferring IRR
 propagation for too long?
>>>
>>> This is possible, can you please help on how to get which vcpu associate 
>>> the IRQ?
>>> I did not found any helper on current Xen.
>> 
>> There's no such helper, I'm afraid. Looking at hvm_migrate_pirq()
>> and hvm_girq_dest_2_vcpu_id() I notice that the former does nothing
>> if pirq_dpci->gmsi.posted is set. Hence pirq_dpci->gmsi.dest_vcpu_id
>> isn't really used in this case (please double check), and so you may
>> want to update the field alongside setting pirq_dpci->gmsi.posted in
>> pt_irq_create_bind(), covering the multi destination case.
>> 
>> Your code addition still visible in context above may then want to
>> be further conditionalized upon iommu_intpost or (perhaps better)
>> pirq_dpci->gmsi.posted being set.
>> 
>
>Sorry this is new to me, and I have to study from code.
>Do you think below check cover all conditions?
>
>diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>index 4290c7c710..90c3da441d 100644
>--- a/xen/drivers/passthrough/io.c
>+++ b/xen/drivers/passthrough/io.c
>@@ -412,6 +412,10 @@ int pt_irq_create_bind(
> pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
> pirq_dpci->gmsi.gflags = gflags;
> }
>+
>+/* Notify guest of pending interrupts if necessary */
>+if ( dest_vcpu_id >= 0 && iommu_intpost && pirq_dpci->gmsi.posted )

Hi Joe,

Do you enable vt-d posted interrupt in Xen boot options? I don't see
why it is specific to vt-d posted interrupt. If only CPU side posted
interrupt is enabled, it is also possible that interrupts are not
propagated from PIR to IRR in time.
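
Roughly what I have in mind is the below (just a sketch; the bounds
check on dest_vcpu_id follows Jan's earlier comment, and whether this
should additionally be gated on pirq_dpci->gmsi.posted is exactly the
open question):

/* Sync PIR to IRR for the destination vCPU before the MSI-X vector is
 * updated, so that an already-pending interrupt isn't lost. */
if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 && hvm_funcs.sync_pir_to_irr )
    hvm_funcs.sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);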

Thanks
Chao


Re: [Xen-devel] [PATCH v10 15/16] microcode: disable late loading if CPUs are affected by BDF90

2019-09-17 Thread Chao Gao
On Fri, Sep 13, 2019 at 11:22:59AM +0200, Jan Beulich wrote:
>On 12.09.2019 09:22, Chao Gao wrote:
>> @@ -283,6 +284,27 @@ static enum microcode_match_result compare_patch(
>>   : OLD_UCODE;
>>  }
>>  
>> +static bool is_blacklisted(void)
>> +{
>> +struct cpuinfo_x86 *c = &boot_cpu_data;
>> +uint64_t llc_size = c->x86_cache_size * 1024ULL;
>> +struct cpu_signature *sig = &this_cpu(cpu_sig);
>> +
>> +do_div(llc_size, c->x86_max_cores);
>> +
>> +/*
>> + * Late loading on model 79 with microcode revision less than 0x0b21
>> + * and LLC size per core bigger than 2.5MB may result in a system hang.
>> + * This behavior is documented in item BDF90, #334165 (Intel Xeon
>> + * Processor E7-8800/4800 v4 Product Family).
>> + */
>> +if ( c->x86 == 6 && c->x86_model == 0x4F && c->x86_mask == 0x1 &&
>> + llc_size > 2621440 && sig->rev < 0x0b21 )
>> +return true;
>> +
>> +return false;
>> +}
>
>Isn't this misbehavior worked around by the wbinvd() you add in the next
>patch?

Hi Jan and Andrew,

Perhaps I misunderstood what I was told. I am confirming with Ashok
whether this patch is necessary.

>
>> --- a/xen/include/asm-x86/microcode.h
>> +++ b/xen/include/asm-x86/microcode.h
>> @@ -30,6 +30,7 @@ struct microcode_ops {
>>  bool (*match_cpu)(const struct microcode_patch *patch);
>>  enum microcode_match_result (*compare_patch)(
>>  const struct microcode_patch *new, const struct microcode_patch 
>> *old);
>> +bool (*is_blacklisted)(void);
>
>Why a hook rather than a boolean flag, which could be set by
>microcode_update_one() (as invoked during AP bringup)?

How about setting the boolean flag in Intel_errata_workarounds?

One limitation of setting the flag in microcode_update_one() is:
BSP also calls microcode_update_one(). But calculating LLC size per
core on BSP would meet the same issue as the following patch
(i.e. patch 16/16): BSP's current_cpu_data isn't initialized
properly. We might need to revert commit f97838bbd980a01 in
some way and reenumerate features after ucode loading is done.
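
Something like the below is what I have in mind (a sketch only;
llc_size_per_core() is a hypothetical helper wrapping the
x86_cache_size / x86_max_cores calculation from the patch):

/* Set once during CPU identification; the late-loading path then
 * checks this flag instead of calling a per-vendor hook. */
static bool __read_mostly ucode_late_load_blacklisted;

/* Hypothetical addition to Intel_errata_workarounds(struct cpuinfo_x86 *c): */
if ( c->x86 == 6 && c->x86_model == 0x4F && c->x86_mask == 0x1 &&
     llc_size_per_core(c) > 2621440 && this_cpu(cpu_sig).rev < 0x0b21 )
    ucode_late_load_blacklisted = true;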

Thanks
Chao


Re: [Xen-devel] [PATCH v10 00/16] improve late microcode loading

2019-09-17 Thread Chao Gao
On Fri, Sep 13, 2019 at 10:47:36AM +0200, Jan Beulich wrote:
>On 12.09.2019 09:22, Chao Gao wrote:
>> This series includes below changes:
>>  1. Patch 1-11: introduce a global microcode cache and some cleanup
>>  2. Patch 12: synchronize late microcode loading
>>  3. Patch 13: support parallel microcodes update on different cores
>>  4. Patch 14: block #NMI handling during microcode loading
>>  5. Patch 15: disable late ucode loading due to BDF90
>>  6. Patch 16: call wbinvd() conditionally
>
>I don't know why it didn't occur to me earlier, but what about
>parked / offlined CPUs? They'll have their ucode updated when they
>get brought back online, but until then their ucode will disagree
>with that of the online CPUs. For truly offline CPUs this may be
>fine, but parked ones should probably be updated, perhaps via the
>same approach as used when C-state data becomes available (see
>set_cx_pminfo())?

Yes. It provides a means to wake up the parked CPU and a chance to run
some code (like loading ucode). But parked CPUs are cleared from
sibling info and cpu_online_map (see __cpu_disable()). If parallel
ucode loading is expected on parked CPUs, we should be able to
determine the primary threads and the number of cores no matter whether
a CPU is online or parked. To this end, a new sibling map should be maintained
for each CPU and this map isn't changed when a CPU gets parked.

In the Linux kernel, the approach is quite simple: late loading is
prohibited if any CPU is parked; the admin should online all parked CPUs
before loading ucode.
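
In Xen terms that would amount to something like the below (a sketch,
assuming parked CPUs are present but not online):

/* Refuse late loading while any present CPU is offline/parked; the
 * admin is expected to online them first. */
if ( num_present_cpus() != num_online_cpus() )
{
    printk(XENLOG_WARNING
           "microcode: late loading denied with offline/parked CPUs\n");
    return -EBUSY;
}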

Do you have any preference?

Thanks
Chao


Re: [Xen-devel] [PATCH v10 14/16] microcode: rendezvous CPUs in NMI handler and load ucode

2019-09-15 Thread Chao Gao
On Fri, Sep 13, 2019 at 11:14:59AM +0200, Jan Beulich wrote:
>On 12.09.2019 09:22, Chao Gao wrote:
>> When one core is loading ucode, handling NMI on sibling threads or
>> on other cores in the system might be problematic. By rendezvousing
>> all CPUs in NMI handler, it prevents NMI acceptance during ucode
>> loading.
>> 
>> Basically, some work previously done in stop_machine context is
>> moved to NMI handler. Primary threads call in and load ucode in
>> NMI handler. Secondary threads wait for the completion of ucode
>> loading on all CPU cores. An option is introduced to disable this
>> behavior.
>> 
>> Signed-off-by: Chao Gao 
>> Signed-off-by: Sergey Dyasli 
>
>
>
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -2056,6 +2056,16 @@ microcode in the cpio name space must be:
>>- on Intel: kernel/x86/microcode/GenuineIntel.bin
>>- on AMD  : kernel/x86/microcode/AuthenticAMD.bin
>>  
>> +### ucode_loading_in_nmi (x86)
>> +> `= `
>> +
>> +> Default: `true`
>> +
>> +When one CPU is loading ucode, handling NMIs on sibling threads or threads 
>> on
>> +other cores might cause problems. By default, all CPUs rendezvous in NMI 
>> handler
>> +and load ucode. This option provides a way to disable it in case of some 
>> CPUs
>> +don't allow ucode loading in NMI handler.
>
>We already have "ucode=", why don't you extend it to allow "ucode=nmi"
>and "ucode=no-nmi"? (In any event, please no underscores in new
>command line options - use hyphens if necessary.)

Ok. Will extend the "ucode" parameter.

>
>> @@ -232,6 +237,7 @@ DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
>>   */
>>  static cpumask_t cpu_callin_map;
>>  static atomic_t cpu_out, cpu_updated;
>> +const struct microcode_patch *nmi_patch;
>
>static
>
>> @@ -354,6 +360,50 @@ static void set_state(unsigned int state)
>>  smp_wmb();
>>  }
>>  
>> +static int secondary_thread_work(void)
>> +{
>> +cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>> +
>> +return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY;
>> +}
>> +
>> +static int primary_thread_work(const struct microcode_patch *patch)
>
>I think it would be nice if both functions carried "nmi" in their
>names - how about {primary,secondary}_nmi_work()? Or wait - the
>primary one gets used outside of NMI as well, so I'm fine with its
>name.
>The secondary one, otoh, is NMI-specific and also its only
>caller doesn't care about the return value, so I'd suggest making
>it return void alongside adding some form of "nmi" to its name. Or,

Will do.

>perhaps even better, have secondary_thread_fn() call it, moving the
>cpu_sig update here (and of course then there shouldn't be any
>"nmi" added to its name).

Even with "ucode=no-nmi", secondary threads have to do busy-loop in
NMI handling util primary threads completing the update. Otherwise,
it may access MSRs (like SPEC_CTRL) which is considered unsafe.

>
>> +static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
>> +{
>> +unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
>> +unsigned int controller = cpumask_first(&cpu_online_map);
>> +
>> +/* System-generated NMI, will be ignored */
>> +if ( loading_state != LOADING_CALLIN )
>> +return 0;
>
>I'm not happy at all to see NMIs being ignored. But by returning
>zero, you do _not_ ignore it. Did you perhaps mean "will be ignored
>here", in which case perhaps better "leave to main handler"? And
>for the comment to extend to the other two conditions right below,
>I think it would be better to combine them all into a single if().
>
>Also, throughout the series, I think you want to consistently use
>ACCESS_ONCE() for reads/writes from/to loading_state.
>
>> +if ( cpu == controller || (!opt_ucode_loading_in_nmi && cpu == primary) 
>> )
>> +return 0;
>
>Why not

As I said above, secondary threads are expected to stay in the NMI handler
regardless of the setting of opt_ucode_loading_in_nmi.

>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -126,6 +126,8 @@ boolean_param("ler", opt_ler);
>>  /* LastExceptionFromIP on this hardware.  Zero if LER is not in use. */
>>  unsigned int __read_mostly ler_msr;
>>  
>> +unsigned int __read_mostly nmi_cpu;
>
>Since this variable (for now) is never written to it should gain a
>comment saying why this is, and perhaps it would then also better be
>const rather than __read_mostly.

How about using the macro below:
#define NMI_CPU 0

Thanks
Chao


Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2019-09-13 Thread Chao Gao
On Fri, Sep 13, 2019 at 10:02:24AM +, Spassov, Stanislav wrote:
>On Thu, Dec 13, 2018 at 07:54, Chao Gao wrote:
>>On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote:
>>>>>> On 13.12.18 at 04:46,  wrote:
>>>> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
>>>>>>>> On 12.12.18 at 16:18,  wrote:
>>>>>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>>>>>>>> On 12.12.18 at 08:06,  wrote:
>>>>>>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>>>>>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>>>>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>>>>>>>> I find some pass-thru devices don't work any more across guest 
>>>>>>>>>>> reboot.
>>>>>>>>>>> Assigning it to another guest also meets the same issue. And the 
>>>>>>>>>>> only
>>>>>>>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>>>>>>>> Someone reported this issue one year ago [1]. More detail also can 
>>>>>>>>>>> be
>>>>>>>>>>> found in [2].
>>>>>>>>>>>
>>>>>>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>>>>>>>> during reboot or re-assignment. In the above case, Xen set maskall 
>>>>>>>>>>> bit
>>>>>>>>>>> to mask all MSI interrupts after it detected a potential security
>>>>>>>>>>> issue. Even after device reset, Xen didn't reset its internal 
>>>>>>>>>>> maskall
>>>>>>>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>>>>>>>> MSI-X message control register.
>>>>>>>>>>>
>>>>>>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting 
>>>>>>>>>>> MSI-X
>>>>>>>>>>> internal state of a device, we employ it to fix this issue rather 
>>>>>>>>>>> than
>>>>>>>>>>> introducing another dedicated sub-hypercall.
>>>>>>>>>>>
>>>>>>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>>>>>>>> the device's msix and pirq has been created. This limitation 
>>>>>>>>>>> prevents
>>>>>>>>>>> us calling this function when detaching a device from a guest during
>>>>>>>>>>> guest shutdown. Thus it is called right before calling
>>>>>>>>>>> PHYSDEVOPS_prepare_msix().
>>>>>>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>>>>>>>> () at the end of the hypercall name since it's not a function.
>>>>>>>>>>
>>>>>>>>>> I'm also wondering why the release can't be done when the device is
>>>>>>>>>> detached from the guest (or the guest has been shut down). This makes
>>>>>>>>>> me worry about the raciness of the attach/detach procedure: if 
>>>>>>>>>> there's
>>>>>>>>>> a state where pciback assumes the device has been detached from the
>>>>>>>>>> guest, but there are still pirqs bound, an attempt to attach to
>>>>>>>>>> another guest in such state will fail.
>>>>>>>>>
>>>>>>>>>I wonder whether this additional reset functionality could be done out
>>>>>>>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>>>>>>>and then do the extra things that are not properly done there.
>>>>>>>> 
>>>>>>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>>>>>>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>>>>>>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>>>>>>>> without error. But ATM, xen expe

Re: [Xen-devel] [PATCH v10 01/16] microcode/intel: extend microcode_update_match()

2019-09-13 Thread Chao Gao
On Fri, Sep 13, 2019 at 08:50:59AM +0200, Jan Beulich wrote:
>On 12.09.2019 12:24, Jan Beulich wrote:
>> On 12.09.2019 09:22, Chao Gao wrote:
>>> --- a/xen/arch/x86/microcode_intel.c
>>> +++ b/xen/arch/x86/microcode_intel.c
>>> @@ -134,21 +134,11 @@ static int collect_cpu_info(unsigned int cpu_num, 
>>> struct cpu_signature *csig)
>>>  return 0;
>>>  }
>>>  
>>> -static inline int microcode_update_match(
>>> -unsigned int cpu_num, const struct microcode_header_intel *mc_header,
>>> -int sig, int pf)
>>> +static int microcode_sanity_check(const void *mc)
>>>  {
>>> -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num);
>>> -
>>> -return (sigmatch(sig, uci->cpu_sig.sig, pf, uci->cpu_sig.pf) &&
>>> -(mc_header->rev > uci->cpu_sig.rev));
>>> -}
>>> -
>>> -static int microcode_sanity_check(void *mc)
>>> -{
>>> -struct microcode_header_intel *mc_header = mc;
>>> -struct extended_sigtable *ext_header = NULL;
>>> -struct extended_signature *ext_sig;
>>> +const struct microcode_header_intel *mc_header = mc;
>>> +const struct extended_sigtable *ext_header = NULL;
>>> +const struct extended_signature *ext_sig;
>>>  unsigned long total_size, data_size, ext_table_size;
>>>  unsigned int ext_sigcount = 0, i;
>>>  uint32_t sum, orig_sum;
>>> @@ -234,6 +224,42 @@ static int microcode_sanity_check(void *mc)
>>>  return 0;
>>>  }
>>>  
>>> +/* Check an update against the CPU signature and current update revision */
>>> +static enum microcode_match_result microcode_update_match(
>>> +const struct microcode_header_intel *mc_header, unsigned int cpu)
>>> +{
>>> +const struct extended_sigtable *ext_header;
>>> +const struct extended_signature *ext_sig;
>>> +unsigned int i;
>>> +struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>>> +unsigned int sig = uci->cpu_sig.sig;
>>> +unsigned int pf = uci->cpu_sig.pf;
>>> +unsigned int rev = uci->cpu_sig.rev;
>>> +unsigned long data_size = get_datasize(mc_header);
>>> +const void *end = (const void *)mc_header + get_totalsize(mc_header);
>>> +
>>> +ASSERT(!microcode_sanity_check(mc_header));
>>> +if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) )
>>> +return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE;
>>> +
>>> +ext_header = (const void *)(mc_header + 1) + data_size;
>>> +ext_sig = (const void *)(ext_header + 1);
>>> +
>>> +/*
>>> + * Make sure there is enough space to hold an extended header and 
>>> enough
>>> + * array elements.
>>> + */
>>> +if ( (end < (const void *)ext_sig) ||
>>> + (end < (const void *)(ext_sig + ext_header->count)) )
>>> +return MIS_UCODE;
>> 
>> With you now assuming that the blob has previously passed
>> microcode_sanity_check(), this only needs to be
>> 
>> if ( (end <= (const void *)ext_sig) )
>> return MIS_UCODE;
>> 
>> now afaict.
>> 
>> Reviewed-by: Jan Beulich 
>> preferably with this adjustment (assuming you agree).
>
>FAOD: I'd be happy to make the adjustment while committing, but
>I'd like to have your consent (or you proving me wrong). This
>would, as it looks, allow everything up to patch 8 to go in.

Please go ahead. Thanks

Chao


Re: [Xen-devel] [PATCH v10 12/16] x86/microcode: Synchronize late microcode loading

2019-09-13 Thread Chao Gao
On Thu, Sep 12, 2019 at 05:32:22PM +0200, Jan Beulich wrote:
>On 12.09.2019 09:22, Chao Gao wrote:
>> @@ -264,38 +336,158 @@ static int microcode_update_cpu(const struct 
>> microcode_patch *patch)
>>  return err;
>>  }
>>  
>> -static long do_microcode_update(void *patch)
>> +static bool wait_for_state(unsigned int state)
>> +{
>> +while ( loading_state != state )
>> +{
>> +if ( state != LOADING_EXIT && loading_state == LOADING_EXIT )
>> +return false;
>
>This is at least somewhat confusing: There's no indication here
>that "loading_state" may change behind the function's back. So
>in general one could be (and I initially was) tempted to suggest
>dropping the apparently redundant left side of the &&. But that
>would end up wrong if the compiler translates the above to two
>separate reads of "loading_state". Therefore I'd like to suggest
>
>static bool wait_for_state(typeof(loading_state) state)
>{
>typeof(loading_state) cur_state;
>
>while ( (cur_state = ACCESS_ONCE(loading_state)) != state )
>{
>if ( cur_state == LOADING_EXIT )
>return false;
>cpu_relax();
>}
>
>return true;
>}
>
>or something substantially similar (if, e.g., you dislike the
>use of typeof() here).

The code snippet above is terrific. Will take it.

>
>> +static int secondary_thread_fn(void)
>> +{
>> +unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
>> +
>> +if ( !wait_for_state(LOADING_CALLIN) )
>> +return -EBUSY;
>> +
>> +cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>> +
>> +if ( !wait_for_state(LOADING_EXIT) )
>> +return -EBUSY;
>
>This return looks to be unreachable, doesn't it?

Yes. I will use a variable to hold its return value and assert the
return value is always true.
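
I.e. the tail of secondary_thread_fn() would become something like:

bool ret = wait_for_state(LOADING_EXIT);

ASSERT(ret);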

Other comments are reasonable and I will follow your suggestion.

Thanks
Chao


Re: [Xen-devel] [PATCH v10 09/16] microcode: split out apply_microcode() from cpu_request_microcode()

2019-09-13 Thread Chao Gao
On Thu, Sep 12, 2019 at 04:07:16PM +0200, Jan Beulich wrote:
>On 12.09.2019 09:22, Chao Gao wrote:
>> @@ -249,49 +249,80 @@ bool microcode_update_cache(struct microcode_patch 
>> *patch)
>>  return true;
>>  }
>>  
>> -static int microcode_update_cpu(const void *buf, size_t size)
>> +/*
>> + * Load a microcode update to current CPU.
>> + *
>> + * If no patch is provided, the cached patch will be loaded. Microcode 
>> update
>> + * during APs bringup and CPU resuming falls into this case.
>> + */
>> +static int microcode_update_cpu(const struct microcode_patch *patch)
>>  {
>> -int err;
>> -unsigned int cpu = smp_processor_id();
>> -struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
>> +int err = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
>>  
>> -spin_lock(&microcode_mutex);
>> +if ( unlikely(err) )
>> +return err;
>>  
>> -err = microcode_ops->collect_cpu_info(sig);
>> -if ( likely(!err) )
>> -err = microcode_ops->cpu_request_microcode(buf, size);
>> -spin_unlock(&microcode_mutex);
>> +if ( patch )
>> +err = microcode_ops->apply_microcode(patch);
>> +else if ( microcode_cache )
>> +{
>> +spin_lock(&microcode_mutex);
>> +err = microcode_ops->apply_microcode(microcode_cache);
>> +if ( err == -EIO )
>> +{
>> +microcode_free_patch(microcode_cache);
>> +microcode_cache = NULL;
>> +}
>> +spin_unlock(&microcode_mutex);
>> +}
>
>I'm having trouble understanding the locking discipline here: Why
>do you call ->apply_microcode() once with the lock held and once
>without? If this is to guard against microcode_cache changing,

Yes. microcode_cache is protected by microcode_mutex.

>then (a) the check of it being non-NULL would need to be done with
>the lock held as well and

Will do.

>(b) you'd need to explain why the non-
>locked call to ->apply_microcode() is okay.

->apply_microcode() was always called with this lock held because it
always read the old per-cpu cache, which was protected by the lock.
That gave the impression that ->apply_microcode() itself was protected
by the lock.

The patch before this one makes ->apply_microcode() accept a patch
pointer. With this change, if the patch being passed should be accessed
with some lock held (like the secondary call site above), we acquire
the lock. Otherwise, no lock is taken and the caller of
microcode_update_cpu() is supposed to guarantee the patch won't be
changed by others.
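
To spell out the two patterns (sketch only):

/* (1) Cached patch: microcode_cache is protected by microcode_mutex,
 *     so the NULL check and the load both happen under the lock. */
spin_lock(&microcode_mutex);
if ( microcode_cache )
    err = microcode_ops->apply_microcode(microcode_cache);
spin_unlock(&microcode_mutex);

/* (2) Caller-provided patch: no lock taken; the caller guarantees the
 *     patch cannot change underneath us. */
err = microcode_ops->apply_microcode(patch);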

>
>It certainly wasn't this way in v8, yet the v9 revision log also
>doesn't mention such a (not insignificant) change (which is part
>of the reason why I didn't spot it in v9).

It is my bad.

>
>> +else
>> +/* No patch to update */
>> +err = -ENOENT;
>>  
>>  return err;
>>  }
>>  
>> -static long do_microcode_update(void *_info)
>> +static long do_microcode_update(void *patch)
>>  {
>> -struct microcode_info *info = _info;
>> -int error;
>> -
>> -BUG_ON(info->cpu != smp_processor_id());
>> +unsigned int cpu;
>> +int ret = microcode_update_cpu(patch);
>>  
>> -error = microcode_update_cpu(info->buffer, info->buffer_size);
>> -if ( error )
>> -info->error = error;
>> +/* Store the patch after a successful loading */
>> +if ( !ret && patch )
>> +{
>> +spin_lock(&microcode_mutex);
>> +microcode_update_cache(patch);
>> +spin_unlock(&microcode_mutex);
>> +patch = NULL;
>> +}
>>  
>>  if ( microcode_ops->end_update_percpu )
>>  microcode_ops->end_update_percpu();
>>  
>> -info->cpu = cpumask_next(info->cpu, &cpu_online_map);
>> -if ( info->cpu < nr_cpu_ids )
>> -return continue_hypercall_on_cpu(info->cpu, do_microcode_update, 
>> info);
>> +/*
>> + * Each thread tries to load ucode and only the first thread of a core
>> + * would succeed. Ignore error other than -EIO.
>> + */
>> +if ( ret != -EIO )
>> +ret = 0;
>
>I don't think this is a good idea. Ignoring a _specific_ error
>code (e.g. indicating "already loaded" or "newer patch already
>loaded") is fine, but here you also ignore things like -ENOMEM
>or -EINVAL.

will do.

>
>> +cpu = cpumask_next(smp_processor_id(), &cpu_online_map);
>> +if ( cpu < nr_cpu_ids )
>> +return continue_hypercall_on_cpu(cpu, do_microcode

[Xen-devel] [PATCH v10 14/16] microcode: rendezvous CPUs in NMI handler and load ucode

2019-09-12 Thread Chao Gao
When one core is loading ucode, handling NMI on sibling threads or
on other cores in the system might be problematic. By rendezvousing
all CPUs in NMI handler, it prevents NMI acceptance during ucode
loading.

Basically, some work previously done in stop_machine context is
moved to NMI handler. Primary threads call in and load ucode in
NMI handler. Secondary threads wait for the completion of ucode
loading on all CPU cores. An option is introduced to disable this
behavior.

Signed-off-by: Chao Gao 
Signed-off-by: Sergey Dyasli 
---
Changes in v10:
 - rewrite based on Sergey's idea and patch
 - add Sergey's SOB.
 - add an option to disable ucode loading in NMI handler
 - don't send IPI NMI to the control thread to avoid unknown_nmi_error()
 in do_nmi().
 - add an assertion to make sure the cpu chosen to handle platform NMI
 won't send self NMI. Otherwise, there is a risk that we encounter
 unknown_nmi_error() and system crashes.

Changes in v9:
 - control threads send NMI to all other threads. Slave threads will
 stay in the NMI handling to prevent NMI acceptance during ucode
 loading. Note that self-nmi is invalid according to SDM.
 - s/rep_nop/cpu_relax
 - remove debug message in microcode_nmi_callback(). Printing debug
 message would take long times and control thread may timeout.
 - rebase and fix conflicts

Changes in v8:
 - new
---
 docs/misc/xen-command-line.pandoc | 10 +
 xen/arch/x86/microcode.c  | 95 ---
 xen/arch/x86/traps.c  |  6 ++-
 xen/include/asm-x86/nmi.h |  3 ++
 4 files changed, 96 insertions(+), 18 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 7c72e31..3017073 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2056,6 +2056,16 @@ microcode in the cpio name space must be:
   - on Intel: kernel/x86/microcode/GenuineIntel.bin
   - on AMD  : kernel/x86/microcode/AuthenticAMD.bin
 
+### ucode_loading_in_nmi (x86)
+> `= <boolean>`
+
+> Default: `true`
+
+When one CPU is loading ucode, handling NMIs on sibling threads or threads on
+other cores might cause problems. By default, all CPUs rendezvous in the NMI
+handler and load ucode. This option provides a way to disable that in case
+some CPUs don't allow ucode loading in the NMI handler.
+
 ### unrestricted_guest (Intel)
 > `= `
 
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 049eda6..64a4321 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -36,8 +36,10 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -125,6 +127,9 @@ static int __init parse_ucode(const char *s)
 }
 custom_param("ucode", parse_ucode);
 
+static bool __read_mostly opt_ucode_loading_in_nmi = true;
+boolean_runtime_param("ucode_loading_in_nmi", opt_ucode_loading_in_nmi);
+
 /*
  * 8MB ought to be enough.
  */
@@ -232,6 +237,7 @@ DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
  */
 static cpumask_t cpu_callin_map;
 static atomic_t cpu_out, cpu_updated;
+const struct microcode_patch *nmi_patch;
 
 /*
  * Return a patch that covers current CPU. If there are multiple patches,
@@ -354,6 +360,50 @@ static void set_state(unsigned int state)
 smp_wmb();
 }
 
+static int secondary_thread_work(void)
+{
+cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
+
+return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY;
+}
+
+static int primary_thread_work(const struct microcode_patch *patch)
+{
+int ret;
+
+cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
+
+if ( !wait_for_state(LOADING_ENTER) )
+return -EBUSY;
+
+ret = microcode_ops->apply_microcode(patch);
+if ( !ret )
+atomic_inc(&cpu_updated);
+atomic_inc(&cpu_out);
+
+return ret;
+}
+
+static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+{
+unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
+unsigned int controller = cpumask_first(&cpu_online_map);
+
+/* System-generated NMI, will be ignored */
+if ( loading_state != LOADING_CALLIN )
+return 0;
+
+if ( cpu == controller || (!opt_ucode_loading_in_nmi && cpu == primary) )
+return 0;
+
+if ( cpu == primary )
+primary_thread_work(nmi_patch);
+else
+secondary_thread_work();
+
+return 0;
+}
+
 static int secondary_thread_fn(void)
 {
 unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
@@ -361,10 +411,7 @@ static int secondary_thread_fn(void)
 if ( !wait_for_state(LOADING_CALLIN) )
 return -EBUSY;
 
-cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
-
-if ( !wait_for_state(LOADING_EXIT) )
-return -EBUSY;
+self_nmi();
 
 /* Copy update revision from the primary thread. */
 this_cpu(cpu_sig).rev = per_cpu(cpu_sig, primary).rev;
@@ -379,15 +426,10 @@ static int primary_thread_fn(const struct microcode_

[Xen-devel] [PATCH v10 00/16] improve late microcode loading

2019-09-12 Thread Chao Gao
Major changes in version 10:
 - add back the patch to call wbinvd() conditionally
 - add a patch to disable late loading due to BDF90
 - rendezvous CPUs in NMI handler and load ucode. But provide an option
 to disable this behavior.
 - avoid the call of self_nmi() on the control thread because it may
 trigger the unknown_nmi_error() in do_nmi().
 - ensure ->start_update is called during system resuming from
 suspension

Sergey, could you help to test this series on an AMD machine?
Regarding changes on the AMD side, I couldn't test them due to
lack of hardware. At least two basic tests are needed:
* do a microcode update after system bootup
* don't bring all pCPUs up at bootup by specifying maxcpus option in xen
  command line and then do a microcode update and online all offlined
  CPUs via 'xen-hptool'.

The intention of this series is to make the late microcode loading
more reliable by rendezvousing all cpus in stop_machine context.
This idea comes from Ashok. I am porting his Linux patch to Xen
(see patch 12 for more details).

This series includes below changes:
 1. Patch 1-11: introduce a global microcode cache and some cleanup
 2. Patch 12: synchronize late microcode loading
 3. Patch 13: support parallel microcodes update on different cores
 4. Patch 14: block #NMI handling during microcode loading
 5. Patch 15: disable late ucode loading due to BDF90
 6. Patch 16: call wbinvd() conditionally

Currently, late microcode loading does a lot of things, including
parsing the microcode blob, checking the signature/revision and
performing the update. Putting all of them into stop_machine context
is a bad idea because of complexity (one issue I observed is a memory
allocation triggering an assertion in stop_machine context). To
simplify the load process, parsing microcode is moved out of it; the
remaining parts of the load process are put into stop_machine context.
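
In other words, the late-loading flow becomes (schematically; this
assumes the stop_machine_run() primitive as used by patch 12):

/* Parse and sanity-check the blob outside of stop_machine context... */
patch = parse_blob(buffer, len);

/* ...so that only the actual application runs with all CPUs
 * rendezvoused. */
ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);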

Previous change log:
Changes in version 9:
 - add Jan's Reviewed-by
 - rendevzous threads in NMI handler to disable NMI. Note that NMI can
 be served as usual on threads that are chosen to initiate ucode loading
 on each core.
 - avoid unnecessary memory allocation or copy when creating a microcode
 patch (patch 12)
 - rework patch 1 to avoid microcode_update_match() being used to
 compare two arbitrary updates.
 - call .end_update in early loading path.

Changes in version 8:
 - block #NMI handling during microcode loading (Patch 16)
 - Don't assume that all CPUs in the system have loaded the same ucode.
 So when parsing a blob, we attempt to save a patch as long as it matches
 the current cpu signature regardless of the revision of the patch.
 And also for loading, we only require that the patch to be loaded isn't
 older than the cached one.
 - store an update after the first successful loading on a CPU
 - remove the patch that calls wbinvd() unconditionally before microcode
 loading. It is under internal discussion.
 - divide two big patches into several patches to improve readability.

Changes in version 7:
 - cache one microcode update rather than a list of them. Assuming that all CPUs
 (including those that will be plugged in later) in the system have the same
 signature, an update matching one CPU should match the others. Thus, one
 update is enough for microcode updating during CPU hot-plug and resuming.
 - To handle load failure, microcode update is cached after it is applied to
 avoid a broken update overriding a validated one. Unvalidated microcode updates
 are passed by arguments rather than another global variable, where this series
 slightly differs from Roger's suggestion in:
 https://lists.xen.org/archives/html/xen-devel/2019-03/msg00776.html
 - incorporate Sergey's patch (patch 10) to fix a bug: we maintain a variable
 to reflect current microcode revision. But in some cases, this variable isn't
 initialized during system boot time, which results in falsely reporting that
 the processor is susceptible to some known vulnerabilities.
 - fix issues reported by Sergey:
 https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00901.html
 - Responses to Sergey/Roger/Wei/Ashok's other comments.

Major changes in version 6:
 - run wbinvd before updating microcode (patch 10)
 - add an userspace tool for late microcode update (patch 1)
 - scale time to wait by the number of remaining CPUs to respond 
 - remove 'cpu' parameters from some related callbacks and functins
 - save a ucode patch only if its supported CPU is allowed to mix with
   the current cpu.

Changes in version 5:
 - support parallel microcode updates for all cores (see patch 8)
 - Address Roger's comments on the last version.

Chao Gao (16):
  microcode/intel: extend microcode_update_match()
  microcode/amd: distinguish old and mismatched ucode in
microcode_fits()
  microcode: introduce a global cache of ucode patch
  microcode: clean up microcode_resume_cpu
  microcode: remove struct ucode_cpu_info
  microcode: remove pointless 'cpu' parameter
  microcode/amd: c

[Xen-devel] [PATCH v10 08/16] microcode: pass a patch pointer to apply_microcode()

2019-09-12 Thread Chao Gao
apply_microcode()'s always loading the cached ucode patch forces
a patch to be stored before being loaded. Make apply_microcode()
accept a patch pointer to remove the limitation so that a patch
can be stored after a successful loading.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
 xen/arch/x86/microcode.c| 2 +-
 xen/arch/x86/microcode_amd.c| 5 ++---
 xen/arch/x86/microcode_intel.c  | 5 ++---
 xen/include/asm-x86/microcode.h | 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 5c82a2d..b44e4d7 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -208,7 +208,7 @@ int microcode_resume_cpu(void)
 
 err = microcode_ops->collect_cpu_info(sig);
 if ( likely(!err) )
-err = microcode_ops->apply_microcode();
+err = microcode_ops->apply_microcode(microcode_cache);
 spin_unlock(&microcode_mutex);
 
 return err;
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index c96a3b3..c6d2ea3 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -253,7 +253,7 @@ static enum microcode_match_result compare_patch(
 return MIS_UCODE;
 }
 
-static int apply_microcode(void)
+static int apply_microcode(const struct microcode_patch *patch)
 {
 unsigned long flags;
 uint32_t rev;
@@ -261,7 +261,6 @@ static int apply_microcode(void)
 unsigned int cpu = smp_processor_id();
 struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 const struct microcode_header_amd *hdr;
-const struct microcode_patch *patch = microcode_get_cache();
 
 if ( !patch )
 return -ENOENT;
@@ -565,7 +564,7 @@ static int cpu_request_microcode(const void *buf, size_t bufsize)
 
 if ( match_cpu(microcode_get_cache()) )
 {
-error = apply_microcode();
+error = apply_microcode(microcode_get_cache());
 if ( error )
 break;
 }
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index 5f1ae2f..b1ec81d 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -323,7 +323,7 @@ static int get_matching_microcode(const void *mc)
 return 1;
 }
 
-static int apply_microcode(void)
+static int apply_microcode(const struct microcode_patch *patch)
 {
 unsigned long flags;
 uint64_t msr_content;
@@ -331,7 +331,6 @@ static int apply_microcode(void)
 unsigned int cpu_num = raw_smp_processor_id();
 struct cpu_signature *sig = &this_cpu(cpu_sig);
 const struct microcode_intel *mc_intel;
-const struct microcode_patch *patch = microcode_get_cache();
 
 if ( !patch )
 return -ENOENT;
@@ -429,7 +428,7 @@ static int cpu_request_microcode(const void *buf, size_t size)
 error = offset;
 
 if ( !error && match_cpu(microcode_get_cache()) )
-error = apply_microcode();
+error = apply_microcode(microcode_get_cache());
 
 return error;
 }
diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h
index b0eee0e..02feb09 100644
--- a/xen/include/asm-x86/microcode.h
+++ b/xen/include/asm-x86/microcode.h
@@ -22,7 +22,7 @@ struct microcode_patch {
 struct microcode_ops {
 int (*cpu_request_microcode)(const void *buf, size_t size);
 int (*collect_cpu_info)(struct cpu_signature *csig);
-int (*apply_microcode)(void);
+int (*apply_microcode)(const struct microcode_patch *patch);
 int (*start_update)(void);
 void (*end_update_percpu)(void);
 void (*free_patch)(void *mc);
-- 
1.8.3.1



[Xen-devel] [PATCH v10 09/16] microcode: split out apply_microcode() from cpu_request_microcode()

2019-09-12 Thread Chao Gao
During late microcode loading, apply_microcode() is invoked in
cpu_request_microcode(). To make late microcode update more reliable,
we want to put the apply_microcode() into stop_machine context. So
we split it out from cpu_request_microcode(). In general, for both
early loading on BSP and late loading, cpu_request_microcode() is
called first to get the matching microcode update contained by
the blob and then apply_microcode() is invoked explicitly on each
cpu in common code.

Given that all CPUs are supposed to have the same signature, parsing
microcode only needs to be done once. So cpu_request_microcode() is
also moved out of microcode_update_cpu().

In some cases (e.g. a broken bios), the system may have multiple
revisions of microcode update. So we would try to load a microcode
update as long as it covers current cpu. And if a cpu loads this patch
successfully, the patch would be stored into the patch cache.

Signed-off-by: Chao Gao 
Reviewed-by: Roger Pau Monné 
---
Changes in v10:
 - make microcode_update_cache static
 - raise an error if loading ucode failed with -EIO
 - ensure end_update_percpu() is called following a successful call of
 start_update()

Changes in v9:
 - remove the calling of ->compare_patch in microcode_update_cpu().
 - drop "microcode_" prefix for static function - microcode_parse_blob().
 - rebase and fix conflict

Changes in v8:
 - divide the original patch into three patches to improve readability
 - load an update on each cpu as long as the update covers current cpu
 - store an update after the first successful loading on a CPU
 - Make sure the current CPU (especially pf value) is covered
 by updates.

changes in v7:
 - to handle load failure, unvalidated patches won't be cached. They
 are passed as function arguments. So if an update fails, we don't need
 any cleanup of the microcode cache.
---
 xen/arch/x86/microcode.c| 182 +++-
 xen/arch/x86/microcode_amd.c|  38 +
 xen/arch/x86/microcode_intel.c  |  66 +++
 xen/include/asm-x86/microcode.h |   5 +-
 4 files changed, 178 insertions(+), 113 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index b44e4d7..d4738f6 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -189,12 +189,19 @@ static DEFINE_SPINLOCK(microcode_mutex);
 
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
-struct microcode_info {
-unsigned int cpu;
-uint32_t buffer_size;
-int error;
-char buffer[1];
-};
+/*
+ * Return a patch that covers current CPU. If there are multiple patches,
+ * return the one with the highest revision number. Return error If no
+ * patch is found and an error occurs during the parsing process. Otherwise
+ * return NULL.
+ */
+static struct microcode_patch *parse_blob(const char *buf, size_t len)
+{
+if ( likely(!microcode_ops->collect_cpu_info(&this_cpu(cpu_sig))) )
+return microcode_ops->cpu_request_microcode(buf, len);
+
+return NULL;
+}
 
 int microcode_resume_cpu(void)
 {
@@ -220,15 +227,8 @@ void microcode_free_patch(struct microcode_patch *microcode_patch)
 xfree(microcode_patch);
 }
 
-const struct microcode_patch *microcode_get_cache(void)
-{
-ASSERT(spin_is_locked(&microcode_mutex));
-
-return microcode_cache;
-}
-
 /* Return true if cache gets updated. Otherwise, return false */
-bool microcode_update_cache(struct microcode_patch *patch)
+static bool microcode_update_cache(struct microcode_patch *patch)
 {
 ASSERT(spin_is_locked(&microcode_mutex));
 
@@ -249,49 +249,80 @@ bool microcode_update_cache(struct microcode_patch *patch)
 return true;
 }
 
-static int microcode_update_cpu(const void *buf, size_t size)
+/*
+ * Load a microcode update to current CPU.
+ *
+ * If no patch is provided, the cached patch will be loaded. Microcode update
+ * during APs bringup and CPU resuming falls into this case.
+ */
+static int microcode_update_cpu(const struct microcode_patch *patch)
 {
-int err;
-unsigned int cpu = smp_processor_id();
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
+int err = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
-spin_lock(&microcode_mutex);
+if ( unlikely(err) )
+return err;
 
-err = microcode_ops->collect_cpu_info(sig);
-if ( likely(!err) )
-err = microcode_ops->cpu_request_microcode(buf, size);
-spin_unlock(&microcode_mutex);
+if ( patch )
+err = microcode_ops->apply_microcode(patch);
+else if ( microcode_cache )
+{
+spin_lock(&microcode_mutex);
+err = microcode_ops->apply_microcode(microcode_cache);
+if ( err == -EIO )
+{
+microcode_free_patch(microcode_cache);
+microcode_cache = NULL;
+}
+spin_unlock(&microcode_mutex);
+}
+else
+/* No patch to update */
+err = -ENOENT;
 
 return err;
 }
 
-static long do_microcode_update(void *_info)
+static long do_microcode_update(void *patch)
 {
-struct micr

[Xen-devel] [PATCH v10 16/16] microcode/intel: writeback and invalidate cache conditionally

2019-09-12 Thread Chao Gao
It is needed to mitigate some issues on this specific Broadwell CPU.

Signed-off-by: Chao Gao 
---
 xen/arch/x86/microcode_intel.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index bcef668..4e5e7f9 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -305,6 +305,31 @@ static bool is_blacklisted(void)
 return false;
 }
 
+static void microcode_quirk(void)
+{
+struct cpuinfo_x86 *c;
+uint64_t llc_size;
+
+/*
+ * Don't refer to current_cpu_data, which isn't fully initialized
+ * before this stage.
+ */
+if ( system_state < SYS_STATE_smp_boot )
+return;
+
+c = &boot_cpu_data;
+llc_size = c->x86_cache_size * 1024ULL;
+do_div(llc_size, c->x86_max_cores);
+
+/*
+ * To mitigate some issues on this specific Broadwell CPU, writeback and
+ * invalidate cache regardless of ucode revision.
+ */
+if ( c->x86 == 6 && c->x86_model == 0x4F && c->x86_mask == 0x1 &&
+ llc_size > 2621440 )
+wbinvd();
+}
+
 static int apply_microcode(const struct microcode_patch *patch)
 {
 uint64_t msr_content;
@@ -323,6 +348,8 @@ static int apply_microcode(const struct microcode_patch *patch)
 
 BUG_ON(local_irq_is_enabled());
 
+microcode_quirk();
+
 /* write microcode via MSR 0x79 */
 wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits);
 wrmsrl(MSR_IA32_UCODE_REV, 0x0ULL);
-- 
1.8.3.1



[Xen-devel] [PATCH v10 10/16] microcode: unify ucode loading during system bootup and resuming

2019-09-12 Thread Chao Gao
During system bootup and resuming, CPUs just load the cached ucode.
So one unified function microcode_update_one() is introduced. It
takes a boolean to indicate whether ->start_update should be called.
Since early_microcode_update_cpu() is only called on BSP (APs call
the unified function), start_update is always true and so remove
this parameter.

There is a functional change: ->start_update is called on the BSP and
->end_update_percpu is called during system resuming. They were not
invoked by the previous microcode_resume_cpu().

Signed-off-by: Chao Gao 
---
Changes in v10:
 - call ->start_update for system resume from suspension

Changes in v9:
 - return -EOPNOTSUPP rather than 0 if microcode_ops is NULL in
   microcode_update_one()
 - rebase and fix conflicts.

Changes in v8:
 - split out from the previous patch
---
 xen/arch/x86/acpi/power.c   |  2 +-
 xen/arch/x86/microcode.c| 91 +++--
 xen/arch/x86/smpboot.c  |  5 +--
 xen/include/asm-x86/processor.h |  4 +-
 4 files changed, 45 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 269b140..01e6aec 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -278,7 +278,7 @@ static int enter_state(u32 state)
 
 console_end_sync();
 
-microcode_resume_cpu();
+microcode_update_one(true);
 
 if ( !recheck_cpu_features(0) )
 panic("Missing previously available feature(s)\n");
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index d4738f6..c2ea20f 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -203,24 +203,6 @@ static struct microcode_patch *parse_blob(const char *buf, size_t len)
 return NULL;
 }
 
-int microcode_resume_cpu(void)
-{
-int err;
-struct cpu_signature *sig = &this_cpu(cpu_sig);
-
-if ( !microcode_ops )
-return 0;
-
-spin_lock(&microcode_mutex);
-
-err = microcode_ops->collect_cpu_info(sig);
-if ( likely(!err) )
-err = microcode_ops->apply_microcode(microcode_cache);
-spin_unlock(&microcode_mutex);
-
-return err;
-}
-
 void microcode_free_patch(struct microcode_patch *microcode_patch)
 {
 microcode_ops->free_patch(microcode_patch->mc);
@@ -394,11 +376,38 @@ static int __init microcode_init(void)
 }
 __initcall(microcode_init);
 
-int __init early_microcode_update_cpu(bool start_update)
+/* Load a cached update to current cpu */
+int microcode_update_one(bool start_update)
+{
+int err;
+
+if ( !microcode_ops )
+return -EOPNOTSUPP;
+
+microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
+
+if ( start_update && microcode_ops->start_update )
+{
+err = microcode_ops->start_update();
+if ( err )
+return err;
+}
+
+err = microcode_update_cpu(NULL);
+
+if ( microcode_ops->end_update_percpu )
+microcode_ops->end_update_percpu();
+
+return err;
+}
+
+/* BSP calls this function to parse ucode blob and then apply an update. */
+int __init early_microcode_update_cpu(void)
 {
 int rc = 0;
 void *data = NULL;
 size_t len;
+struct microcode_patch *patch;
 
 if ( !microcode_ops )
 return -ENOSYS;
@@ -414,44 +423,26 @@ int __init early_microcode_update_cpu(bool start_update)
 data = bootstrap_map(&ucode_mod);
 }
 
-microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
-
 if ( !data )
 return -ENOMEM;
 
-if ( start_update )
+patch = parse_blob(data, len);
+if ( IS_ERR(patch) )
 {
-struct microcode_patch *patch;
-
-patch = parse_blob(data, len);
-if ( IS_ERR(patch) )
-{
-printk(XENLOG_WARNING "Parsing microcode blob error %ld\n",
-   PTR_ERR(patch));
-return PTR_ERR(patch);
-}
-
-if ( !patch )
-return -ENOENT;
-
-spin_lock(&microcode_mutex);
-rc = microcode_update_cache(patch);
-spin_unlock(&microcode_mutex);
-ASSERT(rc);
-
-if ( microcode_ops->start_update )
-rc = microcode_ops->start_update();
-
-if ( rc )
-return rc;
+printk(XENLOG_WARNING "Parsing microcode blob error %ld\n",
+   PTR_ERR(patch));
+return PTR_ERR(patch);
 }
 
-rc = microcode_update_cpu(NULL);
+if ( !patch )
+return -ENOENT;
 
-if ( microcode_ops->end_update_percpu )
-microcode_ops->end_update_percpu();
+spin_lock(&microcode_mutex);
+rc = microcode_update_cache(patch);
+spin_unlock(&microcode_mutex);
+ASSERT(rc);
 
-return rc;
+return microcode_update_one(true);
 }
 
 int __init early_microcode_init(void)
@@ -471,7 +462,7 @@ int __init early_microcode_init(void)
 microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
 if ( ucode_mod.mod_end || ucode_blob.size )
-rc = early_microcode_update_cpu(true);
+rc = early_

[Xen-devel] [PATCH v10 05/16] microcode: remove struct ucode_cpu_info

2019-09-12 Thread Chao Gao
Remove the per-cpu cache field in struct ucode_cpu_info since it has
been replaced by a global cache. That leaves only one field
remaining in ucode_cpu_info. Hence, this struct is removed and the
remaining field (cpu signature) is stored in the per-cpu area.

The cpu status notifier is also removed. It was used to free the "mc"
field to avoid memory leak.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v9:
 - rebase and fix conflict

Changes in v8:
 - split microcode_resume_cpu() cleanup to a separate patch.

Changes in v6:
 - remove the whole struct ucode_cpu_info instead of the per-cpu cache
  in it.
---
 xen/arch/x86/apic.c |  2 +-
 xen/arch/x86/microcode.c| 57 +++
 xen/arch/x86/microcode_amd.c| 59 +
 xen/arch/x86/microcode_intel.c  | 28 +++
 xen/arch/x86/spec_ctrl.c|  2 +-
 xen/include/asm-x86/microcode.h | 12 +
 6 files changed, 34 insertions(+), 126 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index ea0d561..6cdb50c 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1190,7 +1190,7 @@ static void __init check_deadline_errata(void)
 else
 rev = (unsigned long)m->driver_data;
 
-if ( this_cpu(ucode_cpu_info).cpu_sig.rev >= rev )
+if ( this_cpu(cpu_sig).rev >= rev )
 return;
 
 setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE);
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 922b94f..d17dbec 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -187,7 +187,7 @@ const struct microcode_ops *microcode_ops;
 
 static DEFINE_SPINLOCK(microcode_mutex);
 
-DEFINE_PER_CPU(struct ucode_cpu_info, ucode_cpu_info);
+DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
 struct microcode_info {
 unsigned int cpu;
@@ -196,32 +196,17 @@ struct microcode_info {
 char buffer[1];
 };
 
-static void __microcode_fini_cpu(unsigned int cpu)
-{
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
-
-xfree(uci->mc.mc_valid);
-memset(uci, 0, sizeof(*uci));
-}
-
-static void microcode_fini_cpu(unsigned int cpu)
-{
-spin_lock(&microcode_mutex);
-__microcode_fini_cpu(cpu);
-spin_unlock(&microcode_mutex);
-}
-
 int microcode_resume_cpu(unsigned int cpu)
 {
 int err;
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 
 if ( !microcode_ops )
 return 0;
 
 spin_lock(&microcode_mutex);
 
-err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+err = microcode_ops->collect_cpu_info(cpu, sig);
 if ( likely(!err) )
 err = microcode_ops->apply_microcode(cpu);
 spin_unlock(&microcode_mutex);
@@ -268,16 +253,13 @@ static int microcode_update_cpu(const void *buf, size_t size)
 {
 int err;
 unsigned int cpu = smp_processor_id();
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 
 spin_lock(&microcode_mutex);
 
-err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+err = microcode_ops->collect_cpu_info(cpu, sig);
 if ( likely(!err) )
 err = microcode_ops->cpu_request_microcode(cpu, buf, size);
-else
-__microcode_fini_cpu(cpu);
-
spin_unlock(&microcode_mutex);
 
 return err;
@@ -364,29 +346,10 @@ static int __init microcode_init(void)
 }
 __initcall(microcode_init);
 
-static int microcode_percpu_callback(
-struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-unsigned int cpu = (unsigned long)hcpu;
-
-switch ( action )
-{
-case CPU_DEAD:
-microcode_fini_cpu(cpu);
-break;
-}
-
-return NOTIFY_DONE;
-}
-
-static struct notifier_block microcode_percpu_nfb = {
-.notifier_call = microcode_percpu_callback,
-};
-
 int __init early_microcode_update_cpu(bool start_update)
 {
 unsigned int cpu = smp_processor_id();
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 int rc = 0;
 void *data = NULL;
 size_t len;
@@ -405,7 +368,7 @@ int __init early_microcode_update_cpu(bool start_update)
 data = bootstrap_map(_mod);
 }
 
-microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+microcode_ops->collect_cpu_info(cpu, sig);
 
 if ( data )
 {
@@ -424,7 +387,7 @@ int __init early_microcode_update_cpu(bool start_update)
 int __init early_microcode_init(void)
 {
 unsigned int cpu = smp_processor_id();
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 int rc;
 
 rc = microcode_init_intel();
@@ -437,12 +400,10 @@ int __init early_microcode_init(void)
 
 if ( microcode_ops )
 {
-microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+microcode_ops->collect_cpu_info(cpu, sig);
 
 if ( ucode_mod.mod_end |

[Xen-devel] [PATCH v10 13/16] microcode: remove microcode_update_lock

2019-09-12 Thread Chao Gao
microcode_update_lock prevents logical threads of the same core from
updating microcode at the same time. But, being a global lock, it also
prevented parallel microcode updates on different cores.

Remove this lock in order to update microcode in parallel. It is safe
because we have already ensured serialization of sibling threads at the
caller side.
1. For late microcode update, do_microcode_update() ensures that only
   one sibling thread of a core can update microcode.
2. For microcode update during system startup or CPU-hotplug,
   microcode_mutex guarantees update serialization of logical threads.
3. get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and
   late microcode update.

Note that the printk in apply_microcode() and svm_host_osvw_init() (for
AMD only) are still processed sequentially.
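
For reference, the caller-side guarantee can be sketched in C as below
(illustration only; is_primary_sibling() is a hypothetical name, but the
selection criterion is the one this series uses):

static bool is_primary_sibling(unsigned int cpu)
{
    /* Only the first online thread of each core does the MSR write. */
    return cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu));
}

/* In the per-CPU update path, with interrupts already disabled: */
if ( is_primary_sibling(smp_processor_id()) )
    ret = microcode_ops->apply_microcode(patch);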

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v7:
 - reworked. Remove the complex locking logic introduced in v5 and v6. The
 microcode patch to be applied is passed as an argument without any global
 variable. Thus no lock is added to serialize potential readers/writers.
 Callers of apply_microcode() will guarantee correctness: the patch pointed
 to by the argument won't be changed by others.

Changes in v6:
 - introduce early_ucode_update_lock to serialize early ucode update.

Changes in v5:
 - newly add
---
 xen/arch/x86/microcode_amd.c   | 8 +---
 xen/arch/x86/microcode_intel.c | 8 +---
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index f05db72..856caea 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -74,9 +74,6 @@ struct mpbhdr {
 uint8_t data[];
 };
 
-/* serialize access to the physical write */
-static DEFINE_SPINLOCK(microcode_update_lock);
-
 /* See comment in start_update() for cases when this routine fails */
 static int collect_cpu_info(struct cpu_signature *csig)
 {
@@ -232,7 +229,6 @@ static enum microcode_match_result compare_patch(
 
 static int apply_microcode(const struct microcode_patch *patch)
 {
-unsigned long flags;
 uint32_t rev;
 int hw_err;
 unsigned int cpu = smp_processor_id();
@@ -247,15 +243,13 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 
 hdr = patch->mc_amd->mpb;
 
-spin_lock_irqsave(&microcode_update_lock, flags);
+BUG_ON(local_irq_is_enabled());
 
 hw_err = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)hdr);
 
 /* get patch id after patching */
 rdmsrl(MSR_AMD_PATCHLEVEL, rev);
 
-spin_unlock_irqrestore(&microcode_update_lock, flags);
-
 /*
  * Some processors leave the ucode blob mapping as UC after the update.
  * Flush the mapping to regain normal cacheability.
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index 4e811b7..19f1ba0 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -93,9 +93,6 @@ struct extended_sigtable {
 
 #define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
 
-/* serialize access to the physical write to MSR 0x79 */
-static DEFINE_SPINLOCK(microcode_update_lock);
-
 static int collect_cpu_info(struct cpu_signature *csig)
 {
 unsigned int cpu_num = smp_processor_id();
@@ -288,7 +285,6 @@ static enum microcode_match_result compare_patch(
 
 static int apply_microcode(const struct microcode_patch *patch)
 {
-unsigned long flags;
 uint64_t msr_content;
 unsigned int val[2];
 unsigned int cpu_num = raw_smp_processor_id();
@@ -303,8 +299,7 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 
 mc_intel = patch->mc_intel;
 
-/* serialize access to the physical write to MSR 0x79 */
-spin_lock_irqsave(&microcode_update_lock, flags);
+BUG_ON(local_irq_is_enabled());
 
 /* write microcode via MSR 0x79 */
 wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits);
@@ -317,7 +312,6 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 rdmsrl(MSR_IA32_UCODE_REV, msr_content);
 val[1] = (uint32_t)(msr_content >> 32);
 
-spin_unlock_irqrestore(&microcode_update_lock, flags);
 if ( val[1] != mc_intel->hdr.rev )
 {
 printk(KERN_ERR "microcode: CPU%d update from revision "
-- 
1.8.3.1



[Xen-devel] [PATCH v10 02/16] microcode/amd: distinguish old and mismatched ucode in microcode_fits()

2019-09-12 Thread Chao Gao
Sometimes, a ucode with a level lower than or equal to the current CPU's
patch level is useful. For example, to work around a broken BIOS which
only loads ucode for the BSP: when the BSP parses a ucode blob during
boot, it is better to also save a ucode with a lower or equal level for
the APs.

No functional change is made in this patch. But a following patch will
handle "old ucode" and "mismatched ucode" separately.
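
To illustrate why the tri-state result matters, a hypothetical caller
(not part of this patch) can now distinguish the three cases:

switch ( microcode_fits(mc_amd, cpu) )
{
case NEW_UCODE:
    /* Newer than the running revision: apply it. */
    break;
case OLD_UCODE:
    /* Not newer, but it matches this CPU: still worth caching for APs
       when a broken bios updated only the BSP. */
    break;
case MIS_UCODE:
    /* Wrong signature: ignore this update entirely. */
    break;
}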

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v8:
 - new
---
 xen/arch/x86/microcode_amd.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 9b74330..7fa700b 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -152,8 +152,8 @@ static bool_t find_equiv_cpu_id(const struct 
equiv_cpu_entry *equiv_cpu_table,
 return 0;
 }
 
-static bool_t microcode_fits(const struct microcode_amd *mc_amd,
- unsigned int cpu)
+static enum microcode_match_result microcode_fits(
+const struct microcode_amd *mc_amd, unsigned int cpu)
 {
struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
 const struct microcode_header_amd *mc_header = mc_amd->mpb;
@@ -167,27 +167,27 @@ static bool_t microcode_fits(const struct microcode_amd 
*mc_amd,
 current_cpu_id = cpuid_eax(0x0001);
 
if ( !find_equiv_cpu_id(equiv_cpu_table, current_cpu_id, &equiv_cpu_id) )
-return 0;
+return MIS_UCODE;
 
 if ( (mc_header->processor_rev_id) != equiv_cpu_id )
-return 0;
+return MIS_UCODE;
 
 if ( !verify_patch_size(mc_amd->mpb_size) )
 {
 pr_debug("microcode: patch size mismatch\n");
-return 0;
+return MIS_UCODE;
 }
 
 if ( mc_header->patch_id <= uci->cpu_sig.rev )
 {
 pr_debug("microcode: patch is already at required level or 
greater.\n");
-return 0;
+return OLD_UCODE;
 }
 
 pr_debug("microcode: CPU%d found a matching microcode update with version 
%#x (current=%#x)\n",
  cpu, mc_header->patch_id, uci->cpu_sig.rev);
 
-return 1;
+return NEW_UCODE;
 }
 
 static int apply_microcode(unsigned int cpu)
@@ -496,7 +496,7 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
   &offset)) == 0 )
 {
-if ( microcode_fits(mc_amd, cpu) )
+if ( microcode_fits(mc_amd, cpu) == NEW_UCODE )
 {
 error = apply_microcode(cpu);
 if ( error )
@@ -579,7 +579,7 @@ static int microcode_resume_match(unsigned int cpu, const 
void *mc)
 struct microcode_amd *mc_amd = uci->mc.mc_amd;
 const struct microcode_amd *src = mc;
 
-if ( !microcode_fits(src, cpu) )
+if ( microcode_fits(src, cpu) != NEW_UCODE )
 return 0;
 
 if ( src != mc_amd )
-- 
1.8.3.1



[Xen-devel] [PATCH v10 03/16] microcode: introduce a global cache of ucode patch

2019-09-12 Thread Chao Gao
to replace the current per-cpu cache 'uci->mc'.

With the assumption that all CPUs in the system have the same signature
(family, model, stepping and 'pf'), a microcode update that matches one
CPU should match the others as well. Differing microcode revisions
across CPUs would make the system unstable and should be avoided. Hence,
caching one microcode update is good enough for all cases.

Introduce a global variable, microcode_cache, to store the newest
matching microcode update. Whenever we get a new valid microcode update,
its revision id is compared against that of the cached one to determine
whether the "microcode_cache" needs to be replaced. And this global
cache is loaded to the cpu in apply_microcode().

All operations on the cache are protected by 'microcode_mutex'.

Note that I deliberately avoid touching the old per-cpu cache ('uci->mc')
as I am going to remove it completely in the following patches. We copy
everything to create the new cache blob, to avoid reusing buffers
previously allocated for the old per-cpu cache. It is not efficient,
but this is corrected by a later patch in this series.
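
For illustration, a caller installs a freshly parsed patch roughly as
below (the real call sites are introduced by later patches). Note that
microcode_update_cache() takes ownership of 'patch' either way: it is
cached, or freed when it doesn't improve on the cache:

spin_lock(&microcode_mutex);
if ( !microcode_update_cache(patch) )
    printk(XENLOG_DEBUG "microcode: patch isn't newer than the cache\n");
spin_unlock(&microcode_mutex);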

Signed-off-by: Chao Gao 
Reviewed-by: Roger Pau Monné 
---
Changes in v10:
 - assert mismatched ucode won't be passed to ->compare_patch.
 - return -ENOENT if patch is NULL in .apply_microcode().
 - check against NULL pointer dereference in free_patch() on AMD side
 - cosmetic changes suggested by Roger and Jan.

Changes in v9:
 - on Intel side, ->compare_patch just checks the patch revision number.
 - explain why all buffers are copied in alloc_microcode_patch() in
 patch description.

Changes in v8:
 - Free generic wrapper struct in general code
 - Try to update the cache as long as a patch covers the current cpu.
 Previously, the cache was updated only if the patch was newer than the
 current update revision in the CPU. This small difference can work around
 a broken BIOS which only applies the microcode update to the BSP, leaving
 software to apply the same update to other CPUs.

Changes in v7:
 - reworked to cache only one microcode patch rather than a list of
 microcode patches.
---
 xen/arch/x86/microcode.c| 38 
 xen/arch/x86/microcode_amd.c| 98 ++---
 xen/arch/x86/microcode_intel.c  | 81 +++---
 xen/include/asm-x86/microcode.h | 16 +++
 4 files changed, 211 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 421d57e..e218a9d 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -61,6 +61,9 @@ static struct ucode_mod_blob __initdata ucode_blob;
  */
 static bool_t __initdata ucode_scan;
 
+/* Protected by microcode_mutex */
+static struct microcode_patch *microcode_cache;
+
 void __init microcode_set_module(unsigned int idx)
 {
 ucode_mod_idx = idx;
@@ -262,6 +265,41 @@ int microcode_resume_cpu(unsigned int cpu)
 return err;
 }
 
+void microcode_free_patch(struct microcode_patch *microcode_patch)
+{
+microcode_ops->free_patch(microcode_patch->mc);
+xfree(microcode_patch);
+}
+
+const struct microcode_patch *microcode_get_cache(void)
+{
+ASSERT(spin_is_locked(&microcode_mutex));
+
+return microcode_cache;
+}
+
+/* Return true if cache gets updated. Otherwise, return false */
+bool microcode_update_cache(struct microcode_patch *patch)
+{
+ASSERT(spin_is_locked(&microcode_mutex));
+
+if ( !microcode_cache )
+microcode_cache = patch;
+else if ( microcode_ops->compare_patch(patch,
+   microcode_cache) == NEW_UCODE )
+{
+microcode_free_patch(microcode_cache);
+microcode_cache = patch;
+}
+else
+{
+microcode_free_patch(patch);
+return false;
+}
+
+return true;
+}
+
 static int microcode_update_cpu(const void *buf, size_t size)
 {
 int err;
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 7fa700b..2dca1df 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -190,25 +190,92 @@ static enum microcode_match_result microcode_fits(
 return NEW_UCODE;
 }
 
+static bool match_cpu(const struct microcode_patch *patch)
+{
+if ( !patch )
+return false;
+return microcode_fits(patch->mc_amd, smp_processor_id()) == NEW_UCODE;
+}
+
+static struct microcode_patch *alloc_microcode_patch(
+const struct microcode_amd *mc_amd)
+{
+struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch);
+struct microcode_amd *cache = xmalloc(struct microcode_amd);
+void *mpb = xmalloc_bytes(mc_amd->mpb_size);
+struct equiv_cpu_entry *equiv_cpu_table =
+xmalloc_bytes(mc_amd->equiv_cpu_table_size);
+
+if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table )
+{
+xfree(microcode_patch);
+xfree(cache);
+xfree(mpb);
+xfree(equiv_cpu_table);
+retur

[Xen-devel] [PATCH v10 15/16] microcode: disable late loading if CPUs are affected by BDF90

2019-09-12 Thread Chao Gao
This patch ports the implementation of is_blacklisted() from the Linux
kernel to Xen.

Late loading may cause a system hang if CPUs are affected by BDF90.
Check against BDF90 before performing late loading.
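
For reference, the check boils down to the following arithmetic, with
'c' and 'sig' as in is_blacklisted() below (2.5MB per core expressed in
bytes is 2.5 * 1024 * 1024 = 2621440):

uint64_t llc_per_core = c->x86_cache_size * 1024ULL / c->x86_max_cores;
bool affected = c->x86 == 6 && c->x86_model == 0x4F &&
                c->x86_mask == 0x1 && llc_per_core > 2621440 &&
                sig->rev < 0x0b21;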

Signed-off-by: Chao Gao 
---
 xen/arch/x86/microcode.c|  6 ++
 xen/arch/x86/microcode_intel.c  | 23 +++
 xen/include/asm-x86/microcode.h |  1 +
 3 files changed, 30 insertions(+)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 64a4321..dbd2730 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -561,6 +561,12 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) 
buf, unsigned long len)
 if ( microcode_ops == NULL )
 return -EINVAL;
 
+if ( microcode_ops->is_blacklisted && microcode_ops->is_blacklisted() )
+{
+printk(XENLOG_WARNING "Late ucode loading is disabled!\n");
+return -EPERM;
+}
+
 buffer = xmalloc_bytes(len);
 if ( !buffer )
 return -ENOMEM;
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index 19f1ba0..bcef668 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -283,6 +284,27 @@ static enum microcode_match_result compare_patch(
  : OLD_UCODE;
 }
 
+static bool is_blacklisted(void)
+{
+struct cpuinfo_x86 *c = &current_cpu_data;
+uint64_t llc_size = c->x86_cache_size * 1024ULL;
+struct cpu_signature *sig = &this_cpu(cpu_sig);
+
+do_div(llc_size, c->x86_max_cores);
+
+/*
+ * Late loading on model 79 with microcode revision less than 0x0b21
+ * and LLC size per core bigger than 2.5MB may result in a system hang.
+ * This behavior is documented in item BDF90, #334165 (Intel Xeon
+ * Processor E7-8800/4800 v4 Product Family).
+ */
+if ( c->x86 == 6 && c->x86_model == 0x4F && c->x86_mask == 0x1 &&
+ llc_size > 2621440 && sig->rev < 0x0b21 )
+return true;
+
+return false;
+}
+
 static int apply_microcode(const struct microcode_patch *patch)
 {
 uint64_t msr_content;
@@ -415,6 +437,7 @@ static const struct microcode_ops microcode_intel_ops = {
 .free_patch   = free_patch,
 .compare_patch= compare_patch,
 .match_cpu= match_cpu,
+.is_blacklisted   = is_blacklisted,
 };
 
 int __init microcode_init_intel(void)
diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h
index 7d5a1f8..9ffd9d2 100644
--- a/xen/include/asm-x86/microcode.h
+++ b/xen/include/asm-x86/microcode.h
@@ -30,6 +30,7 @@ struct microcode_ops {
 bool (*match_cpu)(const struct microcode_patch *patch);
 enum microcode_match_result (*compare_patch)(
 const struct microcode_patch *new, const struct microcode_patch *old);
+bool (*is_blacklisted)(void);
 };
 
 struct cpu_signature {
-- 
1.8.3.1



[Xen-devel] [PATCH v10 07/16] microcode/amd: call svm_host_osvw_init() in common code

2019-09-12 Thread Chao Gao
Introduce a vendor hook, .end_update_percpu, for svm_host_osvw_init().
The hook function is called on each cpu after loading an update.
It is a preparation for splitting out apply_microcode() from
cpu_request_microcode().

Note that svm_host_osvw_init() should be called regardless of the
result of loading an update.
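
The intended call order, sketched for illustration (error handling
omitted):

/* Once, before any loading starts: */
if ( microcode_ops->start_update )
    microcode_ops->start_update();      /* AMD: svm_host_osvw_reset() */

/* On each cpu, after loading (whether it succeeded or not): */
if ( microcode_ops->end_update_percpu )
    microcode_ops->end_update_percpu(); /* AMD: svm_host_osvw_init() */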

Signed-off-by: Chao Gao 
Reviewed-by: Roger Pau Monné 
---
Changes in v10:
 - rename end_update to end_update_percpu.
 - use #ifdef rather than #if and frame the implementation with

Changes in v9:
 - call .end_update in early loading path
 - on AMD side, initialize .{start,end}_update only if "CONFIG_HVM"
 is true.
---
 xen/arch/x86/microcode.c| 10 +-
 xen/arch/x86/microcode_amd.c| 25 -
 xen/include/asm-x86/microcode.h |  1 +
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 89a8d2b..5c82a2d 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -276,6 +276,9 @@ static long do_microcode_update(void *_info)
 if ( error )
 info->error = error;
 
+if ( microcode_ops->end_update_percpu )
+microcode_ops->end_update_percpu();
+
info->cpu = cpumask_next(info->cpu, &cpu_online_map);
 if ( info->cpu < nr_cpu_ids )
 return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
@@ -376,7 +379,12 @@ int __init early_microcode_update_cpu(bool start_update)
 if ( rc )
 return rc;
 
-return microcode_update_cpu(data, len);
+rc = microcode_update_cpu(data, len);
+
+if ( microcode_ops->end_update_percpu )
+microcode_ops->end_update_percpu();
+
+return rc;
 }
 else
 return -ENOMEM;
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 1d27c71..c96a3b3 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -600,10 +600,6 @@ static int cpu_request_microcode(const void *buf, size_t 
bufsize)
 free_patch(mc_amd);
 
   out:
-#if CONFIG_HVM
-svm_host_osvw_init();
-#endif
-
 /*
  * In some cases we may return an error even if processor's microcode has
  * been updated. For example, the first patch in a container file is loaded
@@ -613,29 +609,32 @@ static int cpu_request_microcode(const void *buf, size_t 
bufsize)
 return error;
 }
 
+#ifdef CONFIG_HVM
 static int start_update(void)
 {
-#if CONFIG_HVM
 /*
- * We assume here that svm_host_osvw_init() will be called on each cpu 
(from
- * cpu_request_microcode()).
- *
- * Note that if collect_cpu_info() returns an error then
- * cpu_request_microcode() will not invoked thus leaving OSVW bits not
- * updated. Currently though collect_cpu_info() will not fail on processors
- * supporting OSVW so we will not deal with this possibility.
+ * svm_host_osvw_init() will be called on each cpu by calling '.end_update'
+ * in common code.
  */
 svm_host_osvw_reset();
-#endif
 
 return 0;
 }
 
+static void end_update_percpu(void)
+{
+svm_host_osvw_init();
+}
+#endif
+
 static const struct microcode_ops microcode_amd_ops = {
 .cpu_request_microcode= cpu_request_microcode,
 .collect_cpu_info = collect_cpu_info,
 .apply_microcode  = apply_microcode,
+#ifdef CONFIG_HVM
 .start_update = start_update,
+.end_update_percpu= end_update_percpu,
+#endif
 .free_patch   = free_patch,
 .compare_patch= compare_patch,
 .match_cpu= match_cpu,
diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h
index f2a5ea4..b0eee0e 100644
--- a/xen/include/asm-x86/microcode.h
+++ b/xen/include/asm-x86/microcode.h
@@ -24,6 +24,7 @@ struct microcode_ops {
 int (*collect_cpu_info)(struct cpu_signature *csig);
 int (*apply_microcode)(void);
 int (*start_update)(void);
+void (*end_update_percpu)(void);
 void (*free_patch)(void *mc);
 bool (*match_cpu)(const struct microcode_patch *patch);
 enum microcode_match_result (*compare_patch)(
-- 
1.8.3.1



[Xen-devel] [PATCH v10 11/16] microcode: reduce memory allocation and copy when creating a patch

2019-09-12 Thread Chao Gao
To create a microcode patch from a vendor-specific update,
alloc_microcode_patch() copied everything from the update, which
is not efficient. Essentially, we just need to go through the
ucodes in the blob, find the one with the newest revision and
install it into the microcode_patch. In the process, buffers
like mc_amd, equiv_cpu_table (on the AMD side), and mc (on the Intel
side) can be reused. microcode_patch is now allocated only after
it is certain that there is a matching ucode.
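
A simplified sketch of the reworked flow in cpu_request_microcode():
scan all ucodes in the blob, remember the newest applicable one, and
only then allocate the struct microcode_patch wrapper (the real patch
also transfers buffer ownership instead of keeping a bare pointer):

while ( get_ucode_from_buffer_amd(mc_amd, buf, bufsize, &offset) == 0 )
{
    if ( microcode_fits(mc_amd) == NEW_UCODE &&
         (!saved || compare_header(mc_amd->mpb, saved) == NEW_UCODE) )
        saved = mc_amd->mpb;      /* newest applicable ucode so far */
}

if ( saved )                      /* allocate only when a match exists */
    patch = xmalloc(struct microcode_patch);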

Signed-off-by: Chao Gao 
Reviewed-by: Roger Pau Monné 
---
Changes in v10:
 - avoid unnecessary type casting
   * introduce compare_header on AMD side
   * specify the type of the first parameter of get_next_ucode_from_buffer()
 on Intel side

Changes in v9:
 - new
---
 xen/arch/x86/microcode_amd.c   | 112 +
 xen/arch/x86/microcode_intel.c |  67 +---
 2 files changed, 69 insertions(+), 110 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 1d1bea4..f05db72 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -194,36 +194,6 @@ static bool match_cpu(const struct microcode_patch *patch)
 return patch && (microcode_fits(patch->mc_amd) == NEW_UCODE);
 }
 
-static struct microcode_patch *alloc_microcode_patch(
-const struct microcode_amd *mc_amd)
-{
-struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch);
-struct microcode_amd *cache = xmalloc(struct microcode_amd);
-void *mpb = xmalloc_bytes(mc_amd->mpb_size);
-struct equiv_cpu_entry *equiv_cpu_table =
-xmalloc_bytes(mc_amd->equiv_cpu_table_size);
-
-if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table )
-{
-xfree(microcode_patch);
-xfree(cache);
-xfree(mpb);
-xfree(equiv_cpu_table);
-return ERR_PTR(-ENOMEM);
-}
-
-memcpy(mpb, mc_amd->mpb, mc_amd->mpb_size);
-cache->mpb = mpb;
-cache->mpb_size = mc_amd->mpb_size;
-memcpy(equiv_cpu_table, mc_amd->equiv_cpu_table,
-   mc_amd->equiv_cpu_table_size);
-cache->equiv_cpu_table = equiv_cpu_table;
-cache->equiv_cpu_table_size = mc_amd->equiv_cpu_table_size;
-microcode_patch->mc_amd = cache;
-
-return microcode_patch;
-}
-
 static void free_patch(void *mc)
 {
 struct microcode_amd *mc_amd = mc;
@@ -236,6 +206,17 @@ static void free_patch(void *mc)
 }
 }
 
+static enum microcode_match_result compare_header(
+const struct microcode_header_amd *new_header,
+const struct microcode_header_amd *old_header)
+{
+if ( new_header->processor_rev_id == old_header->processor_rev_id )
+return (new_header->patch_id > old_header->patch_id) ? NEW_UCODE
+ : OLD_UCODE;
+
+return MIS_UCODE;
+}
+
 static enum microcode_match_result compare_patch(
 const struct microcode_patch *new, const struct microcode_patch *old)
 {
@@ -246,11 +227,7 @@ static enum microcode_match_result compare_patch(
ASSERT(microcode_fits(new->mc_amd) != MIS_UCODE);
ASSERT(microcode_fits(old->mc_amd) != MIS_UCODE);
 
-if ( new_header->processor_rev_id == old_header->processor_rev_id )
-return (new_header->patch_id > old_header->patch_id) ?
-NEW_UCODE : OLD_UCODE;
-
-return MIS_UCODE;
+return compare_header(new_header, old_header);
 }
 
 static int apply_microcode(const struct microcode_patch *patch)
@@ -328,18 +305,10 @@ static int get_ucode_from_buffer_amd(
 return -EINVAL;
 }
 
-if ( mc_amd->mpb_size < mpbuf->len )
-{
-if ( mc_amd->mpb )
-{
-xfree(mc_amd->mpb);
-mc_amd->mpb_size = 0;
-}
-mc_amd->mpb = xmalloc_bytes(mpbuf->len);
-if ( mc_amd->mpb == NULL )
-return -ENOMEM;
-mc_amd->mpb_size = mpbuf->len;
-}
+mc_amd->mpb = xmalloc_bytes(mpbuf->len);
+if ( !mc_amd->mpb )
+return -ENOMEM;
+mc_amd->mpb_size = mpbuf->len;
 memcpy(mc_amd->mpb, mpbuf->data, mpbuf->len);
 
 pr_debug("microcode: CPU%d size %zu, block size %u offset %zu equivID %#x 
rev %#x\n",
@@ -459,8 +428,9 @@ static struct microcode_patch *cpu_request_microcode(const 
void *buf,
  size_t bufsize)
 {
 struct microcode_amd *mc_amd;
+struct microcode_header_amd *saved = NULL;
 struct microcode_patch *patch = NULL;
-size_t offset = 0;
+size_t offset = 0, saved_size = 0;
 int error = 0;
 unsigned int current_cpu_id;
 unsigned int equiv_cpu_id;
@@ -550,29 +520,22 @@ static struct microcode_patch 
*cpu_request_microcode(const void *buf,
 while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,

[Xen-devel] [PATCH v10 04/16] microcode: clean up microcode_resume_cpu

2019-09-12 Thread Chao Gao
Previously, a per-cpu ucode cache was maintained: each CPU had its own
update cache and there might be multiple versions of microcode. Thus
microcode_resume_cpu tried its best to update microcode by loading
every cached update until one loaded successfully.

But now the cache struct is simplified a lot and only a single ucode is
cached. A single invocation of ->apply_microcode() loads the cache and
brings the microcode up to date.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
changes in v8:
 - new
 - separated from the following patch
---
 xen/arch/x86/microcode.c| 40 ++-
 xen/arch/x86/microcode_amd.c| 47 -
 xen/arch/x86/microcode_intel.c  |  6 --
 xen/include/asm-x86/microcode.h |  1 -
 4 files changed, 2 insertions(+), 92 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index e218a9d..922b94f 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -215,8 +215,6 @@ int microcode_resume_cpu(unsigned int cpu)
 {
 int err;
struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
-struct cpu_signature nsig;
-unsigned int cpu2;
 
 if ( !microcode_ops )
 return 0;
@@ -224,42 +222,8 @@ int microcode_resume_cpu(unsigned int cpu)
spin_lock(&microcode_mutex);

err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
-if ( err )
-{
-__microcode_fini_cpu(cpu);
-spin_unlock(&microcode_mutex);
-return err;
-}
-
-if ( uci->mc.mc_valid )
-{
-err = microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid);
-if ( err >= 0 )
-{
-if ( err )
-err = microcode_ops->apply_microcode(cpu);
-spin_unlock(&microcode_mutex);
-return err;
-}
-}
-
-nsig = uci->cpu_sig;
-__microcode_fini_cpu(cpu);
-uci->cpu_sig = nsig;
-
-err = -EIO;
-for_each_online_cpu ( cpu2 )
-{
-uci = &per_cpu(ucode_cpu_info, cpu2);
-if ( uci->mc.mc_valid &&
- microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid) > 0 )
-{
-err = microcode_ops->apply_microcode(cpu);
-break;
-}
-}
-
-__microcode_fini_cpu(cpu);
+if ( likely(!err) )
+err = microcode_ops->apply_microcode(cpu);
spin_unlock(&microcode_mutex);
 
 return err;
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 2dca1df..04b00aa 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -654,52 +654,6 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
 return error;
 }
 
-static int microcode_resume_match(unsigned int cpu, const void *mc)
-{
-struct ucode_cpu_info *uci = _cpu(ucode_cpu_info, cpu);
-struct microcode_amd *mc_amd = uci->mc.mc_amd;
-const struct microcode_amd *src = mc;
-
-if ( microcode_fits(src, cpu) != NEW_UCODE )
-return 0;
-
-if ( src != mc_amd )
-{
-if ( mc_amd )
-{
-xfree(mc_amd->equiv_cpu_table);
-xfree(mc_amd->mpb);
-xfree(mc_amd);
-}
-
-mc_amd = xmalloc(struct microcode_amd);
-uci->mc.mc_amd = mc_amd;
-if ( !mc_amd )
-return -ENOMEM;
-mc_amd->equiv_cpu_table = xmalloc_bytes(src->equiv_cpu_table_size);
-if ( !mc_amd->equiv_cpu_table )
-goto err1;
-mc_amd->mpb = xmalloc_bytes(src->mpb_size);
-if ( !mc_amd->mpb )
-goto err2;
-
-mc_amd->equiv_cpu_table_size = src->equiv_cpu_table_size;
-mc_amd->mpb_size = src->mpb_size;
-memcpy(mc_amd->mpb, src->mpb, src->mpb_size);
-memcpy(mc_amd->equiv_cpu_table, src->equiv_cpu_table,
-   src->equiv_cpu_table_size);
-}
-
-return 1;
-
-err2:
-xfree(mc_amd->equiv_cpu_table);
-err1:
-xfree(mc_amd);
-uci->mc.mc_amd = NULL;
-return -ENOMEM;
-}
-
 static int start_update(void)
 {
 #if CONFIG_HVM
@@ -719,7 +673,6 @@ static int start_update(void)
 }
 
 static const struct microcode_ops microcode_amd_ops = {
-.microcode_resume_match   = microcode_resume_match,
 .cpu_request_microcode= cpu_request_microcode,
 .collect_cpu_info = collect_cpu_info,
 .apply_microcode  = apply_microcode,
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index eefc2d2..97f759e 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -455,13 +455,7 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
 return error;
 }
 
-static int microcode_resume_match(unsigned int cpu, const void *mc)
-{
-return get_matching_microcode(mc, cpu);
-}
-
 static const struct microcode_ops microcode_intel_ops = {
-

[Xen-devel] [PATCH v10 01/16] microcode/intel: extend microcode_update_match()

2019-09-12 Thread Chao Gao
Extend microcode_update_match() to a more generic function, so that it
can be used alone to check an update against the CPU signature and the
current update revision.

Note that enum microcode_match_result will be used in common code
(aka microcode.c), so it has been placed in the common header. Also
constify the parameter of microcode_sanity_check() such that it
can be called by microcode_update_match().

Signed-off-by: Chao Gao 
---
Changes in v10:
 - Drop RBs
 - assert that microcode passed to microcode_update_match() would pass
 sanity check. Constify the parameter of microcode_sanity_check()

Changes in v9:
 - microcode_update_match() doesn't accept (sig, pf, rev) any longer.
 Hence, it won't be used to compare two arbitrary updates.
 - rewrite patch description

Changes in v8:
 - make sure enough room for an extended header and signature array

Changes in v6:
 - eliminate unnecessary type casting in microcode_update_match
 - check if a patch has an extend header

Changes in v5:
 - constify the extended_signature
 - use named enum type for the return value of microcode_update_match
---
 xen/arch/x86/microcode_intel.c  | 75 ++---
 xen/include/asm-x86/microcode.h |  6 
 2 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index 22fdeca..1a3ffa5 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -134,21 +134,11 @@ static int collect_cpu_info(unsigned int cpu_num, struct 
cpu_signature *csig)
 return 0;
 }
 
-static inline int microcode_update_match(
-unsigned int cpu_num, const struct microcode_header_intel *mc_header,
-int sig, int pf)
+static int microcode_sanity_check(const void *mc)
 {
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num);
-
-return (sigmatch(sig, uci->cpu_sig.sig, pf, uci->cpu_sig.pf) &&
-(mc_header->rev > uci->cpu_sig.rev));
-}
-
-static int microcode_sanity_check(void *mc)
-{
-struct microcode_header_intel *mc_header = mc;
-struct extended_sigtable *ext_header = NULL;
-struct extended_signature *ext_sig;
+const struct microcode_header_intel *mc_header = mc;
+const struct extended_sigtable *ext_header = NULL;
+const struct extended_signature *ext_sig;
 unsigned long total_size, data_size, ext_table_size;
 unsigned int ext_sigcount = 0, i;
 uint32_t sum, orig_sum;
@@ -234,6 +224,42 @@ static int microcode_sanity_check(void *mc)
 return 0;
 }
 
+/* Check an update against the CPU signature and current update revision */
+static enum microcode_match_result microcode_update_match(
+const struct microcode_header_intel *mc_header, unsigned int cpu)
+{
+const struct extended_sigtable *ext_header;
+const struct extended_signature *ext_sig;
+unsigned int i;
+struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+unsigned int sig = uci->cpu_sig.sig;
+unsigned int pf = uci->cpu_sig.pf;
+unsigned int rev = uci->cpu_sig.rev;
+unsigned long data_size = get_datasize(mc_header);
+const void *end = (const void *)mc_header + get_totalsize(mc_header);
+
+ASSERT(!microcode_sanity_check(mc_header));
+if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) )
+return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE;
+
+ext_header = (const void *)(mc_header + 1) + data_size;
+ext_sig = (const void *)(ext_header + 1);
+
+/*
+ * Make sure there is enough space to hold an extended header and enough
+ * array elements.
+ */
+if ( (end < (const void *)ext_sig) ||
+ (end < (const void *)(ext_sig + ext_header->count)) )
+return MIS_UCODE;
+
+for ( i = 0; i < ext_header->count; i++ )
+if ( sigmatch(sig, ext_sig[i].sig, pf, ext_sig[i].pf) )
+return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE;
+
+return MIS_UCODE;
+}
+
 /*
  * return 0 - no update found
  * return 1 - found update
@@ -243,31 +269,12 @@ static int get_matching_microcode(const void *mc, 
unsigned int cpu)
 {
struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
 const struct microcode_header_intel *mc_header = mc;
-const struct extended_sigtable *ext_header;
 unsigned long total_size = get_totalsize(mc_header);
-int ext_sigcount, i;
-struct extended_signature *ext_sig;
 void *new_mc;
 
-if ( microcode_update_match(cpu, mc_header,
-mc_header->sig, mc_header->pf) )
-goto find;
-
-if ( total_size <= (get_datasize(mc_header) + MC_HEADER_SIZE) )
+if ( microcode_update_match(mc, cpu) != NEW_UCODE )
 return 0;
 
-ext_header = mc + get_datasize(mc_header) + MC_HEADER_SIZE;
-ext_sigcount = ext_header->count;
-ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
-for ( i = 0; i < ext_sigcount; i++ )
-{
-if ( microcode_update_match(cpu, mc_header,
-   

[Xen-devel] [PATCH v10 12/16] x86/microcode: Synchronize late microcode loading

2019-09-12 Thread Chao Gao
This patch ports microcode improvement patches from the Linux kernel.

Before you read any further: the early loading method is still the
preferred one and you should always use it. The following patch improves
the late loading mechanism for long-running jobs and cloud use cases.

Gather all cores and serialize the microcode update on them by doing it
one by one, to make the late update process as reliable as possible and
to avoid potential issues caused by the update.
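
As a rough illustration, the coordination can be driven with two small
helpers like the ones below (a sketch matching the v10 changelog; the
real timeout and error handling are more involved):

static void set_state(unsigned int state)
{
    loading_state = state;
    smp_wmb();                  /* publish the state before others spin */
}

static bool wait_for_condition(bool (*func)(const void *data),
                               const void *data, unsigned int timeout_us)
{
    while ( !func(data) )
    {
        if ( !timeout_us-- )
            return false;       /* rendezvous timed out */
        udelay(1);
    }

    return true;
}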

Signed-off-by: Chao Gao 
Tested-by: Chao Gao 
[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
Cc: Kevin Tian 
Cc: Jun Nakajima 
Cc: Ashok Raj 
Cc: Borislav Petkov 
Cc: Thomas Gleixner 
Cc: Andrew Cooper 
Cc: Jan Beulich 
---
Changes in v10:
 - introduce wait_for_state() and set_state() helper functions
 - make wait_for_condition() return bool and take const void *
 - disable/enable watchdog in control thread
 - rename "master" and "slave" thread to "primary" and "secondary"

Changes in v9:
 - log __builtin_return_address(0) on timeout
 - divide CPUs into three logical sets and they will call different
 functions during ucode loading. The 'control thread' is chosen to
 coordinate ucode loading on all CPUs. Since only control thread would
 set 'loading_state', we can get rid of 'cmpxchg' stuff in v8.
 - s/rep_nop/cpu_relax
 - each thread updates its revision number itself
 - add XENLOG_ERR prefix for each line of multi-line log messages

Changes in v8:
 - to support blocking #NMI handling during loading ucode
   * introduce a flag, 'loading_state', to mark the start or end of
 ucode loading.
   * use a bitmap for cpu callin: since a cpu may stay in #NMI handling,
 there are two places for a cpu to call in; with a bitmap it won't be
 counted twice.
   * don't wait for all CPUs callout, just wait for CPUs that perform the
 update. We have to do this because some threads may be stuck in NMI
 handling (where cannot reach the rendezvous).
 - emit a warning if the system stays in stop_machine context for more
 than 1s
 - comment that rdtsc is fine while loading an update
 - use cmpxchg() to avoid panic being called on multiple CPUs
 - Propagate revision number to other threads
 - refine comments and prompt messages

Changes in v7:
 - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
 - reword the comment above microcode_update_cpu() to clearly state that
 one thread per core should do the update.
---
 xen/arch/x86/microcode.c | 296 ++-
 1 file changed, 269 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index c2ea20f..049eda6 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -30,18 +30,52 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+/*
+ * Before performing a late microcode update on any thread, we
+ * rendezvous all cpus in stop_machine context. The timeout for
+ * waiting for cpu rendezvous is 30ms. It is the timeout used by
+ * live patching
+ */
+#define MICROCODE_CALLIN_TIMEOUT_US 30000
+
+/*
+ * Timeout for each thread to complete update is set to 1s. It is a
+ * conservative choice considering all possible interference.
+ */
+#define MICROCODE_UPDATE_TIMEOUT_US 1000000
+
 static module_t __initdata ucode_mod;
 static signed int __initdata ucode_mod_idx;
 static bool_t __initdata ucode_mod_forced;
+static unsigned int nr_cores;
+
+/*
+ * These states help to coordinate CPUs during loading an update.
+ *
+ * The semantics of each state is as follow:
+ *  - LOADING_PREPARE: initial state of 'loading_state'.
+ *  - LOADING_CALLIN: CPUs are allowed to callin.
+ *  - LOADING_ENTER: all CPUs have called in. Initiate ucode loading.
+ *  - LOADING_EXIT: ucode loading is done or aborted.
+ */
+static enum {
+LOADING_PREPARE,
+LOADING_CALLIN,
+LOADING_ENTER,
+LOADING_EXIT,
+} loading_state;
 
 /*
  * If we scan the initramfs.cpio for the early microcode code
@@ -190,6 +224,16 @@ static DEFINE_SPINLOCK(microcode_mutex);
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
 /*
+ * Count the CPUs that have entered, exited the rendezvous and succeeded in
+ * microcode update during late microcode update respectively.
+ *
+ * Note that a bitmap is used for callin to allow cpu to set a bit multiple
+ * times. It is required to do busy-loop in #NMI handling.
+ */
+static cpumask_t cpu_callin_map;
+static atomic_t cpu_out, cpu_updated;
+
+/*
  * Return a patch that covers current CPU. If there are multiple patches,
  * return the one with the highest revision number. Return error If no
  * patch is found and an error occurs during the parsing process. Otherwise
@@ -231,6 +275,34 @@ static bool microcode_update_cache(struct microcode_patch 
*patch)
 return true;
 }
 

[Xen-devel] [PATCH v10 06/16] microcode: remove pointless 'cpu' parameter

2019-09-12 Thread Chao Gao
Some callbacks in microcode_ops and related functions take a cpu id
parameter. But at the current call sites, the cpu id parameter is
always equal to the current cpu id. Some of them even use an assertion
to guarantee this. Remove this redundant 'cpu' parameter.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v9:
 - use a convenience variable 'cpu' in collect_cpu_info() on AMD side
 - rebase and fix conflicts

Changes in v8:
 - Use current_cpu_data in collect_cpu_info()
 - keep the cpu parameter of check_final_patch_levels()
 - use smp_processor_id() in get_matching_microcode() rather than
 define a local variable and label it "__maybe_unused"
---
 xen/arch/x86/acpi/power.c   |  2 +-
 xen/arch/x86/microcode.c| 20 
 xen/arch/x86/microcode_amd.c| 34 +-
 xen/arch/x86/microcode_intel.c  | 41 +++--
 xen/arch/x86/smpboot.c  |  2 +-
 xen/include/asm-x86/microcode.h |  7 +++
 xen/include/asm-x86/processor.h |  2 +-
 7 files changed, 42 insertions(+), 66 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index e3954ee..269b140 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -278,7 +278,7 @@ static int enter_state(u32 state)
 
 console_end_sync();
 
-microcode_resume_cpu(0);
+microcode_resume_cpu();
 
 if ( !recheck_cpu_features(0) )
 panic("Missing previously available feature(s)\n");
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index d17dbec..89a8d2b 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -196,19 +196,19 @@ struct microcode_info {
 char buffer[1];
 };
 
-int microcode_resume_cpu(unsigned int cpu)
+int microcode_resume_cpu(void)
 {
 int err;
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
+struct cpu_signature *sig = &this_cpu(cpu_sig);
 
 if ( !microcode_ops )
 return 0;
 
spin_lock(&microcode_mutex);
 
-err = microcode_ops->collect_cpu_info(cpu, sig);
+err = microcode_ops->collect_cpu_info(sig);
 if ( likely(!err) )
-err = microcode_ops->apply_microcode(cpu);
+err = microcode_ops->apply_microcode();
spin_unlock(&microcode_mutex);
 
 return err;
@@ -257,9 +257,9 @@ static int microcode_update_cpu(const void *buf, size_t 
size)
 
spin_lock(&microcode_mutex);
 
-err = microcode_ops->collect_cpu_info(cpu, sig);
+err = microcode_ops->collect_cpu_info(sig);
 if ( likely(!err) )
-err = microcode_ops->cpu_request_microcode(cpu, buf, size);
+err = microcode_ops->cpu_request_microcode(buf, size);
spin_unlock(&microcode_mutex);
 
 return err;
@@ -348,8 +348,6 @@ __initcall(microcode_init);
 
 int __init early_microcode_update_cpu(bool start_update)
 {
-unsigned int cpu = smp_processor_id();
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 int rc = 0;
 void *data = NULL;
 size_t len;
@@ -368,7 +366,7 @@ int __init early_microcode_update_cpu(bool start_update)
data = bootstrap_map(&ucode_mod);
 }
 
-microcode_ops->collect_cpu_info(cpu, sig);
+microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
 if ( data )
 {
@@ -386,8 +384,6 @@ int __init early_microcode_update_cpu(bool start_update)
 
 int __init early_microcode_init(void)
 {
-unsigned int cpu = smp_processor_id();
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 int rc;
 
 rc = microcode_init_intel();
@@ -400,7 +396,7 @@ int __init early_microcode_init(void)
 
 if ( microcode_ops )
 {
-microcode_ops->collect_cpu_info(cpu, sig);
+microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
 if ( ucode_mod.mod_end || ucode_blob.size )
 rc = early_microcode_update_cpu(true);
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 69c9cfe..1d27c71 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -78,8 +78,9 @@ struct mpbhdr {
 static DEFINE_SPINLOCK(microcode_update_lock);
 
 /* See comment in start_update() for cases when this routine fails */
-static int collect_cpu_info(unsigned int cpu, struct cpu_signature *csig)
+static int collect_cpu_info(struct cpu_signature *csig)
 {
+unsigned int cpu = smp_processor_id();
struct cpuinfo_x86 *c = &cpu_data[cpu];
 
 memset(csig, 0, sizeof(*csig));
@@ -153,17 +154,15 @@ static bool_t find_equiv_cpu_id(const struct 
equiv_cpu_entry *equiv_cpu_table,
 }
 
 static enum microcode_match_result microcode_fits(
-const struct microcode_amd *mc_amd, unsigned int cpu)
+const struct microcode_amd *mc_amd)
 {
+unsigned int cpu = smp_processor_id();
const struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 const struct microcode_header_amd *mc_header = mc_amd->mpb;
 const struct equiv_cpu_entry *equiv_cpu_table = mc_amd->equiv_cpu_table;
 unsigned int current_cpu_id;
 unsigned 

Re: [Xen-devel] [ANNOUNCE] Xen 4.13 Development Update

2019-09-11 Thread Chao Gao
On Fri, Sep 06, 2019 at 09:40:58AM +0200, Juergen Gross wrote:
>This email only tracks big items for xen.git tree. Please reply for items you
>would like to see in 4.13 so that people have an idea what is going on and
>prioritise accordingly.
>
>=== x86 === 
>
>*  HVM guest CPU topology support (RFC)
>  -  Chao Gao

There is no plan to continue this one for now. Please drop it.

>
>*  Improve late microcode loading (v9)
>  -  Chao Gao
>

Working on the v10. I would like to get it merged in 4.13.

Thanks
Chao


Re: [Xen-devel] [PATCH v9 15/15] microcode: block #NMI handling when loading an ucode

2019-09-08 Thread Chao Gao
On Fri, Aug 30, 2019 at 02:35:06PM +0800, Chao Gao wrote:
>On Thu, Aug 29, 2019 at 02:11:10PM +0200, Jan Beulich wrote:
>>On 27.08.2019 06:52, Chao Gao wrote:
>>> On Mon, Aug 26, 2019 at 04:07:59PM +0800, Chao Gao wrote:
>>>> On Fri, Aug 23, 2019 at 09:46:37AM +0100, Sergey Dyasli wrote:
>>>>> On 19/08/2019 02:25, Chao Gao wrote:
>>>>>> register an nmi callback. And this callback does busy-loop on threads
>>>>>> which are waiting for loading completion. Control threads send NMI to
>>>>>> slave threads to prevent NMI acceptance during ucode loading.
>>>>>>
>>>>>> Signed-off-by: Chao Gao 
>>>>>> ---
>>>>>> Changes in v9:
>>>>>>  - control threads send NMI to all other threads. Slave threads will
>>>>>>  stay in the NMI handling to prevent NMI acceptance during ucode
>>>>>>  loading. Note that self-nmi is invalid according to SDM.
>>>>>
>>>>> To me this looks like a half-measure: why keep only slave threads in
>>>>> the NMI handler, when master threads can update the microcode from
>>>>> inside the NMI handler as well?
>>>>
>>>> No special reason. Because the issue we want to address is that slave
>>>> threads might go to handle NMI and access MSRs when master thread is
>>>> loading ucode. So we only keep slave threads in the NMI handler.
>>>>
>>>>>
>>>>> You mention that self-nmi is invalid, but Xen has self_nmi() which is
>>>>> used for apply_alternatives() during boot, so can be trusted to work.
>>>>
>>>> Sorry, I meant using self shorthand to send self-nmi. I tried to use
>>>> self shorthand but got APIC error. And I agree that it is better to
>>>> make slave thread call self_nmi() itself.
>>>>
>>>>>
>>>>> I experimented a bit with the following approach: after loading_state
>>>>> becomes LOADING_CALLIN, each cpu issues a self_nmi() and rendezvous
>>>>> via cpu_callin_map into LOADING_ENTER to do a ucode update directly in
>>>>> the NMI handler. And it seems to work.
>>>>>
>>>>> Separate question is about the safety of this approach: can we be sure
>>>>> that a ucode update would not reset the status of the NMI latch? I.e.
>>>>> can it cause another NMI to be delivered while Xen already handles one?
>>>>
>>>> Ashok, what's your opinion on Sergey's approach and his concern?
>>> 
>>> I talked with Ashok. We think your approach is better. I will follow
>>> your approach in v10. It would be much helpful if you post your patch
>>> so that I can just rebase it onto other patches.
>>
>>Doing the actual ucode update inside an NMI handler seems rather risky
>>to me. Even if Ashok confirmed it would not be an issue on past and
>>current Intel CPUs - what about future ones, or ones from other vendors?
>

The Intel SDM doesn't say that loading ucode isn't allowed inside an NMI
handler, so it is implicitly allowed. If future CPUs cannot load ucode
in an NMI handler, the SDM should document it, and at that point we can
move ucode loading out of the NMI handler for new CPUs. As to AMD, if
someone objects to this approach, let's use it for Intel only.

Thanks
Chao


Re: [Xen-devel] [RFC Patch] xen/pt: Emulate FLR capability

2019-09-06 Thread Chao Gao
On Thu, Aug 29, 2019 at 12:21:11PM +0200, Roger Pau Monné wrote:
>On Thu, Aug 29, 2019 at 05:02:27PM +0800, Chao Gao wrote:
>> Currently, for a HVM on Xen, no reset method is virtualized. So in a VM's
>> perspective, assigned devices cannot be reset. But some devices rely on PCI
>> reset to recover from hardware hangs. When being assigned to a VM, those
>> devices cannot be reset and won't work any longer if a hardware hang occurs.
>> We have to reboot VM to trigger PCI reset on host to recover the device.
>>
>> This patch exposes FLR capability to VMs if the assigned device can be reset 
>> on
>> host. When VM initiates an FLR to a device, qemu cleans up the device state,
>> (including disabling of intx and/or MSI and unmapping BARs from guest, 
>> deleting
>> emulated registers), then initiate PCI reset through 'reset' knob under the
>> device's sysfs, finally initialize the device again.
>
>I think you likely need to deassign the device from the VM, perform
>the reset, and then assign the device again, so that there's no Xen
>internal state carried over prior to the reset?

Yes. It is the safest way. But here I want to present the feature as FLR
(such that the device driver in the guest can issue a PCI reset whenever
needed and no change is needed to the device driver). The current device
deassignment notifies the guest that the device is going to be removed;
it is not a standard PCI reset. Is it possible to make the guest unaware
of the device deassignment, to emulate a standard PCI reset? In my mind,
we could expose do_pci_remove/add to qemu or rewrite them in qemu (but
without removing the device from the guest's PCI hierarchy). Do you
think this is the right direction?
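
For illustration, the host-side reset via the sysfs 'reset' knob could
look like the sketch below (pci_sysfs_reset() is a hypothetical helper,
not existing code):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write '1' to /sys/bus/pci/devices/<bdf>/reset to trigger a PCI reset
   on the host; the file only exists if the kernel knows a reset method
   for the device. */
static int pci_sysfs_reset(const char *bdf)
{
    char path[64];
    int fd, rc;

    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/reset", bdf);
    fd = open(path, O_WRONLY);
    if ( fd < 0 )
        return -1;
    rc = (write(fd, "1", 1) == 1) ? 0 : -1;
    close(fd);

    return rc;
}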

Thanks
Chao


Re: [Xen-devel] [Xen-unstable] boot crash while loading AMD microcode due to commit "microcode/amd: fix memory leak"

2019-08-30 Thread Chao Gao
On Fri, Aug 30, 2019 at 09:49:04AM +0200, Jan Beulich wrote:
>On 30.08.2019 04:09, Chao Gao wrote:
>> On Fri, Aug 30, 2019 at 01:04:54AM +0200, Sander Eikelenboom wrote:
>>> L.S.,
>>>
>>> While testing xen-unstable, my AMD system crashes during early boot while 
>>> loading microcode with an "Early fatal page fault".
>>> Reverting commit de45e3ff37bb1602796054afabfa626ea5661c45 "microcode/amd: 
>>> fix memory leak" fixes the boot issue.
>> 
>> Sorry for this inconvenience.
>> 
>> Could you apply the patch attached and try it again?
>
>I'm inclined to take this fix even without waiting for Sander's
>feedback (and simply implying your S-o-b).

Ack.

Thanks
Chao


Re: [Xen-devel] [PATCH v9 15/15] microcode: block #NMI handling when loading an ucode

2019-08-30 Thread Chao Gao
On Thu, Aug 29, 2019 at 02:11:10PM +0200, Jan Beulich wrote:
>On 27.08.2019 06:52, Chao Gao wrote:
>> On Mon, Aug 26, 2019 at 04:07:59PM +0800, Chao Gao wrote:
>>> On Fri, Aug 23, 2019 at 09:46:37AM +0100, Sergey Dyasli wrote:
>>>> On 19/08/2019 02:25, Chao Gao wrote:
>>>>> register an nmi callback. And this callback does busy-loop on threads
>>>>> which are waiting for loading completion. Control threads send NMI to
>>>>> slave threads to prevent NMI acceptance during ucode loading.
>>>>>
>>>>> Signed-off-by: Chao Gao 
>>>>> ---
>>>>> Changes in v9:
>>>>>  - control threads send NMI to all other threads. Slave threads will
>>>>>  stay in the NMI handling to prevent NMI acceptance during ucode
>>>>>  loading. Note that self-nmi is invalid according to SDM.
>>>>
>>>> To me this looks like a half-measure: why keep only slave threads in
>>>> the NMI handler, when master threads can update the microcode from
>>>> inside the NMI handler as well?
>>>
>>> No special reason. Because the issue we want to address is that slave
>>> threads might go to handle NMI and access MSRs when master thread is
>>> loading ucode. So we only keep slave threads in the NMI handler.
>>>
>>>>
>>>> You mention that self-nmi is invalid, but Xen has self_nmi() which is
>>>> used for apply_alternatives() during boot, so can be trusted to work.
>>>
>>> Sorry, I meant using self shorthand to send self-nmi. I tried to use
>>> self shorthand but got APIC error. And I agree that it is better to
>>> make slave thread call self_nmi() itself.
>>>
>>>>
>>>> I experimented a bit with the following approach: after loading_state
>>>> becomes LOADING_CALLIN, each cpu issues a self_nmi() and rendezvous
>>>> via cpu_callin_map into LOADING_ENTER to do a ucode update directly in
>>>> the NMI handler. And it seems to work.
>>>>
>>>> Separate question is about the safety of this approach: can we be sure
>>>> that a ucode update would not reset the status of the NMI latch? I.e.
>>>> can it cause another NMI to be delivered while Xen already handles one?
>>>
>>> Ashok, what's your opinion on Sergey's approach and his concern?
>> 
>> I talked with Ashok. We think your approach is better. I will follow
>> your approach in v10. It would be much helpful if you post your patch
>> so that I can just rebase it onto other patches.
>
>Doing the actual ucode update inside an NMI handler seems rather risky
>to me. Even if Ashok confirmed it would not be an issue on past and
>current Intel CPUs - what about future ones, or ones from other vendors?

Will confirm these with Ashok.

Thanks
Chao


Re: [Xen-devel] [PATCH v9 15/15] microcode: block #NMI handling when loading an ucode

2019-08-30 Thread Chao Gao
On Thu, Aug 29, 2019 at 02:22:47PM +0200, Jan Beulich wrote:
>On 19.08.2019 03:25, Chao Gao wrote:
>> @@ -481,12 +478,28 @@ static int do_microcode_update(void *patch)
>>  return ret;
>>  }
>>  
>> +static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
>> +{
>> +/* The first thread of a core is to load an update. Don't block it. */
>> +if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) ||
>> + loading_state != LOADING_CALLIN )
>> +return 0;
>> +
>> +cpumask_set_cpu(cpu, &cpu_callin_map);
>> +
>> +while ( loading_state != LOADING_EXIT )
>> +cpu_relax();
>> +
>> +return 0;
>> +}
>
>By returning 0 you tell do_nmi() to continue processing the NMI.
>Since you can't tell whether a non-IPI NMI has surfaced at about
>the same time this is generally the right thing imo, but how do
>you prevent unknown_nmi_error() from getting entered when do_nmi()
>ends up setting handle_unknown to true? (The question is mostly
>rhetorical, but there's a disconnect between do_nmi() checking
>"cpu == 0" and the control thread running on
>cpumask_first(_online_map), i.e. you introduce a well hidden
>dependency on CPU 0 never going offline. IOW my request is to at
>least make this less well hidden, such that it can be noticed if
>and when someone endeavors to remove said limitation.)

It seems the issue is that we cannot send an IPI NMI to the BSP,
otherwise unknown_nmi_error() would be triggered. And loading ucode
after rendezvousing all CPUs in the NMI handler expects all CPUs to
receive the IPI NMI. So this approach always has this issue.

Considering self_nmi is called in another place as well, could we
provide a way to temporarily suppress or (force) ignore unknown NMI
errors?

Thanks
Chao


Re: [Xen-devel] [PATCH v9 13/15] x86/microcode: Synchronize late microcode loading

2019-08-29 Thread Chao Gao
On Thu, Aug 29, 2019 at 02:06:39PM +0200, Jan Beulich wrote:
>On 19.08.2019 03:25, Chao Gao wrote:
>> +
>> +static int master_thread_fn(const struct microcode_patch *patch)
>> +{
>> +unsigned int cpu = smp_processor_id();
>> +int ret = 0;
>> +
>> +while ( loading_state != LOADING_CALLIN )
>> +cpu_relax();
>> +
>> +cpumask_set_cpu(cpu, &cpu_callin_map);
>> +
>> +while ( loading_state != LOADING_ENTER )
>> +cpu_relax();
>> +
>> +/*
>> + * If an error happened, control thread would set 'loading_state'
>> + * to LOADING_EXIT. Don't perform ucode loading for this case
>> + */
>> +if ( loading_state == LOADING_EXIT )
>> +return ret;
>
>Even if the producer transitions this through ENTER to EXIT, the
>observer here may never get to see the ENTER state, and hence
>never exit the loop above. You want either < ENTER or == CALLIN.

Yes. I find stopmachine_action() a good example of how to implement
such a state machine. I will follow it.

>
>> +ret = microcode_ops->apply_microcode(patch);
>> +if ( !ret )
>> +atomic_inc(&cpu_updated);
>> +atomic_inc(&cpu_out);
>> +
>> +while ( loading_state != LOADING_EXIT )
>> +cpu_relax();
>> +
>> +return ret;
>> +}
>
>As a cosmetic remark, I don't think "master" and "slave" are
>suitable terms here. "primary" and "secondary" would imo come
>closer to what the threads' relationship is.

Will do.

>> +
>> +/*
>> + * We intend to disable interrupt for long time, which may lead to
>> + * watchdog timeout.
>> + */
>> +watchdog_disable();
>> +/*
>> + * Late loading dance. Why the heavy-handed stop_machine effort?
>> + *
>> + * - HT siblings must be idle and not execute other code while the other
>> + *   sibling is loading microcode in order to avoid any negative
>> + *   interactions cause by the loading.
>> + *
>> + * - In addition, microcode update on the cores must be serialized until
>> + *   this requirement can be relaxed in the future. Right now, this is
>> + *   conservative and good.
>> + */
>> +ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
>> +watchdog_enable();
>
>Considering that stop_machine_run() doesn't itself disable the watchdog,
>did you consider having the control thread disable/enable the watchdog,
>thus shortening the period where it's not active?

Good idea. It helps keep the code here clean. I think
microcode_nmi_callback could perhaps be registered by the control thread
as well.

Thanks
Chao


Re: [Xen-devel] [PATCH v9 10/15] microcode: split out apply_microcode() from cpu_request_microcode()

2019-08-29 Thread Chao Gao
On Thu, Aug 29, 2019 at 12:06:28PM +0200, Jan Beulich wrote:
>On 22.08.2019 15:59, Roger Pau Monné  wrote:
>> Seeing how this works I'm not sure what's the best option here. As
>> updating will be attempted on other CPUs, I'm not sure if it's OK to
>> return an error if the update succeed on some CPUs but failed on
>> others.
>
>The overall result of a partially successful update should be an
>error - mismatched ucode may, after all, be more of a problem
>than outdated ucode.

I will only treat the -EIO case as fatal. If a system has differing ucode
revisions across cores, a partial update is expected when we try to
correct the system with a ucode equal to the newest revision already
present on some cores.
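A sketch of what that aggregation could look like in the per-CPU update
path (illustrative only; the variable names are made up):

    int rc = microcode_ops->apply_microcode(patch);

    /*
     * -EIO means the hardware rejected the write and the system may end
     * up with mismatched ucode; latch it as the overall result. Other
     * non-zero results (e.g. no newer ucode for this core) are tolerated.
     */
    if ( rc == -EIO )
        ret = -EIO;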

Thanks
Chao


Re: [Xen-devel] [Xen-unstable] boot crash while loading AMD microcode due to commit "microcode/amd: fix memory leak"

2019-08-29 Thread Chao Gao
On Fri, Aug 30, 2019 at 01:04:54AM +0200, Sander Eikelenboom wrote:
>L.S.,
>
>While testing xen-unstable, my AMD system crashes during early boot while 
>loading microcode with an "Early fatal page fault".
>Reverting commit de45e3ff37bb1602796054afabfa626ea5661c45 "microcode/amd: fix 
>memory leak" fixes the boot issue.

Sorry for the inconvenience.

Could you apply the attached patch and try again?

>
>At present I don't have my serial console stuff at hand, but if needed I can 
>send the stacktrace tomorrow.
>

Yes. That would be helpful.

Thanks
Chao
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 3069784..9b74330 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -552,9 +552,12 @@ static int cpu_request_microcode(unsigned int cpu, const void *buf,
 mc_old = mc_amd;
 }
 
-xfree(mc_old->mpb);
-xfree(mc_old->equiv_cpu_table);
-xfree(mc_old);
+if ( mc_old )
+{
+xfree(mc_old->mpb);
+xfree(mc_old->equiv_cpu_table);
+xfree(mc_old);
+}
 
   out:
 #if CONFIG_HVM

Re: [Xen-devel] [RFC Patch] xen/pt: Emulate FLR capability

2019-08-29 Thread Chao Gao
On Thu, Aug 29, 2019 at 12:03:44PM +0200, Jan Beulich wrote:
>On 29.08.2019 11:02, Chao Gao wrote:
>> Currently, for a HVM on Xen, no reset method is virtualized. So in a VM's
>> perspective, assigned devices cannot be reset. But some devices rely on PCI
>> reset to recover from hardware hangs. When being assigned to a VM, those
>> devices cannot be reset and won't work any longer if a hardware hang occurs.
>> We have to reboot VM to trigger PCI reset on host to recover the device.
>
>Did you consider a hot-unplug, reset (by host), hot-plug cycle instead?

Yes, I considered that approach. But it requires the host to initiate the
action, whereas whether a device needs a reset is determined by the device
driver in the VM. So in practice the VM still needs a way to notify the
host to do the unplug/reset/plug. As the standard FLR capability meets
this requirement, I didn't try to invent a new mechanism.

>
>> +static int xen_pt_devctl_reg_write(XenPCIPassthroughState *s,
>> +   XenPTReg *cfg_entry, uint16_t *val,
>> +   uint16_t dev_value, uint16_t valid_mask)
>> +{
>> +if (s->real_device.is_resetable && (*val & PCI_EXP_DEVCTL_BCR_FLR)) {
>> +xen_pt_reset(s);
>> +}
>> +return xen_pt_word_reg_write(s, cfg_entry, val, dev_value, valid_mask);
>
>I think you also need to clear the bit before handing on the request,
>such that reads will always observe it clear.

Will do.

Thanks
Chao


[Xen-devel] [RFC Patch] xen/pt: Emulate FLR capability

2019-08-29 Thread Chao Gao
Currently, for an HVM guest on Xen, no reset method is virtualized. So from
a VM's perspective, assigned devices cannot be reset. But some devices rely
on PCI reset to recover from hardware hangs. When assigned to a VM, those
devices cannot be reset and won't work any longer once a hardware hang
occurs. We have to reboot the VM to trigger a PCI reset on the host in
order to recover the device.

This patch exposes the FLR capability to VMs if the assigned device can be
reset on the host. When a VM initiates an FLR to a device, qemu cleans up
the device state (including disabling INTx and/or MSI, unmapping BARs from
the guest, and deleting emulated registers), then initiates a PCI reset
through the 'reset' knob under the device's sysfs node, and finally
initializes the device again.

Signed-off-by: Chao Gao 
---
Do we need to introduce an attribute, like "permissive", to explicitly
enable FLR capability emulation? During a PCI reset, interrupts and BARs
are unmapped from the guest. It seems the guest cannot interact with the
device directly except by accessing the device's configuration space,
which is emulated by qemu. If a proper method can be used to prevent qemu
from accessing the physical device, the FLR emulation introduces no new
security hole.

A VM's FLR may be backed by any reset function the host has for the
physical device, for example FLR, D3 soft reset, or secondary bus reset.
I am not sure it is fine to mix them. Given that the Linux kernel just
uses a unified API to reset a device and the caller cannot choose a
specific method, it is probably OK.
---
 hw/xen/xen-host-pci-device.c | 30 ++
 hw/xen/xen-host-pci-device.h |  3 +++
 hw/xen/xen_pt.c  |  9 +
 hw/xen/xen_pt.h  |  1 +
 hw/xen/xen_pt_config_init.c  | 30 +++---
 5 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 1b44dcafaf..d549656f42 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -198,6 +198,35 @@ static bool xen_host_pci_dev_is_virtfn(XenHostPCIDevice *d)
return !stat(path, &buf);
 }
 
+static bool xen_host_pci_resetable(XenHostPCIDevice *d)
+{
+char path[PATH_MAX];
+
+xen_host_pci_sysfs_path(d, "reset", path, sizeof(path));
+
+return !access(path, W_OK);
+}
+
+void xen_host_pci_reset(XenHostPCIDevice *d)
+{
+char path[PATH_MAX];
+int fd;
+
+xen_host_pci_sysfs_path(d, "reset", path, sizeof(path));
+
+fd = open(path, O_WRONLY);
+if (fd == -1) {
+XEN_HOST_PCI_LOG("Xen host pci reset: open error\n");
+return;
+}
+
+if (write(fd, "1", 1) != 1) {
+XEN_HOST_PCI_LOG("Xen host pci reset: write error\n");
+}
+
+return;
+}
+
 static void xen_host_pci_config_open(XenHostPCIDevice *d, Error **errp)
 {
 char path[PATH_MAX];
@@ -377,6 +406,7 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t 
domain,
 d->class_code = v;
 
 d->is_virtfn = xen_host_pci_dev_is_virtfn(d);
+d->is_resetable = xen_host_pci_resetable(d);
 
 return;
 
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 4d8d34ecb0..cacf9b3df8 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -32,6 +32,7 @@ typedef struct XenHostPCIDevice {
 XenHostPCIIORegion rom;
 
 bool is_virtfn;
+bool is_resetable;
 
 int config_fd;
 } XenHostPCIDevice;
@@ -55,4 +56,6 @@ int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, 
uint8_t *buf,
 
 int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *s, uint32_t cap);
 
+void xen_host_pci_reset(XenHostPCIDevice *d);
+
 #endif /* XEN_HOST_PCI_DEVICE_H */
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 8fbaf2eae9..d750367c0a 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -938,6 +938,15 @@ static void xen_pt_unregister_device(PCIDevice *d)
 xen_pt_destroy(d);
 }
 
+void xen_pt_reset(XenPCIPassthroughState *s)
+{
+PCIDevice *d = PCI_DEVICE(s);
+
+xen_pt_unregister_device(d);
xen_host_pci_reset(&s->real_device);
+xen_pt_realize(d, NULL);
+}
+
 static Property xen_pci_passthrough_properties[] = {
 DEFINE_PROP_PCI_HOST_DEVADDR("hostaddr", XenPCIPassthroughState, hostaddr),
 DEFINE_PROP_BOOL("permissive", XenPCIPassthroughState, permissive, false),
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index 9167bbaf6d..ed05bc0d39 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -332,4 +332,5 @@ int xen_pt_register_vga_regions(XenHostPCIDevice *dev);
 int xen_pt_unregister_vga_regions(XenHostPCIDevice *dev);
 void xen_pt_setup_vga(XenPCIPassthroughState *s, XenHostPCIDevice *dev,
  Error **errp);
+void xen_pt_reset(XenPCIPassthroughState *s);
 #endif /* XEN_PT_H */
diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 31ec5add1d..435abd7286 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -852,6 +8

Re: [Xen-devel] [PATCH v9 11/15] microcode: unify loading update during CPU resuming and AP wakeup

2019-08-29 Thread Chao Gao
On Fri, Aug 23, 2019 at 11:09:07AM +0200, Roger Pau Monné wrote:
>On Fri, Aug 23, 2019 at 12:44:34AM +0800, Chao Gao wrote:
>> On Thu, Aug 22, 2019 at 04:10:46PM +0200, Roger Pau Monné wrote:
>> >On Mon, Aug 19, 2019 at 09:25:24AM +0800, Chao Gao wrote:
>> >> Both are loading the cached patch. Since APs call the unified function,
>> >> microcode_update_one(), during wakeup, the 'start_update' parameter
>> >> which originally used to distinguish BSP and APs is redundant. So remove
>> >> this parameter.
>> >> 
>> >> Signed-off-by: Chao Gao 
>> >> ---
>> >> Note that here is a functional change: resuming a CPU would call
>> >> ->end_update() now while previously it wasn't. Not quite sure
>> >> whether it is correct.
>> >
>> >I guess that's required if it called start_update prior to calling
>> >end_update?
>> >
>> >> 
>> >> Changes in v9:
>> >>  - return -EOPNOTSUPP rather than 0 if microcode_ops is NULL in
>> >>microcode_update_one()
>> >>  - rebase and fix conflicts.
>> >> 
>> >> Changes in v8:
>> >>  - split out from the previous patch
>> >> ---
>> >>  xen/arch/x86/acpi/power.c   |  2 +-
>> >>  xen/arch/x86/microcode.c| 90 
>> >> ++---
>> >>  xen/arch/x86/smpboot.c  |  5 +--
>> >>  xen/include/asm-x86/processor.h |  4 +-
>> >>  4 files changed, 44 insertions(+), 57 deletions(-)
>> >> 
>> >> diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
>> >> index 4f21903..24798d5 100644
>> >> --- a/xen/arch/x86/acpi/power.c
>> >> +++ b/xen/arch/x86/acpi/power.c
>> >> @@ -253,7 +253,7 @@ static int enter_state(u32 state)
>> >>  
>> >>  console_end_sync();
>> >>  
>> >> -microcode_resume_cpu();
>> >> +microcode_update_one();
>> >>  
>> >>  if ( !recheck_cpu_features(0) )
>> >>  panic("Missing previously available feature(s)\n");
>> >> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>> >> index a2febc7..bdd9c9f 100644
>> >> --- a/xen/arch/x86/microcode.c
>> >> +++ b/xen/arch/x86/microcode.c
>> >> @@ -203,24 +203,6 @@ static struct microcode_patch *parse_blob(const char 
>> >> *buf, uint32_t len)
>> >>  return NULL;
>> >>  }
>> >>  
>> >> -int microcode_resume_cpu(void)
>> >> -{
>> >> -int err;
>> >> -struct cpu_signature *sig = &this_cpu(cpu_sig);
>> >> -
>> >> -if ( !microcode_ops )
>> >> -return 0;
>> >> -
>> >> -spin_lock(&microcode_mutex);
>> >> -
>> >> -err = microcode_ops->collect_cpu_info(sig);
>> >> -if ( likely(!err) )
>> >> -err = microcode_ops->apply_microcode(microcode_cache);
>> >> -spin_unlock(&microcode_mutex);
>> >> -
>> >> -return err;
>> >> -}
>> >> -
>> >>  void microcode_free_patch(struct microcode_patch *microcode_patch)
>> >>  {
>> >>  microcode_ops->free_patch(microcode_patch->mc);
>> >> @@ -384,11 +366,29 @@ static int __init microcode_init(void)
>> >>  }
>> >>  __initcall(microcode_init);
>> >>  
>> >> -int __init early_microcode_update_cpu(bool start_update)
>> >> +/* Load a cached update to current cpu */
>> >> +int microcode_update_one(void)
>> >> +{
>> >> +int rc;
>> >> +
>> >> +if ( !microcode_ops )
>> >> +return -EOPNOTSUPP;
>> >> +
>> >> +rc = microcode_update_cpu(NULL);
>> >> +
>> >> +if ( microcode_ops->end_update )
>> >> +microcode_ops->end_update();
>> >
>> >Don't you need to call start_update before calling
>> >microcode_update_cpu?
>> 
>> No. On AMD side, osvw_status records the hardware erratum in the system.
>> As we don't assume all CPUs have the same erratum, each cpu calls
>> end_update to update osvw_status after ucode loading.
>> start_update just resets osvw_status to 0. And it is called once prior
>> to ucode loading on any CPU so that osvw_status can be recomputed.
>
>Oh, I think I understand it. start_update must only be 

Re: [Xen-devel] [PATCH v9 01/15] microcode/intel: extend microcode_update_match()

2019-08-29 Thread Chao Gao
On Wed, Aug 28, 2019 at 05:12:34PM +0200, Jan Beulich wrote:
>On 19.08.2019 03:25, Chao Gao wrote:
>> to a more generic function. So that it can be used alone to check
>> an update against the CPU signature and current update revision.
>> 
>> Note that enum microcode_match_result will be used in common code
>> (aka microcode.c), it has been placed in the common header.
>> 
>> Signed-off-by: Chao Gao 
>> Reviewed-by: Roger Pau Monné 
>> Reviewed-by: Jan Beulich 
>
>I don't think these can be legitimately retained with ...
>
>> Changes in v9:
>>  - microcode_update_match() doesn't accept (sig, pf, rev) any longer.
>>  Hence, it won't be used to compare two arbitrary updates.
>
>... this kind of a change.

Will drop RBs.

>
>> --- a/xen/arch/x86/microcode_intel.c
>> +++ b/xen/arch/x86/microcode_intel.c
>> @@ -134,14 +134,39 @@ static int collect_cpu_info(unsigned int cpu_num, 
>> struct cpu_signature *csig)
>>  return 0;
>>  }
>>  
>> -static inline int microcode_update_match(
>> -unsigned int cpu_num, const struct microcode_header_intel *mc_header,
>> -int sig, int pf)
>> +/* Check an update against the CPU signature and current update revision */
>> +static enum microcode_match_result microcode_update_match(
>> +const struct microcode_header_intel *mc_header, unsigned int cpu)
>>  {
>> -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num);
>> -
>> -return (sigmatch(sig, uci->cpu_sig.sig, pf, uci->cpu_sig.pf) &&
>> -(mc_header->rev > uci->cpu_sig.rev));
>> +const struct extended_sigtable *ext_header;
>> +const struct extended_signature *ext_sig;
>> +unsigned int i;
>> +struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>> +unsigned int sig = uci->cpu_sig.sig;
>> +unsigned int pf = uci->cpu_sig.pf;
>> +unsigned int rev = uci->cpu_sig.rev;
>> +unsigned long data_size = get_datasize(mc_header);
>> +const void *end = (const void *)mc_header + get_totalsize(mc_header);
>> +
>> +if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) )
>> +return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE;
>
>Didn't you lose a range check against "end" ahead of this if()?
>get_totalsize() and get_datasize() aiui also would need to live
>after a range check, just a sizeof() (i.e. MC_HEADER_SIZE) based
>one. This would also affect the caller as it seems.

I think microcode_sanity_check() is for this purpose. We can do the
sanity check before the if(). Perhaps we can just add an assertion that
the sanity check won't fail: whenever the sanity check fails while
parsing a ucode blob, we simply drop the ucode, so a broken ucode is
never passed to this function.
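Concretely, something along these lines at the top of
microcode_update_match() (a sketch; it assumes microcode_sanity_check()
returns 0 on success, as in the existing code):

    /* Callers only hand us blobs that already passed the sanity check. */
    ASSERT(!microcode_sanity_check((void *)mc_header));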

Thanks
Chao


Re: [Xen-devel] [PATCH v9 15/15] microcode: block #NMI handling when loading an ucode

2019-08-26 Thread Chao Gao
On Mon, Aug 26, 2019 at 04:07:59PM +0800, Chao Gao wrote:
>On Fri, Aug 23, 2019 at 09:46:37AM +0100, Sergey Dyasli wrote:
>>On 19/08/2019 02:25, Chao Gao wrote:
>>> register an nmi callback. And this callback does busy-loop on threads
>>> which are waiting for loading completion. Control threads send NMI to
>>> slave threads to prevent NMI acceptance during ucode loading.
>>> 
>>> Signed-off-by: Chao Gao 
>>> ---
>>> Changes in v9:
>>>  - control threads send NMI to all other threads. Slave threads will
>>>  stay in the NMI handling to prevent NMI acceptance during ucode
>>>  loading. Note that self-nmi is invalid according to SDM.
>>
>>To me this looks like a half-measure: why keep only slave threads in
>>the NMI handler, when master threads can update the microcode from
>>inside the NMI handler as well?
>
>No special reason. Because the issue we want to address is that slave
>threads might go to handle NMI and access MSRs when master thread is
>loading ucode. So we only keep slave threads in the NMI handler.
>
>>
>>You mention that self-nmi is invalid, but Xen has self_nmi() which is
>>used for apply_alternatives() during boot, so can be trusted to work.
>
>Sorry, I meant using self shorthand to send self-nmi. I tried to use
>self shorthand but got APIC error. And I agree that it is better to
>make slave thread call self_nmi() itself.
>
>>
>>I experimented a bit with the following approach: after loading_state
>>becomes LOADING_CALLIN, each cpu issues a self_nmi() and rendezvous
>>via cpu_callin_map into LOADING_ENTER to do a ucode update directly in
>>the NMI handler. And it seems to work.
>>
>>Separate question is about the safety of this approach: can we be sure
>>that a ucode update would not reset the status of the NMI latch? I.e.
>>can it cause another NMI to be delivered while Xen already handles one?
>
>Ashok, what's your opinion on Sergey's approach and his concern?

Hi Sergey,

I talked with Ashok. We think your approach is better. I will follow
your approach in v10. It would be much helpful if you post your patch
so that I can just rebase it onto other patches.

Thanks
Chao


Re: [Xen-devel] Reset pass-thru devices in a VM

2019-08-26 Thread Chao Gao
On Tue, Aug 27, 2019 at 12:17:28AM +0300, Pasi Kärkkäinen wrote:
>Hi Chao,
>
>On Fri, Aug 09, 2019 at 04:38:33PM +0800, Chao Gao wrote:
>> Hi everyone,
>> 
>> I have a device which only supports secondary bus reset. After being
>> assigned to a VM, it would be placed under host bridge. For devices
>> under host bridge, secondary bus reset is not applicable. Thus, a VM
>> has no way to reset this device.
>> 
>> This device's usage would be limited without PCI reset (for example, its
>> driver cannot re-initialize the device properly without PCI reset, which
>> means in VM device won't be usable after unloading the driver), it would
>> be much better if there is a way available to VMs to reset the device.
>> 
>> In my mind, a straightfoward solution is to create a virtual bridge
>> for a VM and place the pass-thru device under a virtual bridge. But it
>> isn't supported in Xen (KVM/QEMU supports) and enabling it looks need
>> a lot of efforts. Alternatively, emulating FLR (Function Level Reset)
>> capability for this device might be a feasible way and only needs
>> relatively few changes. I am planning to enable an opt-in feature
>> (like 'permissive') to allow qemu to expose FLR capability to guest for
>> pass-thru devices as long as this device is resetable on dom0 (i.e. the
>> device has 'reset' attribute under its sysfs). And when guest initiates
>> an FLR, qemu just echo 1 to the 'reset' attribute on dom0.
>> 
>> Do you think emulating FLR capability is doable?
>> 
>
>I wonder if these patches from another thread help with your reset issue:
>
>https://lists.xen.org/archives/html/xen-devel/2019-08/msg02304.html

Thanks for your attention.

The link you provided seems to be about how the host resets a device.
Emulating the FLR capability means exposing FLR to the guest so that the
guest can reset assigned devices. Of course, qemu would intercept the
guest's FLR and redirect it into a device reset on the host.

Thanks
Chao


Re: [Xen-devel] [PATCH v9 15/15] microcode: block #NMI handling when loading an ucode

2019-08-26 Thread Chao Gao
On Fri, Aug 23, 2019 at 09:46:37AM +0100, Sergey Dyasli wrote:
>On 19/08/2019 02:25, Chao Gao wrote:
>> register an nmi callback. And this callback does busy-loop on threads
>> which are waiting for loading completion. Control threads send NMI to
>> slave threads to prevent NMI acceptance during ucode loading.
>> 
>> Signed-off-by: Chao Gao 
>> ---
>> Changes in v9:
>>  - control threads send NMI to all other threads. Slave threads will
>>  stay in the NMI handling to prevent NMI acceptance during ucode
>>  loading. Note that self-nmi is invalid according to SDM.
>
>To me this looks like a half-measure: why keep only slave threads in
>the NMI handler, when master threads can update the microcode from
>inside the NMI handler as well?

No special reason. Because the issue we want to address is that slave
threads might go to handle NMI and access MSRs when master thread is
loading ucode. So we only keep slave threads in the NMI handler.

>
>You mention that self-nmi is invalid, but Xen has self_nmi() which is
>used for apply_alternatives() during boot, so can be trusted to work.

Sorry, I meant using self shorthand to send self-nmi. I tried to use
self shorthand but got APIC error. And I agree that it is better to
make slave thread call self_nmi() itself.
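A sketch of that variant, following Sergey's description (not committed
code; the callin would still go through cpu_callin_map inside the NMI
handler):

    static int slave_thread_fn(void)
    {
        /* Enter the NMI handler voluntarily; the rendezvous happens there. */
        self_nmi();

        while ( loading_state != LOADING_EXIT )
            cpu_relax();

        return 0;
    }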

>
>I experimented a bit with the following approach: after loading_state
>becomes LOADING_CALLIN, each cpu issues a self_nmi() and rendezvous
>via cpu_callin_map into LOADING_ENTER to do a ucode update directly in
>the NMI handler. And it seems to work.
>
>Separate question is about the safety of this approach: can we be sure
>that a ucode update would not reset the status of the NMI latch? I.e.
>can it cause another NMI to be delivered while Xen already handles one?

Ashok, what's your opinion on Sergey's approach and his concern?

Thanks
Chao


Re: [Xen-devel] [PATCH v9 12/15] microcode: reduce memory allocation and copy when creating a patch

2019-08-26 Thread Chao Gao
On Fri, Aug 23, 2019 at 10:11:21AM +0200, Roger Pau Monné wrote:
>On Mon, Aug 19, 2019 at 09:25:25AM +0800, Chao Gao wrote:
>> To create a microcode patch from a vendor-specific update,
>> allocate_microcode_patch() copied everything from the update.
>> It is not efficient. Essentially, we just need to go through
>> ucodes in the blob, find the one with the newest revision and
>> install it into the microcode_patch. In the process, buffers
>> like mc_amd, equiv_cpu_table (on AMD side), and mc (on Intel
>> side) can be reused. microcode_patch now is allocated after
>> it is sure that there is a matching ucode.
>
>Oh, I think this answers my question on a previous patch.
>
>For future series it would be nice to avoid so many rewrites in the
>same series, alloc_microcode_patch is already modified in a previous
>patch, just to be removed here. It also makes it harder to follow
>what's going on.

Got it. This patch was added in this new version, and some trivial
patches had already got a Reviewed-by, so I didn't merge it with them.

>>  while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
>> &offset)) == 0 )
>>  {
>> -struct microcode_patch *new_patch = alloc_microcode_patch(mc_amd);
>> -
>> -if ( IS_ERR(new_patch) )
>> -{
>> -error = PTR_ERR(new_patch);
>> -break;
>> -}
>> -
>>  /*
>> - * If the new patch covers current CPU, compare patches and store 
>> the
>> + * If the new ucode covers current CPU, compare ucodes and store the
>>   * one with higher revision.
>>   */
>> -if ( (microcode_fits(new_patch->mc_amd) != MIS_UCODE) &&
>> - (!patch || (compare_patch(new_patch, patch) == NEW_UCODE)) )
>> +#define REV_ID(mpb) (((struct microcode_header_amd 
>> *)(mpb))->processor_rev_id)
>> +if ( (microcode_fits(mc_amd) != MIS_UCODE) &&
>> + (!saved || (REV_ID(mc_amd->mpb) > REV_ID(saved))) )
>> +#undef REV_ID
>>  {
>> -struct microcode_patch *tmp = patch;
>> -
>> -patch = new_patch;
>> -new_patch = tmp;
>> +xfree(saved);
>> +saved = mc_amd->mpb;
>> +saved_size = mc_amd->mpb_size;
>>  }
>> -
>> -if ( new_patch )
>> -microcode_free_patch(new_patch);
>> +else
>> +xfree(mc_amd->mpb);

It might be better to move 'mc_amd->mpb = NULL' here.

>>  
>>  if ( offset >= bufsize )
>>  break;
>> @@ -593,9 +548,25 @@ static struct microcode_patch 
>> *cpu_request_microcode(const void *buf,
>>   *(const uint32_t *)(buf + offset) == UCODE_MAGIC )
>>  break;
>>  }
>> -xfree(mc_amd->mpb);
>> -xfree(mc_amd->equiv_cpu_table);
>> -xfree(mc_amd);
>> +
>> +if ( saved )
>> +{
>> +mc_amd->mpb = saved;
>> +mc_amd->mpb_size = saved_size;
>> +patch = xmalloc(struct microcode_patch);
>> +if ( patch )
>> +patch->mc_amd = mc_amd;
>> +else
>> +{
>> +free_patch(mc_amd);
>> +error = -ENOMEM;
>> +}
>> +}
>> +else
>> +{
>> +mc_amd->mpb = NULL;
>
>What's the point in setting mpb to NULL if you are just going to free
>mc_amd below?

To avoid a double free here: mc_amd->mpb is always either freed or saved,
and at this point we want to free mc_amd itself and mc_amd->equiv_cpu_table.
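In other words, the ownership rule could be made explicit at the point
where the buffer is consumed or dropped, e.g. (sketch; better_than_saved
stands for the microcode_fits()/REV_ID() condition shown above):

    if ( better_than_saved )
    {
        xfree(saved);
        saved = mc_amd->mpb;   /* take ownership of the buffer */
    }
    else
        xfree(mc_amd->mpb);    /* drop it */
    mc_amd->mpb = NULL;        /* either way, free_patch() can't free it again */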

>
>Also, I'm not sure I understand why you need to free mc_amd, isn't
>this buff memory that should be freed by the caller?

But mc_amd is allocated in this function.

>
>ie: in the Intel counterpart below you don't seem to free the mc
>cursor used for the get_next_ucode_from_buffer loop.

'mc' is saved if it is newer than the current patch stored in 'saved';
otherwise 'mc' is freed immediately. So we don't need to free it again
after the while loop.

Thanks
Chao


Re: [Xen-devel] [PATCH v9 11/15] microcode: unify loading update during CPU resuming and AP wakeup

2019-08-22 Thread Chao Gao
On Thu, Aug 22, 2019 at 04:10:46PM +0200, Roger Pau Monné wrote:
>On Mon, Aug 19, 2019 at 09:25:24AM +0800, Chao Gao wrote:
>> Both are loading the cached patch. Since APs call the unified function,
>> microcode_update_one(), during wakeup, the 'start_update' parameter
>> which originally used to distinguish BSP and APs is redundant. So remove
>> this parameter.
>> 
>> Signed-off-by: Chao Gao 
>> ---
>> Note that here is a functional change: resuming a CPU would call
>> ->end_update() now while previously it wasn't. Not quite sure
>> whether it is correct.
>
>I guess that's required if it called start_update prior to calling
>end_update?
>
>> 
>> Changes in v9:
>>  - return -EOPNOTSUPP rather than 0 if microcode_ops is NULL in
>>microcode_update_one()
>>  - rebase and fix conflicts.
>> 
>> Changes in v8:
>>  - split out from the previous patch
>> ---
>>  xen/arch/x86/acpi/power.c   |  2 +-
>>  xen/arch/x86/microcode.c| 90 
>> ++---
>>  xen/arch/x86/smpboot.c  |  5 +--
>>  xen/include/asm-x86/processor.h |  4 +-
>>  4 files changed, 44 insertions(+), 57 deletions(-)
>> 
>> diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
>> index 4f21903..24798d5 100644
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -253,7 +253,7 @@ static int enter_state(u32 state)
>>  
>>  console_end_sync();
>>  
>> -microcode_resume_cpu();
>> +microcode_update_one();
>>  
>>  if ( !recheck_cpu_features(0) )
>>  panic("Missing previously available feature(s)\n");
>> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>> index a2febc7..bdd9c9f 100644
>> --- a/xen/arch/x86/microcode.c
>> +++ b/xen/arch/x86/microcode.c
>> @@ -203,24 +203,6 @@ static struct microcode_patch *parse_blob(const char 
>> *buf, uint32_t len)
>>  return NULL;
>>  }
>>  
>> -int microcode_resume_cpu(void)
>> -{
>> -int err;
>> -struct cpu_signature *sig = &this_cpu(cpu_sig);
>> -
>> -if ( !microcode_ops )
>> -return 0;
>> -
>> -spin_lock(&microcode_mutex);
>> -
>> -err = microcode_ops->collect_cpu_info(sig);
>> -if ( likely(!err) )
>> -err = microcode_ops->apply_microcode(microcode_cache);
>> -spin_unlock(&microcode_mutex);
>> -
>> -return err;
>> -}
>> -
>>  void microcode_free_patch(struct microcode_patch *microcode_patch)
>>  {
>>  microcode_ops->free_patch(microcode_patch->mc);
>> @@ -384,11 +366,29 @@ static int __init microcode_init(void)
>>  }
>>  __initcall(microcode_init);
>>  
>> -int __init early_microcode_update_cpu(bool start_update)
>> +/* Load a cached update to current cpu */
>> +int microcode_update_one(void)
>> +{
>> +int rc;
>> +
>> +if ( !microcode_ops )
>> +return -EOPNOTSUPP;
>> +
>> +rc = microcode_update_cpu(NULL);
>> +
>> +if ( microcode_ops->end_update )
>> +microcode_ops->end_update();
>
>Don't you need to call start_update before calling
>microcode_update_cpu?

No. On AMD side, osvw_status records the hardware erratum in the system.
As we don't assume all CPUs have the same erratum, each cpu calls
end_update to update osvw_status after ucode loading.
start_update just resets osvw_status to 0. And it is called once prior
to ucode loading on any CPU so that osvw_status can be recomputed.
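So the intended AMD sequence is roughly (conceptual sketch; in reality
each CPU runs its own part rather than one CPU looping over all of them):

    start_update();                  /* once, before loading on any CPU:
                                        resets osvw_status to 0 */
    for_each_online_cpu ( cpu )
    {
        /* executed on 'cpu' itself: */
        apply_microcode(patch);
        end_update();                /* fold this CPU's erratum state back
                                        into osvw_status */
    }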

Thanks
Chao


Re: [Xen-devel] [PATCH v9 00/15] improve late microcode loading

2019-08-22 Thread Chao Gao
On Thu, Aug 22, 2019 at 08:51:43AM +0100, Sergey Dyasli wrote:
>Hi Chao,
>
>On 19/08/2019 02:25, Chao Gao wrote:
>> Previous change log:
>> Changes in version 8:
>>  - block #NMI handling during microcode loading (Patch 16)
>>  - Don't assume that all CPUs in the system have loaded a same ucode.
>>  So when parsing a blob, we attempt to save a patch as long as it matches
>>  with current cpu signature regardless of the revision of the patch.
>>  And also for loading, we only require the patch to be loaded isn't old
>>  than the cached one.
>>  - store an update after the first successful loading on a CPU
>>  - remove the patch that calls wbinvd() unconditionally before microcode
>>  loading. It is under internal discussion.
>
>I noticed that you removed the patch which adds wbinvd() back in v8.
>What was the reasoning behind that and is there any outcome from the
>internal discussion that you mention here?

Jan (and maybe someone else) was concerned about the impact of calling
wbinvd() unconditionally, especially with your work to make serial ucode
loading an option. To address this concern, I planned to call wbinvd()
conditionally. I need to confirm with the Intel microcode team whether
that is fine and what the condition should be, but I haven't received an
answer yet. I will talk with Ashok again and probably add this patch back
in v10.

Thanks
Chao


Re: [Xen-devel] [PATCH v9 13/15] x86/microcode: Synchronize late microcode loading

2019-08-19 Thread Chao Gao
On Mon, Aug 19, 2019 at 11:27:36AM +0100, Sergey Dyasli wrote:
>> +static int master_thread_fn(const struct microcode_patch *patch)
>> +{
>> +unsigned int cpu = smp_processor_id();
>> +int ret = 0;
>> +
>> +while ( loading_state != LOADING_CALLIN )
>> +cpu_relax();
>> +
>> +cpumask_set_cpu(cpu, _callin_map);
>> +
>> +while ( loading_state != LOADING_ENTER )
>> +cpu_relax();
>
>If I'm reading it right, this will wait forever in case when...
>
>> +
>> +/*
>> + * If an error happened, control thread would set 'loading_state'
>> + * to LOADING_EXIT. Don't perform ucode loading for this case
>> + */
>> +if ( loading_state == LOADING_EXIT )
>> +return ret;

I tried to check whether there was an error here. But as you said, we
cannot reach this point if the control thread moved loading_state from
LOADING_CALLIN straight to LOADING_EXIT. I will do this check in the
while loop right above.

>> +
>> +ret = microcode_ops->apply_microcode(patch);
>> +if ( !ret )
>> +atomic_inc(&cpu_updated);
>> +atomic_inc(&cpu_out);
>> +
>> +while ( loading_state != LOADING_EXIT )
>> +cpu_relax();
>> +
>> +return ret;
>> +}
>> +
>> +static int control_thread_fn(const struct microcode_patch *patch)
>>  {
>> -unsigned int cpu;
>> +unsigned int cpu = smp_processor_id(), done;
>> +unsigned long tick;
>> +int ret;
>>  
>> -/* Store the patch after a successful loading */
>> -if ( !microcode_update_cpu(patch) && patch )
>> +/* Allow threads to call in */
>> +loading_state = LOADING_CALLIN;
>> +smp_mb();
>> +
>> +cpumask_set_cpu(cpu, &cpu_callin_map);
>> +
>> +/* Waiting for all threads calling in */
>> +ret = wait_for_condition(wait_cpu_callin,
>> + (void *)(unsigned long)num_online_cpus(),
>> + MICROCODE_CALLIN_TIMEOUT_US);
>> +if ( ret ) {
>> +loading_state = LOADING_EXIT;
>> +return ret;
>> +}
>
>...this condition holds. Have you actually tested this case?

I didn't craft a test case to exercise the error-handling path, and I
believe you are right.

Thanks
Chao


[Xen-devel] [PATCH v9 14/15] microcode: remove microcode_update_lock

2019-08-18 Thread Chao Gao
microcode_update_lock is to prevent logical threads of the same core from
updating microcode at the same time. But, being a global lock, it also
prevented parallel microcode updating on different cores.

Remove this lock in order to update microcode in parallel. It is safe
because we have already ensured serialization of sibling threads at the
caller side.
1.For late microcode update, do_microcode_update() ensures that only one
  sibling thread of a core can update microcode.
2.For microcode update during system startup or CPU-hotplug,
  microcode_mutex() guarantees update serialization of logical threads.
3.get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and
  late microcode update.

Note that printk() in apply_microcode() and svm_host_osvw_init() (for AMD
only) is still processed sequentially.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v7:
 - reworked. Remove the complex lock logic introduced in v5 and v6. The microcode
 patch to be applied is passed as an argument without any global variable. Thus
 no lock is added to serialize potential readers/writers. Callers of
 apply_microcode() guarantee correctness: the patch pointed to by the
 argument won't be changed by others.

Changes in v6:
 - introduce early_ucode_update_lock to serialize early ucode update.

Changes in v5:
 - newly add
---
 xen/arch/x86/microcode_amd.c   | 8 +---
 xen/arch/x86/microcode_intel.c | 8 +---
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index ec1c2eb..8685b3e 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -74,9 +74,6 @@ struct mpbhdr {
 uint8_t data[];
 };
 
-/* serialize access to the physical write */
-static DEFINE_SPINLOCK(microcode_update_lock);
-
 /* See comment in start_update() for cases when this routine fails */
 static int collect_cpu_info(struct cpu_signature *csig)
 {
@@ -220,7 +217,6 @@ static enum microcode_match_result compare_patch(
 
 static int apply_microcode(const struct microcode_patch *patch)
 {
-unsigned long flags;
 uint32_t rev;
 int hw_err;
 unsigned int cpu = smp_processor_id();
@@ -232,15 +228,13 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 
 hdr = patch->mc_amd->mpb;
 
-spin_lock_irqsave(&microcode_update_lock, flags);
+BUG_ON(local_irq_is_enabled());
 
 hw_err = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)hdr);
 
 /* get patch id after patching */
 rdmsrl(MSR_AMD_PATCHLEVEL, rev);
 
-spin_unlock_irqrestore(&microcode_update_lock, flags);
-
 /*
  * Some processors leave the ucode blob mapping as UC after the update.
  * Flush the mapping to regain normal cacheability.
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index ae5759f..6186461 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -93,9 +93,6 @@ struct extended_sigtable {
 
 #define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
 
-/* serialize access to the physical write to MSR 0x79 */
-static DEFINE_SPINLOCK(microcode_update_lock);
-
 static int collect_cpu_info(struct cpu_signature *csig)
 {
 unsigned int cpu_num = smp_processor_id();
@@ -284,7 +281,6 @@ static enum microcode_match_result compare_patch(
 
 static int apply_microcode(const struct microcode_patch *patch)
 {
-unsigned long flags;
 uint64_t msr_content;
 unsigned int val[2];
 unsigned int cpu_num = raw_smp_processor_id();
@@ -296,8 +292,7 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 
 mc_intel = patch->mc_intel;
 
-/* serialize access to the physical write to MSR 0x79 */
-spin_lock_irqsave(&microcode_update_lock, flags);
+BUG_ON(local_irq_is_enabled());
 
 /* write microcode via MSR 0x79 */
 wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc_intel->bits);
@@ -310,7 +305,6 @@ static int apply_microcode(const struct microcode_patch 
*patch)
 rdmsrl(MSR_IA32_UCODE_REV, msr_content);
 val[1] = (uint32_t)(msr_content >> 32);
 
-spin_unlock_irqrestore(&microcode_update_lock, flags);
 if ( val[1] != mc_intel->hdr.rev )
 {
 printk(KERN_ERR "microcode: CPU%d update from revision "
-- 
1.8.3.1



[Xen-devel] [PATCH v9 15/15] microcode: block #NMI handling when loading an ucode

2019-08-18 Thread Chao Gao
Register an NMI callback. This callback busy-loops on the threads that
are waiting for loading completion. The control thread sends an NMI to
the slave threads to prevent NMI acceptance during ucode loading.

Signed-off-by: Chao Gao 
---
Changes in v9:
 - control threads send NMI to all other threads. Slave threads will
 stay in the NMI handling to prevent NMI acceptance during ucode
 loading. Note that self-nmi is invalid according to SDM.
 - s/rep_nop/cpu_relax
 - remove debug message in microcode_nmi_callback(). Printing debug
 message would take long times and control thread may timeout.
 - rebase and fix conflicts

Changes in v8:
 - new
---
 xen/arch/x86/microcode.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 91f9e81..d943835 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -38,6 +38,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -339,14 +340,8 @@ static int microcode_update_cpu(const struct 
microcode_patch *patch)
 
 static int slave_thread_fn(void)
 {
-unsigned int cpu = smp_processor_id();
 unsigned int master = cpumask_first(this_cpu(cpu_sibling_mask));
 
-while ( loading_state != LOADING_CALLIN )
-cpu_relax();
-
-cpumask_set_cpu(cpu, &cpu_callin_map);
-
 while ( loading_state != LOADING_EXIT )
 cpu_relax();
 
@@ -399,6 +394,8 @@ static int control_thread_fn(const struct microcode_patch 
*patch)
 
 cpumask_set_cpu(cpu, &cpu_callin_map);
 
+smp_send_nmi_allbutself();
+
 /* Waiting for all threads calling in */
 ret = wait_for_condition(wait_cpu_callin,
  (void *)(unsigned long)num_online_cpus(),
@@ -481,12 +478,28 @@ static int do_microcode_update(void *patch)
 return ret;
 }
 
+static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+{
+/* The first thread of a core is to load an update. Don't block it. */
+if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) ||
+ loading_state != LOADING_CALLIN )
+return 0;
+
+cpumask_set_cpu(cpu, &cpu_callin_map);
+
+while ( loading_state != LOADING_EXIT )
+cpu_relax();
+
+return 0;
+}
+
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 {
 int ret;
 void *buffer;
 unsigned int cpu, updated;
 struct microcode_patch *patch;
+nmi_callback_t *saved_nmi_callback;
 
 if ( len != (uint32_t)len )
 return -E2BIG;
@@ -551,6 +564,8 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) 
buf, unsigned long len)
  * watchdog timeout.
  */
 watchdog_disable();
+
+saved_nmi_callback = set_nmi_callback(microcode_nmi_callback);
 /*
  * Late loading dance. Why the heavy-handed stop_machine effort?
  *
@@ -563,6 +578,7 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) 
buf, unsigned long len)
  *   conservative and good.
  */
 ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
+set_nmi_callback(saved_nmi_callback);
 watchdog_enable();
 
 updated = atomic_read(&cpu_updated);
-- 
1.8.3.1



[Xen-devel] [PATCH v9 13/15] x86/microcode: Synchronize late microcode loading

2019-08-18 Thread Chao Gao
This patch ports microcode improvement patches from the Linux kernel.

Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.

Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.

Signed-off-by: Chao Gao 
Tested-by: Chao Gao 
[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
Cc: Kevin Tian 
Cc: Jun Nakajima 
Cc: Ashok Raj 
Cc: Borislav Petkov 
Cc: Thomas Gleixner 
Cc: Andrew Cooper 
Cc: Jan Beulich 
---
Changes in v9:
 - log __builtin_return_address(0) on timeout
 - divide CPUs into three logical sets and they will call different
 functions during ucode loading. The 'control thread' is chosen to
 coordinate ucode loading on all CPUs. Since only control thread would
 set 'loading_state', we can get rid of 'cmpxchg' stuff in v8.
 - s/rep_nop/cpu_relax
 - each thread updates its revision number itself
 - add XENLOG_ERR prefix for each line of multi-line log messages

Changes in v8:
 - to support blocking #NMI handling during loading ucode
   * introduce a flag, 'loading_state', to mark the start or end of
 ucode loading.
   * use a bitmap for cpu callin since if cpu may stay in #NMI handling,
 there are two places for a cpu to call in. bitmap won't be counted
 twice.
   * don't wait for all CPUs callout, just wait for CPUs that perform the
 update. We have to do this because some threads may be stuck in NMI
 handling (where cannot reach the rendezvous).
 - emit a warning if the system stays in stop_machine context for more
 than 1s
 - comment that rdtsc is fine while loading an update
 - use cmpxchg() to avoid panic being called on multiple CPUs
 - Propagate revision number to other threads
 - refine comments and prompt messages

Changes in v7:
 - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
 - reword the comment above microcode_update_cpu() to clearly state that
 one thread per core should do the update.
---
 xen/arch/x86/microcode.c | 289 +++
 1 file changed, 267 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index bdd9c9f..91f9e81 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -30,18 +30,52 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+/*
+ * Before performing a late microcode update on any thread, we
+ * rendezvous all cpus in stop_machine context. The timeout for
+ * waiting for cpu rendezvous is 30ms. It is the timeout used by
+ * live patching
+ */
+#define MICROCODE_CALLIN_TIMEOUT_US 30000
+
+/*
+ * Timeout for each thread to complete update is set to 1s. It is a
+ * conservative choice considering all possible interference.
+ */
+#define MICROCODE_UPDATE_TIMEOUT_US 1000000
+
 static module_t __initdata ucode_mod;
 static signed int __initdata ucode_mod_idx;
 static bool_t __initdata ucode_mod_forced;
+static unsigned int nr_cores;
+
+/*
+ * These states help to coordinate CPUs during loading an update.
+ *
+ * The semantics of each state is as follow:
+ *  - LOADING_PREPARE: initial state of 'loading_state'.
+ *  - LOADING_CALLIN: CPUs are allowed to callin.
+ *  - LOADING_ENTER: all CPUs have called in. Initiate ucode loading.
+ *  - LOADING_EXIT: ucode loading is done or aborted.
+ */
+static enum {
+LOADING_PREPARE,
+LOADING_CALLIN,
+LOADING_ENTER,
+LOADING_EXIT,
+} loading_state;
 
 /*
  * If we scan the initramfs.cpio for the early microcode code
@@ -190,6 +224,16 @@ static DEFINE_SPINLOCK(microcode_mutex);
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
 /*
+ * Count the CPUs that have entered, exited the rendezvous and succeeded in
+ * microcode update during late microcode update respectively.
+ *
+ * Note that a bitmap is used for callin to allow cpu to set a bit multiple
+ * times. It is required to do busy-loop in #NMI handling.
+ */
+static cpumask_t cpu_callin_map;
+static atomic_t cpu_out, cpu_updated;
+
+/*
  * Return a patch that covers current CPU. If there are multiple patches,
  * return the one with the highest revision number. Return error If no
  * patch is found and an error occurs during the parsing process. Otherwise
@@ -232,6 +276,34 @@ bool microcode_update_cache(struct microcode_patch *patch)
 return true;
 }
 
+/* Wait for a condition to be met with a timeout (us). */
+static int wait_for_condition(int (*func)(void *data), void *data,
+ unsigned int timeout)
+{
+while ( !func(data) )
+{
+if ( !timeout-- )
+{
+printk("CPU%u

[Xen-devel] [PATCH v9 04/15] microcode: introduce a global cache of ucode patch

2019-08-18 Thread Chao Gao
to replace the current per-cpu cache 'uci->mc'.

With the assumption that all CPUs in the system have the same signature
(family, model, stepping and 'pf'), a microcode update that matches one
CPU should match the others as well. Having multiple microcode revisions
on different CPUs would make the system unstable and should be avoided.
Hence, caching only one microcode update is good enough for all cases.

Introduce a global variable, microcode_cache, to store the newest
matching microcode update. Whenever we get a new valid microcode update,
its revision id is compared against that of the cached microcode update
to determine whether the "microcode_cache" needs to be replaced. And
this global cache is loaded to the CPU in apply_microcode().

All operations on the cache is protected by 'microcode_mutex'.

Note that I deliberately avoid touching the old per-cpu cache ('uci->mc')
as I am going to remove it completely in the following patches. We copy
everything to create the new cache blob in order to avoid reusing buffers
previously allocated for the old per-cpu cache. It is not very efficient,
but this is corrected by a patch later in this series.

Signed-off-by: Chao Gao 
---
Changes in v9:
 - on Intel side, ->compare_patch just checks the patch revision number.
 - explain why all buffers are copied in alloc_microcode_patch() in
 patch description.

Changes in v8:
 - Free generic wrapper struct in general code
 - Try to update the cache as long as a patch covers the current CPU.
 Previously, the cache was updated only if the patch was newer than the
 current update revision in the CPU. This small difference can work around
 a broken BIOS which only applies the microcode update to the BSP, leaving
 software to apply the same update to the other CPUs.

Changes in v7:
 - reworked to cache only one microcode patch rather than a list of
 microcode patches.
---
 xen/arch/x86/microcode.c| 39 ++
 xen/arch/x86/microcode_amd.c| 90 +
 xen/arch/x86/microcode_intel.c  | 73 ++---
 xen/include/asm-x86/microcode.h | 17 
 4 files changed, 197 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 421d57e..0ecd2fd 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -61,6 +61,9 @@ static struct ucode_mod_blob __initdata ucode_blob;
  */
 static bool_t __initdata ucode_scan;
 
+/* Protected by microcode_mutex */
+static struct microcode_patch *microcode_cache;
+
 void __init microcode_set_module(unsigned int idx)
 {
 ucode_mod_idx = idx;
@@ -262,6 +265,42 @@ int microcode_resume_cpu(unsigned int cpu)
 return err;
 }
 
+void microcode_free_patch(struct microcode_patch *microcode_patch)
+{
+microcode_ops->free_patch(microcode_patch->mc);
+xfree(microcode_patch);
+}
+
+const struct microcode_patch *microcode_get_cache(void)
+{
+ASSERT(spin_is_locked(&microcode_mutex));
+
+return microcode_cache;
+}
+
+/* Return true if cache gets updated. Otherwise, return false */
+bool microcode_update_cache(struct microcode_patch *patch)
+{
+
+ASSERT(spin_is_locked(&microcode_mutex));
+
+if ( !microcode_cache )
+microcode_cache = patch;
+else if ( microcode_ops->compare_patch(patch,
+   microcode_cache) == NEW_UCODE )
+{
+microcode_free_patch(microcode_cache);
+microcode_cache = patch;
+}
+else
+{
+microcode_free_patch(patch);
+return false;
+}
+
+return true;
+}
+
 static int microcode_update_cpu(const void *buf, size_t size)
 {
 int err;
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 3db3555..30129ca 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -190,24 +190,83 @@ static enum microcode_match_result microcode_fits(
 return NEW_UCODE;
 }
 
+static bool match_cpu(const struct microcode_patch *patch)
+{
+if ( !patch )
+return false;
+return microcode_fits(patch->mc_amd, smp_processor_id()) == NEW_UCODE;
+}
+
+static struct microcode_patch *alloc_microcode_patch(
+const struct microcode_amd *mc_amd)
+{
+struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch);
+struct microcode_amd *cache = xmalloc(struct microcode_amd);
+void *mpb = xmalloc_bytes(mc_amd->mpb_size);
+struct equiv_cpu_entry *equiv_cpu_table =
+xmalloc_bytes(mc_amd->equiv_cpu_table_size);
+
+if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table )
+{
+xfree(microcode_patch);
+xfree(cache);
+xfree(mpb);
+xfree(equiv_cpu_table);
+return ERR_PTR(-ENOMEM);
+}
+
+memcpy(mpb, mc_amd->mpb, mc_amd->mpb_size);
+cache->mpb = mpb;
+cache->mpb_size = mc_amd->mpb_size;
+memcpy(equiv_cpu_table, mc_amd->equiv_cpu_table,
+   mc_amd->equiv_cpu_table_size);

[Xen-devel] [PATCH v9 01/15] microcode/intel: extend microcode_update_match()

2019-08-18 Thread Chao Gao
to a more generic function, so that it can be used alone to check
an update against the CPU signature and the current update revision.

Note that since enum microcode_match_result will be used in common code
(aka microcode.c), it has been placed in the common header.

Signed-off-by: Chao Gao 
Reviewed-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Changes in v9:
 - microcode_update_match() doesn't accept (sig, pf, rev) any longer.
 Hence, it won't be used to compare two arbitrary updates.
 - rewrite patch description

Changes in v8:
 - make sure enough room for an extended header and signature array

Changes in v6:
 - eliminate unnecessary type casting in microcode_update_match
 - check if a patch has an extend header

Changes in v5:
 - constify the extended_signature
 - use named enum type for the return value of microcode_update_match
---
 xen/arch/x86/microcode_intel.c  | 60 ++---
 xen/include/asm-x86/microcode.h |  6 +
 2 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index 22fdeca..c185b5c 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -134,14 +134,39 @@ static int collect_cpu_info(unsigned int cpu_num, struct 
cpu_signature *csig)
 return 0;
 }
 
-static inline int microcode_update_match(
-unsigned int cpu_num, const struct microcode_header_intel *mc_header,
-int sig, int pf)
+/* Check an update against the CPU signature and current update revision */
+static enum microcode_match_result microcode_update_match(
+const struct microcode_header_intel *mc_header, unsigned int cpu)
 {
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num);
-
-return (sigmatch(sig, uci->cpu_sig.sig, pf, uci->cpu_sig.pf) &&
-(mc_header->rev > uci->cpu_sig.rev));
+const struct extended_sigtable *ext_header;
+const struct extended_signature *ext_sig;
+unsigned int i;
+struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+unsigned int sig = uci->cpu_sig.sig;
+unsigned int pf = uci->cpu_sig.pf;
+unsigned int rev = uci->cpu_sig.rev;
+unsigned long data_size = get_datasize(mc_header);
+const void *end = (const void *)mc_header + get_totalsize(mc_header);
+
+if ( sigmatch(sig, mc_header->sig, pf, mc_header->pf) )
+return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE;
+
+ext_header = (const void *)(mc_header + 1) + data_size;
+ext_sig = (const void *)(ext_header + 1);
+
+/*
+ * Make sure there is enough space to hold an extended header and enough
+ * array elements.
+ */
+if ( (end < (const void *)ext_sig) ||
+ (end < (const void *)(ext_sig + ext_header->count)) )
+return MIS_UCODE;
+
+for ( i = 0; i < ext_header->count; i++ )
+if ( sigmatch(sig, ext_sig[i].sig, pf, ext_sig[i].pf) )
+return (mc_header->rev > rev) ? NEW_UCODE : OLD_UCODE;
+
+return MIS_UCODE;
 }
 
 static int microcode_sanity_check(void *mc)
@@ -243,31 +268,12 @@ static int get_matching_microcode(const void *mc, 
unsigned int cpu)
 {
 struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
 const struct microcode_header_intel *mc_header = mc;
-const struct extended_sigtable *ext_header;
 unsigned long total_size = get_totalsize(mc_header);
-int ext_sigcount, i;
-struct extended_signature *ext_sig;
 void *new_mc;
 
-if ( microcode_update_match(cpu, mc_header,
-mc_header->sig, mc_header->pf) )
-goto find;
-
-if ( total_size <= (get_datasize(mc_header) + MC_HEADER_SIZE) )
+if ( microcode_update_match(mc, cpu) != NEW_UCODE )
 return 0;
 
-ext_header = mc + get_datasize(mc_header) + MC_HEADER_SIZE;
-ext_sigcount = ext_header->count;
-ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
-for ( i = 0; i < ext_sigcount; i++ )
-{
-if ( microcode_update_match(cpu, mc_header,
-ext_sig->sig, ext_sig->pf) )
-goto find;
-ext_sig++;
-}
-return 0;
- find:
 pr_debug("microcode: CPU%d found a matching microcode update with"
  " version %#x (current=%#x)\n",
  cpu, mc_header->rev, uci->cpu_sig.rev);
diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h
index 23ea954..882f560 100644
--- a/xen/include/asm-x86/microcode.h
+++ b/xen/include/asm-x86/microcode.h
@@ -3,6 +3,12 @@
 
 #include 
 
+enum microcode_match_result {
+OLD_UCODE, /* signature matched, but revision id is older or equal */
+NEW_UCODE, /* signature matched, but revision id is newer */
+MIS_UCODE, /* signature mismatched */
+};
+
 struct cpu_signature;
 struct ucode_cpu_info;
 
-- 
1.8.3.1



[Xen-devel] [PATCH v9 02/15] microcode/amd: fix memory leak

2019-08-18 Thread Chao Gao
Two buffers, '->equiv_cpu_table' and '->mpb', inside 'mc_amd' might be
allocated, and in the error-handling path they are not freed properly.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v9:
 - use xzalloc() to get rid of explicitly initializing some fields
 to NULL/0.

changes in v8:
 - new
 - it is found by reading code. No test is done.
---
 xen/arch/x86/microcode_amd.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 7a854c0..3069784 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -425,7 +425,7 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
 goto out;
 }
 
-mc_amd = xmalloc(struct microcode_amd);
+mc_amd = xzalloc(struct microcode_amd);
 if ( !mc_amd )
 {
 printk(KERN_ERR "microcode: Cannot allocate memory for microcode 
patch\n");
@@ -479,6 +479,7 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
 
 if ( error )
 {
+xfree(mc_amd->equiv_cpu_table);
 xfree(mc_amd);
 goto out;
 }
@@ -491,8 +492,6 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
  * It's possible the data file has multiple matching ucode,
  * lets keep searching till the latest version
  */
-mc_amd->mpb = NULL;
-mc_amd->mpb_size = 0;
 last_offset = offset;
 while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
&offset)) == 0 )
@@ -549,11 +548,13 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
 
 if ( save_error )
 {
-xfree(mc_amd);
 uci->mc.mc_amd = mc_old;
+mc_old = mc_amd;
 }
-else
-xfree(mc_old);
+
+xfree(mc_old->mpb);
+xfree(mc_old->equiv_cpu_table);
+xfree(mc_old);
 
   out:
 #if CONFIG_HVM
-- 
1.8.3.1



[Xen-devel] [PATCH v9 06/15] microcode: remove struct ucode_cpu_info

2019-08-18 Thread Chao Gao
Remove the per-cpu cache field in struct ucode_cpu_info since it has
been replaced by a global cache. That leaves only one field remaining
in ucode_cpu_info, so this struct is removed and the remaining field
(the CPU signature) is stored in the per-cpu area.

The CPU status notifier is also removed. It was only used to free the
"mc" field to avoid a memory leak.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v9:
 - rebase and fix conflict

Changes in v8:
 - split microcode_resume_cpu() cleanup to a separate patch.

Changes in v6:
 - remove the whole struct ucode_cpu_info instead of the per-cpu cache
  in it.
---
 xen/arch/x86/apic.c |  2 +-
 xen/arch/x86/microcode.c| 57 +++-
 xen/arch/x86/microcode_amd.c| 58 +++--
 xen/arch/x86/microcode_intel.c  | 28 +++-
 xen/arch/x86/spec_ctrl.c|  2 +-
 xen/include/asm-x86/microcode.h | 12 +
 6 files changed, 36 insertions(+), 123 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 9c3c998..ae1f1e9 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1193,7 +1193,7 @@ static void __init check_deadline_errata(void)
 else
 rev = (unsigned long)m->driver_data;
 
-if ( this_cpu(ucode_cpu_info).cpu_sig.rev >= rev )
+if ( this_cpu(cpu_sig).rev >= rev )
 return;
 
 setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE);
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index ca5ee37..552e7fe 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -187,7 +187,7 @@ const struct microcode_ops *microcode_ops;
 
 static DEFINE_SPINLOCK(microcode_mutex);
 
-DEFINE_PER_CPU(struct ucode_cpu_info, ucode_cpu_info);
+DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
 struct microcode_info {
 unsigned int cpu;
@@ -196,32 +196,17 @@ struct microcode_info {
 char buffer[1];
 };
 
-static void __microcode_fini_cpu(unsigned int cpu)
-{
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
-
-xfree(uci->mc.mc_valid);
-memset(uci, 0, sizeof(*uci));
-}
-
-static void microcode_fini_cpu(unsigned int cpu)
-{
-spin_lock(&microcode_mutex);
-__microcode_fini_cpu(cpu);
-spin_unlock(&microcode_mutex);
-}
-
 int microcode_resume_cpu(unsigned int cpu)
 {
 int err;
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 
 if ( !microcode_ops )
 return 0;
 
 spin_lock(&microcode_mutex);
 
-err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+err = microcode_ops->collect_cpu_info(cpu, sig);
 if ( likely(!err) )
 err = microcode_ops->apply_microcode(cpu);
 spin_unlock(&microcode_mutex);
@@ -269,16 +254,13 @@ static int microcode_update_cpu(const void *buf, size_t 
size)
 {
 int err;
 unsigned int cpu = smp_processor_id();
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 
 spin_lock(&microcode_mutex);
 
-err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+err = microcode_ops->collect_cpu_info(cpu, sig);
 if ( likely(!err) )
 err = microcode_ops->cpu_request_microcode(cpu, buf, size);
-else
-__microcode_fini_cpu(cpu);
-
 spin_unlock(&microcode_mutex);
 
 return err;
@@ -365,29 +347,10 @@ static int __init microcode_init(void)
 }
 __initcall(microcode_init);
 
-static int microcode_percpu_callback(
-struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-unsigned int cpu = (unsigned long)hcpu;
-
-switch ( action )
-{
-case CPU_DEAD:
-microcode_fini_cpu(cpu);
-break;
-}
-
-return NOTIFY_DONE;
-}
-
-static struct notifier_block microcode_percpu_nfb = {
-.notifier_call = microcode_percpu_callback,
-};
-
 int __init early_microcode_update_cpu(bool start_update)
 {
 unsigned int cpu = smp_processor_id();
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 int rc = 0;
 void *data = NULL;
 size_t len;
@@ -406,7 +369,7 @@ int __init early_microcode_update_cpu(bool start_update)
 data = bootstrap_map(&ucode_mod);
 }
 
-microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+microcode_ops->collect_cpu_info(cpu, sig);
 
 if ( data )
 {
@@ -425,7 +388,7 @@ int __init early_microcode_update_cpu(bool start_update)
 int __init early_microcode_init(void)
 {
-unsigned int cpu = smp_processor_id();
-struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 int rc;
 
 rc = microcode_init_intel();
@@ -438,12 +401,10 @@ int __init early_microcode_init(void)
 
 if ( microcode_ops )
 {
-microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+microcode_ops->collect_cpu_info(cpu, sig);
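
For context, DEFINE_PER_CPU(struct cpu_signature, cpu_sig) conceptually
gives every CPU its own signature slot. A self-contained sketch of the
idea (fixed CPU count and illustrative names; Xen's real implementation
uses dedicated per-CPU memory areas, not a plain array):

    #define NR_CPUS_EXAMPLE 8

    struct cpu_signature_example { unsigned int sig, pf, rev; };

    /* One slot per CPU replaces the old one-field ucode_cpu_info wrapper. */
    static struct cpu_signature_example cpu_sig_example[NR_CPUS_EXAMPLE];

    /* Analogue of &per_cpu(cpu_sig, cpu). */
    static struct cpu_signature_example *sig_of(unsigned int cpu)
    {
        return &cpu_sig_example[cpu];
    }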
 

[Xen-devel] [PATCH v9 03/15] microcode/amd: distinguish old and mismatched ucode in microcode_fits()

2019-08-18 Thread Chao Gao
Sometimes, a ucode with a level lower than or equal to the current
CPU's patch level is useful. For example, to work around a broken BIOS
which only loads ucode for the BSP, when the BSP parses a ucode blob
during bootup it is better to save a ucode with a lower or equal level
for the APs.

No functional change is made in this patch. But a following patch will
handle "old ucode" and "mismatched ucode" separately.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v8:
 - new
---
 xen/arch/x86/microcode_amd.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 3069784..3db3555 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -152,8 +152,8 @@ static bool_t find_equiv_cpu_id(const struct 
equiv_cpu_entry *equiv_cpu_table,
 return 0;
 }
 
-static bool_t microcode_fits(const struct microcode_amd *mc_amd,
- unsigned int cpu)
+static enum microcode_match_result microcode_fits(
+const struct microcode_amd *mc_amd, unsigned int cpu)
 {
 struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
 const struct microcode_header_amd *mc_header = mc_amd->mpb;
@@ -167,27 +167,27 @@ static bool_t microcode_fits(const struct microcode_amd 
*mc_amd,
 current_cpu_id = cpuid_eax(0x0001);
 
 if ( !find_equiv_cpu_id(equiv_cpu_table, current_cpu_id, &equiv_cpu_id) )
-return 0;
+return MIS_UCODE;
 
 if ( (mc_header->processor_rev_id) != equiv_cpu_id )
-return 0;
+return MIS_UCODE;
 
 if ( !verify_patch_size(mc_amd->mpb_size) )
 {
 pr_debug("microcode: patch size mismatch\n");
-return 0;
+return MIS_UCODE;
 }
 
 if ( mc_header->patch_id <= uci->cpu_sig.rev )
 {
 pr_debug("microcode: patch is already at required level or 
greater.\n");
-return 0;
+return OLD_UCODE;
 }
 
 pr_debug("microcode: CPU%d found a matching microcode update with version 
%#x (current=%#x)\n",
  cpu, mc_header->patch_id, uci->cpu_sig.rev);
 
-return 1;
+return NEW_UCODE;
 }
 
 static int apply_microcode(unsigned int cpu)
@@ -496,7 +496,7 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
 while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
&offset)) == 0 )
 {
-if ( microcode_fits(mc_amd, cpu) )
+if ( microcode_fits(mc_amd, cpu) == NEW_UCODE )
 {
 error = apply_microcode(cpu);
 if ( error )
@@ -576,7 +576,7 @@ static int microcode_resume_match(unsigned int cpu, const 
void *mc)
 struct microcode_amd *mc_amd = uci->mc.mc_amd;
 const struct microcode_amd *src = mc;
 
-if ( !microcode_fits(src, cpu) )
+if ( microcode_fits(src, cpu) != NEW_UCODE )
 return 0;
 
 if ( src != mc_amd )
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
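
The value of the tri-state result shows in how a caller can use it,
following the reasoning in the commit message: a matching-but-old ucode
may still be worth saving for APs behind a broken BIOS, while a
mismatched one never is. An illustrative sketch (function names are
made up for the example):

    #include <stdbool.h>

    enum microcode_match_result { OLD_UCODE, NEW_UCODE, MIS_UCODE };

    /* Worth keeping around, e.g. for APs behind a broken BIOS: any ucode
     * whose signature matches this CPU, even at an old revision. */
    static bool worth_saving(enum microcode_match_result r)
    {
        return r != MIS_UCODE;
    }

    /* Worth actually loading: only a strictly newer revision. */
    static bool worth_loading(enum microcode_match_result r)
    {
        return r == NEW_UCODE;
    }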

[Xen-devel] [PATCH v9 05/15] microcode: clean up microcode_resume_cpu

2019-08-18 Thread Chao Gao
Previously, a per-cpu ucode cache was maintained: each CPU had its own
update cache and there might be multiple versions of microcode in the
system. Thus microcode_resume_cpu tried its best to update microcode by
loading every update cache until a load succeeded.

But now the cache struct is simplified a lot and only a single ucode is
cached. A single invocation of ->apply_microcode() loads the cache and
brings the microcode up to date.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
changes in v8:
 - new
 - separated from the following patch
---
 xen/arch/x86/microcode.c| 40 ++-
 xen/arch/x86/microcode_amd.c| 47 -
 xen/arch/x86/microcode_intel.c  |  6 --
 xen/include/asm-x86/microcode.h |  1 -
 4 files changed, 2 insertions(+), 92 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 0ecd2fd..ca5ee37 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -215,8 +215,6 @@ int microcode_resume_cpu(unsigned int cpu)
 {
 int err;
 struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
-struct cpu_signature nsig;
-unsigned int cpu2;
 
 if ( !microcode_ops )
 return 0;
@@ -224,42 +222,8 @@ int microcode_resume_cpu(unsigned int cpu)
 spin_lock(&microcode_mutex);
 
 err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
-if ( err )
-{
-__microcode_fini_cpu(cpu);
-spin_unlock(&microcode_mutex);
-return err;
-}
-
-if ( uci->mc.mc_valid )
-{
-err = microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid);
-if ( err >= 0 )
-{
-if ( err )
-err = microcode_ops->apply_microcode(cpu);
-spin_unlock(&microcode_mutex);
-return err;
-}
-}
-
-nsig = uci->cpu_sig;
-__microcode_fini_cpu(cpu);
-uci->cpu_sig = nsig;
-
-err = -EIO;
-for_each_online_cpu ( cpu2 )
-{
-uci = &per_cpu(ucode_cpu_info, cpu2);
-if ( uci->mc.mc_valid &&
- microcode_ops->microcode_resume_match(cpu, uci->mc.mc_valid) > 0 )
-{
-err = microcode_ops->apply_microcode(cpu);
-break;
-}
-}
-
-__microcode_fini_cpu(cpu);
+if ( likely(!err) )
+err = microcode_ops->apply_microcode(cpu);
 spin_unlock(&microcode_mutex);
 
 return err;
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 30129ca..b351894 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -643,52 +643,6 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
 return error;
 }
 
-static int microcode_resume_match(unsigned int cpu, const void *mc)
-{
-struct ucode_cpu_info *uci = _cpu(ucode_cpu_info, cpu);
-struct microcode_amd *mc_amd = uci->mc.mc_amd;
-const struct microcode_amd *src = mc;
-
-if ( microcode_fits(src, cpu) != NEW_UCODE )
-return 0;
-
-if ( src != mc_amd )
-{
-if ( mc_amd )
-{
-xfree(mc_amd->equiv_cpu_table);
-xfree(mc_amd->mpb);
-xfree(mc_amd);
-}
-
-mc_amd = xmalloc(struct microcode_amd);
-uci->mc.mc_amd = mc_amd;
-if ( !mc_amd )
-return -ENOMEM;
-mc_amd->equiv_cpu_table = xmalloc_bytes(src->equiv_cpu_table_size);
-if ( !mc_amd->equiv_cpu_table )
-goto err1;
-mc_amd->mpb = xmalloc_bytes(src->mpb_size);
-if ( !mc_amd->mpb )
-goto err2;
-
-mc_amd->equiv_cpu_table_size = src->equiv_cpu_table_size;
-mc_amd->mpb_size = src->mpb_size;
-memcpy(mc_amd->mpb, src->mpb, src->mpb_size);
-memcpy(mc_amd->equiv_cpu_table, src->equiv_cpu_table,
-   src->equiv_cpu_table_size);
-}
-
-return 1;
-
-err2:
-xfree(mc_amd->equiv_cpu_table);
-err1:
-xfree(mc_amd);
-uci->mc.mc_amd = NULL;
-return -ENOMEM;
-}
-
 static int start_update(void)
 {
 #if CONFIG_HVM
@@ -708,7 +662,6 @@ static int start_update(void)
 }
 
 static const struct microcode_ops microcode_amd_ops = {
-.microcode_resume_match   = microcode_resume_match,
 .cpu_request_microcode= cpu_request_microcode,
 .collect_cpu_info = collect_cpu_info,
 .apply_microcode  = apply_microcode,
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index 14485dc..58eb186 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -446,13 +446,7 @@ static int cpu_request_microcode(unsigned int cpu, const 
void *buf,
 return error;
 }
 
-static int microcode_resume_match(unsigned int cpu, const void *mc)
-{
-return get_matching_microcode(mc, cpu);
-}
-
 static const struct microcode_ops microcode_intel_ops = {
-
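
The simplified flow is easiest to see in isolation. A minimal sketch
with stubbed helpers (names loosely follow the patch; locking elided):

    /* Stubs (illustrative): read this CPU's signature / load the cache. */
    static int collect_cpu_info_stub(void) { return 0; }
    static int apply_cached_patch_stub(void) { return 0; }

    static int resume_cpu_sketch(void)
    {
        int err = collect_cpu_info_stub();

        /* With a single global cache, one apply attempt replaces the old
         * search through every CPU's private cache. */
        if ( !err )
            err = apply_cached_patch_stub();

        return err;
    }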

[Xen-devel] [PATCH v9 11/15] microcode: unify loading update during CPU resuming and AP wakeup

2019-08-18 Thread Chao Gao
Both load the cached patch. Since APs call the unified function,
microcode_update_one(), during wakeup, the 'start_update' parameter,
which was originally used to distinguish the BSP from APs, is
redundant. So remove this parameter.

Signed-off-by: Chao Gao 
---
Note that there is a functional change here: resuming a CPU now calls
->end_update(), while previously it didn't. Not quite sure whether
that is correct.

Changes in v9:
 - return -EOPNOTSUPP rather than 0 if microcode_ops is NULL in
   microcode_update_one()
 - rebase and fix conflicts.

Changes in v8:
 - split out from the previous patch
---
 xen/arch/x86/acpi/power.c   |  2 +-
 xen/arch/x86/microcode.c| 90 ++---
 xen/arch/x86/smpboot.c  |  5 +--
 xen/include/asm-x86/processor.h |  4 +-
 4 files changed, 44 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 4f21903..24798d5 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -253,7 +253,7 @@ static int enter_state(u32 state)
 
 console_end_sync();
 
-microcode_resume_cpu();
+microcode_update_one();
 
 if ( !recheck_cpu_features(0) )
 panic("Missing previously available feature(s)\n");
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index a2febc7..bdd9c9f 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -203,24 +203,6 @@ static struct microcode_patch *parse_blob(const char *buf, 
uint32_t len)
 return NULL;
 }
 
-int microcode_resume_cpu(void)
-{
-int err;
-struct cpu_signature *sig = &this_cpu(cpu_sig);
-
-if ( !microcode_ops )
-return 0;
-
-spin_lock(&microcode_mutex);
-
-err = microcode_ops->collect_cpu_info(sig);
-if ( likely(!err) )
-err = microcode_ops->apply_microcode(microcode_cache);
-spin_unlock(&microcode_mutex);
-
-return err;
-}
-
 void microcode_free_patch(struct microcode_patch *microcode_patch)
 {
 microcode_ops->free_patch(microcode_patch->mc);
@@ -384,11 +366,29 @@ static int __init microcode_init(void)
 }
 __initcall(microcode_init);
 
-int __init early_microcode_update_cpu(bool start_update)
+/* Load a cached update to current cpu */
+int microcode_update_one(void)
+{
+int rc;
+
+if ( !microcode_ops )
+return -EOPNOTSUPP;
+
+rc = microcode_update_cpu(NULL);
+
+if ( microcode_ops->end_update )
+microcode_ops->end_update();
+
+return rc;
+}
+
+/* BSP calls this function to parse ucode blob and then apply an update. */
+int __init early_microcode_update_cpu(void)
 {
 int rc = 0;
 void *data = NULL;
 size_t len;
+struct microcode_patch *patch;
 
 if ( !microcode_ops )
 return -ENOSYS;
@@ -409,43 +409,33 @@ int __init early_microcode_update_cpu(bool start_update)
 if ( !data )
 return -ENOMEM;
 
-if ( start_update )
-{
-struct microcode_patch *patch;
-
-if ( microcode_ops->start_update )
-rc = microcode_ops->start_update();
-
-if ( rc )
-return rc;
-
-patch = parse_blob(data, len);
-if ( IS_ERR(patch) )
-{
-printk(XENLOG_INFO "Parsing microcode blob error %ld\n",
-   PTR_ERR(patch));
-return PTR_ERR(patch);
-}
+if ( microcode_ops->start_update )
+rc = microcode_ops->start_update();
 
-if ( !patch )
-{
-printk(XENLOG_INFO "No ucode found. Update aborted!\n");
-return -EINVAL;
-}
+if ( rc )
+return rc;
 
-spin_lock(&microcode_mutex);
-rc = microcode_update_cache(patch);
-spin_unlock(&microcode_mutex);
+patch = parse_blob(data, len);
+if ( IS_ERR(patch) )
+{
+printk(XENLOG_INFO "Parsing microcode blob error %ld\n",
+   PTR_ERR(patch));
+return PTR_ERR(patch);
+}
 
-ASSERT(rc);
+if ( !patch )
+{
+printk(XENLOG_INFO "No ucode found. Update aborted!\n");
+return -EINVAL;
 }
 
-rc = microcode_update_cpu(NULL);
+spin_lock(&microcode_mutex);
+rc = microcode_update_cache(patch);
+spin_unlock(&microcode_mutex);
 
-if ( microcode_ops->end_update )
-microcode_ops->end_update();
+ASSERT(rc);
 
-return rc;
+return microcode_update_one();
 }
 
 int __init early_microcode_init(void)
@@ -465,7 +455,7 @@ int __init early_microcode_init(void)
 microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
 if ( ucode_mod.mod_end || ucode_blob.size )
-rc = early_microcode_update_cpu(true);
+rc = early_microcode_update_cpu();
 }
 
 return rc;
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index c818cfc..e62a1ca 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -361,10 +361,7 @@ void start_secondary(void *unused)
 
 initialize_cpu_data(cpu);
 
-if 
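
A compact sketch of the unified entry point described above (stubs and
names are illustrative, not the committed code):

    /* Stubs for microcode_update_cpu(NULL) and ->end_update(). */
    static int load_cached_patch_stub(void) { return 0; }
    static void end_update_stub(void) { }

    /* Called identically from AP bringup and from S3 resume, so the old
     * 'start_update' flag distinguishing the two callers disappears. */
    static int microcode_update_one_sketch(void)
    {
        int rc = load_cached_patch_stub();

        /* Now also runs on resume; that is the functional change noted. */
        end_update_stub();
        return rc;
    }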

[Xen-devel] [PATCH v9 10/15] microcode: split out apply_microcode() from cpu_request_microcode()

2019-08-18 Thread Chao Gao
During late microcode loading, apply_microcode() is invoked in
cpu_request_microcode(). To make late microcode update more reliable,
we want to put apply_microcode() into stop_machine context. So we
split it out of cpu_request_microcode(). In general, for both early
loading on the BSP and late loading, cpu_request_microcode() is
called first to get the matching microcode update contained in the
blob, and then apply_microcode() is invoked explicitly on each cpu in
common code.

Given that all CPUs are supposed to have the same signature, parsing
microcode only needs to be done once. So cpu_request_microcode() is
also moved out of microcode_update_cpu().

In some cases (e.g. a broken bios), the system may have multiple
revisions of a microcode update. So we try to load a microcode update
as long as it covers the current cpu. And if a cpu loads this patch
successfully, the patch is stored into the patch cache.

Signed-off-by: Chao Gao 
---
Changes in v9:
 - remove the calling of ->compare_patch in microcode_update_cpu().
 - drop "microcode_" prefix for static function - microcode_parse_blob().
 - rebase and fix conflict

Changes in v8:
 - divide the original patch into three patches to improve readability
 - load an update on each cpu as long as the update covers current cpu
 - store an update after the first successful loading on a CPU
 - Make sure the current CPU (especially pf value) is covered
 by updates.

changes in v7:
 - to handle load failure, unvalidated patches won't be cached. They
 are passed as function arguments. So if update failed, we needn't
 any cleanup to microcode cache.
---
 xen/arch/x86/microcode.c| 177 ++--
 xen/arch/x86/microcode_amd.c|  38 +
 xen/arch/x86/microcode_intel.c  |  66 +++
 xen/include/asm-x86/microcode.h |   5 +-
 4 files changed, 172 insertions(+), 114 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 0e9322a..a2febc7 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -189,12 +189,19 @@ static DEFINE_SPINLOCK(microcode_mutex);
 
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
-struct microcode_info {
-unsigned int cpu;
-uint32_t buffer_size;
-int error;
-char buffer[1];
-};
+/*
+ * Return a patch that covers current CPU. If there are multiple patches,
+ * return the one with the highest revision number. Return error If no
+ * patch is found and an error occurs during the parsing process. Otherwise
+ * return NULL.
+ */
+static struct microcode_patch *parse_blob(const char *buf, uint32_t len)
+{
+if ( likely(!microcode_ops->collect_cpu_info(&this_cpu(cpu_sig))) )
+return microcode_ops->cpu_request_microcode(buf, len);
+
+return NULL;
+}
 
 int microcode_resume_cpu(void)
 {
@@ -220,13 +227,6 @@ void microcode_free_patch(struct microcode_patch 
*microcode_patch)
 xfree(microcode_patch);
 }
 
-const struct microcode_patch *microcode_get_cache(void)
-{
-ASSERT(spin_is_locked(&microcode_mutex));
-
-return microcode_cache;
-}
-
 /* Return true if cache gets updated. Otherwise, return false */
 bool microcode_update_cache(struct microcode_patch *patch)
 {
@@ -250,49 +250,71 @@ bool microcode_update_cache(struct microcode_patch *patch)
 return true;
 }
 
-static int microcode_update_cpu(const void *buf, size_t size)
+/*
+ * Load a microcode update to current CPU.
+ *
+ * If no patch is provided, the cached patch will be loaded. Microcode update
+ * during APs bringup and CPU resuming falls into this case.
+ */
+static int microcode_update_cpu(const struct microcode_patch *patch)
 {
-int err;
-unsigned int cpu = smp_processor_id();
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
+int err = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
-spin_lock(&microcode_mutex);
+if ( unlikely(err) )
+return err;
 
-err = microcode_ops->collect_cpu_info(sig);
-if ( likely(!err) )
-err = microcode_ops->cpu_request_microcode(buf, size);
-spin_unlock(&microcode_mutex);
+if ( patch )
+err = microcode_ops->apply_microcode(patch);
+else if ( microcode_cache )
+{
+spin_lock(&microcode_mutex);
+err = microcode_ops->apply_microcode(microcode_cache);
+if ( err == -EIO )
+{
+microcode_free_patch(microcode_cache);
+microcode_cache = NULL;
+}
+spin_unlock(&microcode_mutex);
+}
+else
+/* No patch to update */
+err = -ENOENT;
 
 return err;
 }
 
-static long do_microcode_update(void *_info)
+static long do_microcode_update(void *patch)
 {
-struct microcode_info *info = _info;
-int error;
-
-BUG_ON(info->cpu != smp_processor_id());
+unsigned int cpu;
 
-error = microcode_update_cpu(info->buffer, info->buffer_size);
-if ( error )
-info->error = error;
+/* Store the patch after a successful loading */
+if ( !microcode_update_c
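
The shape of the split, reduced to a self-contained sketch (stubs are
illustrative; in the full series the per-CPU apply step later moves
into stop_machine context):

    #include <stddef.h>

    struct patch_example { unsigned int rev; };

    static struct patch_example *parse_blob_stub(const void *buf,
                                                 unsigned long len)
    {
        static struct patch_example p = { 1 };
        return (buf && len) ? &p : NULL;
    }

    static int apply_on_cpu_stub(const struct patch_example *p,
                                 unsigned int cpu)
    {
        (void)cpu;  /* the real version acts on the CPU it runs on */
        return (p && p->rev) ? 0 : -1;
    }

    static int late_load_sketch(const void *buf, unsigned long len,
                                unsigned int nr_cpus)
    {
        /* Parse once: all CPUs share one signature, one parse suffices. */
        struct patch_example *p = parse_blob_stub(buf, len);
        unsigned int cpu;
        int rc = p ? 0 : -1;

        /* Apply explicitly on each cpu in common code. */
        for ( cpu = 0; !rc && cpu < nr_cpus; cpu++ )
            rc = apply_on_cpu_stub(p, cpu);

        return rc;
    }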

[Xen-devel] [PATCH v9 08/15] microcode/amd: call svm_host_osvw_init() in common code

2019-08-18 Thread Chao Gao
Introduce a vendor hook, .end_update, for svm_host_osvw_init().
The hook function is called on each cpu after loading an update.
It is a preparation for splitting out apply_microcode() from
cpu_request_microcode().

Note that svm_host_osvw_init() should be called regardless of the
result of loading an update.

Signed-off-by: Chao Gao 
---
Changes in v9:
 - call .end_update in early loading path
 - on AMD side, initialize .{start,end}_update only if "CONFIG_HVM"
 is true.
---
 xen/arch/x86/microcode.c| 10 +-
 xen/arch/x86/microcode_amd.c| 23 ++-
 xen/include/asm-x86/microcode.h |  1 +
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 3b87c72..c9401a7 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -277,6 +277,9 @@ static long do_microcode_update(void *_info)
 if ( error )
 info->error = error;
 
+if ( microcode_ops->end_update )
+microcode_ops->end_update();
+
 info->cpu = cpumask_next(info->cpu, &cpu_online_map);
 if ( info->cpu < nr_cpu_ids )
 return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
@@ -377,7 +380,12 @@ int __init early_microcode_update_cpu(bool start_update)
 if ( rc )
 return rc;
 
-return microcode_update_cpu(data, len);
+rc = microcode_update_cpu(data, len);
+
+if ( microcode_ops->end_update )
+microcode_ops->end_update();
+
+return rc;
 }
 else
 return -ENOMEM;
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index dd3821c..b85fb04 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -594,10 +594,6 @@ static int cpu_request_microcode(const void *buf, size_t 
bufsize)
 xfree(mc_amd);
 
   out:
-#if CONFIG_HVM
-svm_host_osvw_init();
-#endif
-
 /*
  * In some cases we may return an error even if processor's microcode has
  * been updated. For example, the first patch in a container file is loaded
@@ -609,27 +605,28 @@ static int cpu_request_microcode(const void *buf, size_t 
bufsize)
 
 static int start_update(void)
 {
-#if CONFIG_HVM
 /*
- * We assume here that svm_host_osvw_init() will be called on each cpu 
(from
- * cpu_request_microcode()).
- *
- * Note that if collect_cpu_info() returns an error then
- * cpu_request_microcode() will not invoked thus leaving OSVW bits not
- * updated. Currently though collect_cpu_info() will not fail on processors
- * supporting OSVW so we will not deal with this possibility.
+ * svm_host_osvw_init() will be called on each cpu by calling '.end_update'
+ * in common code.
  */
 svm_host_osvw_reset();
-#endif
 
 return 0;
 }
 
+static void end_update(void)
+{
+svm_host_osvw_init();
+}
+
 static const struct microcode_ops microcode_amd_ops = {
 .cpu_request_microcode= cpu_request_microcode,
 .collect_cpu_info = collect_cpu_info,
 .apply_microcode  = apply_microcode,
+#if CONFIG_HVM
 .start_update = start_update,
+.end_update   = end_update,
+#endif
 .free_patch   = free_patch,
 .compare_patch= compare_patch,
 .match_cpu= match_cpu,
diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h
index 35223eb..c8d2c4f 100644
--- a/xen/include/asm-x86/microcode.h
+++ b/xen/include/asm-x86/microcode.h
@@ -24,6 +24,7 @@ struct microcode_ops {
 int (*collect_cpu_info)(struct cpu_signature *csig);
 int (*apply_microcode)(void);
 int (*start_update)(void);
+void (*end_update)(void);
 void (*free_patch)(void *mc);
 bool (*match_cpu)(const struct microcode_patch *patch);
 enum microcode_match_result (*compare_patch)(
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
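
The bracketing this patch introduces, sketched with a minimal ops table
(names are illustrative):

    struct ops_example {
        int  (*start_update)(void);  /* AMD: svm_host_osvw_reset() */
        void (*end_update)(void);    /* AMD: svm_host_osvw_init() */
    };

    static int bracketed_update(const struct ops_example *ops,
                                int (*load)(void))
    {
        int rc = ops->start_update ? ops->start_update() : 0;

        if ( !rc )
            rc = load();

        /* Invoked regardless of the load result, per the commit message. */
        if ( ops->end_update )
            ops->end_update();

        return rc;
    }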

[Xen-devel] [PATCH v9 00/15] improve late microcode loading

2019-08-18 Thread Chao Gao
Changes in version 9:
 - add Jan's Reviewed-by
 - rendezvous threads in the NMI handler to disable NMI. Note that NMI
 can be served as usual on threads that are chosen to initiate ucode
 loading on each core.
 - avoid unnecessary memory allocation or copy when creating a microcode
 patch (patch 12)
 - rework patch 1 to avoid microcode_update_match() being used to
 compare two arbitrary updates.
 - call .end_update in early loading path.

Sergey, could you help test this series on an AMD machine? Regarding
the changes on the AMD side, I didn't run any tests for them due to
lack of hardware. At least two basic tests are needed:
* do a microcode update after system bootup
* don't bring all pCPUs up at bootup by specifying maxcpus option in xen
  command line and then do a microcode update and online all offlined
  CPUs via 'xen-hptool'.

The intention of this series is to make the late microcode loading
more reliable by rendezvousing all cpus in stop_machine context.
This idea comes from Ashok. I am porting his linux patch to Xen
(see patch 13 for more details).

This series includes below changes:
 1. Patch 1-12: introduce a global microcode cache and some cleanup
 2. Patch 13: synchronize late microcode loading
 3. Patch 14: support parallel microcodes update on different cores
 4. Patch 15: block #NMI handling during microcode loading

Currently, late microcode loading does a lot of things, including
parsing the microcode blob, checking the signature/revision and
performing the update. Putting all of them into stop_machine context
is a bad idea because of complexity (one issue I observed is that a
memory allocation triggered an assertion in stop_machine context). To
simplify matters, parsing microcode is moved out of the load process;
the remaining parts of the load process are put into stop_machine
context.

Previous change log:
Changes in version 8:
 - block #NMI handling during microcode loading (Patch 16)
 - Don't assume that all CPUs in the system have loaded the same ucode.
 So when parsing a blob, we attempt to save a patch as long as it
 matches the current cpu signature, regardless of the revision of the
 patch. And for loading, we only require that the patch to be loaded
 isn't older than the cached one.
 - store an update after the first successful loading on a CPU
 - remove the patch that calls wbinvd() unconditionally before microcode
 loading. It is under internal discussion.
 - divide two big patches into several patches to improve readability.

Changes in version 7:
 - cache one microcode update rather than a list of them. Assuming that all
 CPUs in the system (including those to be plugged in later) have the same
 signature, an update that matches one CPU should match the others. Thus, one
 update is enough for microcode updating during CPU hot-plug and resuming.
 - To handle load failure, microcode update is cached after it is applied to
 avoid a broken update overriding a validated one. Unvalidated microcode updates
 are passed by arguments rather than another global variable, where this series
 slightly differs from Roger's suggestion in:
 https://lists.xen.org/archives/html/xen-devel/2019-03/msg00776.html
 - incorporate Sergey's patch (patch 10) to fix a bug: we maintain a variable
 to reflect the current microcode revision. But in some cases, this variable
 isn't initialized during system boot, which results in falsely reporting that
 the processor is susceptible to some known vulnerabilities.
 - fix issues reported by Sergey:
 https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00901.html
 - Responses to Sergey/Roger/Wei/Ashok's other comments.

Major changes in version 6:
 - run wbinvd before updating microcode (patch 10)
 - add an userspace tool for late microcode update (patch 1)
 - scale time to wait by the number of remaining CPUs to respond 
 - remove 'cpu' parameters from some related callbacks and functins
 - save an ucode patch only if its supported CPU is allowed to mix with
   current cpu.

Changes in version 5:
 - support parallel microcode updates for all cores (see patch 8)
 - Address Roger's comments on the last version.


Chao Gao (15):
  microcode/intel: extend microcode_update_match()
  microcode/amd: fix memory leak
  microcode/amd: distinguish old and mismatched ucode in
microcode_fits()
  microcode: introduce a global cache of ucode patch
  microcode: clean up microcode_resume_cpu
  microcode: remove struct ucode_cpu_info
  microcode: remove pointless 'cpu' parameter
  microcode/amd: call svm_host_osvw_init() in common code
  microcode: pass a patch pointer to apply_microcode()
  microcode: split out apply_microcode() from cpu_request_microcode()
  microcode: unify loading update during CPU resuming and AP wakeup
  microcode: reduce memory allocation and copy when creating a patch
  x86/microcode: Synchronize late microcode loading
  microcode: remove microcode_update_lock
  microcode: block #NMI handling when loading an ucode

 xen/arch/x86/acpi/power.c   |   2 +-
 xen/arch

[Xen-devel] [PATCH v9 09/15] microcode: pass a patch pointer to apply_microcode()

2019-08-18 Thread Chao Gao
apply_microcode() always loading the cached ucode patch forces a
patch to be stored before being loaded. Make apply_microcode() accept
a patch pointer to remove this limitation, so that a patch can be
stored after a successful load.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
 xen/arch/x86/microcode.c| 2 +-
 xen/arch/x86/microcode_amd.c| 5 ++---
 xen/arch/x86/microcode_intel.c  | 5 ++---
 xen/include/asm-x86/microcode.h | 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index c9401a7..0e9322a 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -208,7 +208,7 @@ int microcode_resume_cpu(void)
 
 err = microcode_ops->collect_cpu_info(sig);
 if ( likely(!err) )
-err = microcode_ops->apply_microcode();
+err = microcode_ops->apply_microcode(microcode_cache);
 spin_unlock(&microcode_mutex);
 
 return err;
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index b85fb04..21cdfe0 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -248,7 +248,7 @@ static enum microcode_match_result compare_patch(
 return MIS_UCODE;
 }
 
-static int apply_microcode(void)
+static int apply_microcode(const struct microcode_patch *patch)
 {
 unsigned long flags;
 uint32_t rev;
@@ -256,7 +256,6 @@ static int apply_microcode(void)
 unsigned int cpu = smp_processor_id();
 struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 const struct microcode_header_amd *hdr;
-const struct microcode_patch *patch = microcode_get_cache();
 
 if ( !match_cpu(patch) )
 return -EINVAL;
@@ -557,7 +556,7 @@ static int cpu_request_microcode(const void *buf, size_t 
bufsize)
 
 if ( match_cpu(microcode_get_cache()) )
 {
-error = apply_microcode();
+error = apply_microcode(microcode_get_cache());
 if ( error )
 break;
 }
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index a5452d4..8c0008c 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -319,7 +319,7 @@ static int get_matching_microcode(const void *mc)
 return 1;
 }
 
-static int apply_microcode(void)
+static int apply_microcode(const struct microcode_patch *patch)
 {
 unsigned long flags;
 uint64_t msr_content;
@@ -327,7 +327,6 @@ static int apply_microcode(void)
 unsigned int cpu_num = raw_smp_processor_id();
 struct cpu_signature *sig = &this_cpu(cpu_sig);
 const struct microcode_intel *mc_intel;
-const struct microcode_patch *patch = microcode_get_cache();
 
 if ( !match_cpu(patch) )
 return -EINVAL;
@@ -422,7 +421,7 @@ static int cpu_request_microcode(const void *buf, size_t 
size)
 error = offset;
 
 if ( !error && match_cpu(microcode_get_cache()) )
-error = apply_microcode();
+error = apply_microcode(microcode_get_cache());
 
 return error;
 }
diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h
index c8d2c4f..8c7de9d 100644
--- a/xen/include/asm-x86/microcode.h
+++ b/xen/include/asm-x86/microcode.h
@@ -22,7 +22,7 @@ struct microcode_patch {
 struct microcode_ops {
 int (*cpu_request_microcode)(const void *buf, size_t size);
 int (*collect_cpu_info)(struct cpu_signature *csig);
-int (*apply_microcode)(void);
+int (*apply_microcode)(const struct microcode_patch *patch);
 int (*start_update)(void);
 void (*end_update)(void);
 void (*free_patch)(void *mc);
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
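
The ordering this change enables fits in a few lines (a sketch; names
and stubs are illustrative):

    struct patch_example2;

    static int apply_stub(const struct patch_example2 *p)
    {
        return p ? 0 : -1;
    }
    static void cache_store_stub(const struct patch_example2 *p) { (void)p; }

    static int load_then_cache(const struct patch_example2 *candidate)
    {
        /* apply takes the patch explicitly now ... */
        int rc = apply_stub(candidate);

        /* ... so only a validated patch ever reaches the cache. */
        if ( !rc )
            cache_store_stub(candidate);
        return rc;
    }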

[Xen-devel] [PATCH v9 07/15] microcode: remove pointless 'cpu' parameter

2019-08-18 Thread Chao Gao
Some callbacks in microcode_ops or related functions take a cpu
id parameter. But at the current call sites, the cpu id parameter is
always equal to the current cpu's id. Some of them even use an
assertion to guarantee this. Remove this redundant 'cpu' parameter.

Signed-off-by: Chao Gao 
Reviewed-by: Jan Beulich 
---
Changes in v9:
 - use a convenience variable 'cpu' in collect_cpu_info() on AMD side
 - rebase and fix conflicts

Changes in v8:
 - Use current_cpu_data in collect_cpu_info()
 - keep the cpu parameter of check_final_patch_levels()
 - use smp_processor_id() in get_matching_microcode() rather than
 define a local variable and label it "__maybe_unused"
---
 xen/arch/x86/acpi/power.c   |  2 +-
 xen/arch/x86/microcode.c| 20 
 xen/arch/x86/microcode_amd.c| 30 +++---
 xen/arch/x86/microcode_intel.c  | 35 +--
 xen/arch/x86/smpboot.c  |  2 +-
 xen/include/asm-x86/microcode.h |  7 +++
 xen/include/asm-x86/processor.h |  2 +-
 7 files changed, 38 insertions(+), 60 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index aecc754..4f21903 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -253,7 +253,7 @@ static int enter_state(u32 state)
 
 console_end_sync();
 
-microcode_resume_cpu(0);
+microcode_resume_cpu();
 
 if ( !recheck_cpu_features(0) )
 panic("Missing previously available feature(s)\n");
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 552e7fe..3b87c72 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -196,19 +196,19 @@ struct microcode_info {
 char buffer[1];
 };
 
-int microcode_resume_cpu(unsigned int cpu)
+int microcode_resume_cpu(void)
 {
 int err;
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
+struct cpu_signature *sig = &this_cpu(cpu_sig);
 
 if ( !microcode_ops )
 return 0;
 
 spin_lock(&microcode_mutex);
 
-err = microcode_ops->collect_cpu_info(cpu, sig);
+err = microcode_ops->collect_cpu_info(sig);
 if ( likely(!err) )
-err = microcode_ops->apply_microcode(cpu);
+err = microcode_ops->apply_microcode();
 spin_unlock(&microcode_mutex);
 
 return err;
@@ -258,9 +258,9 @@ static int microcode_update_cpu(const void *buf, size_t 
size)
 
 spin_lock(&microcode_mutex);
 
-err = microcode_ops->collect_cpu_info(cpu, sig);
+err = microcode_ops->collect_cpu_info(sig);
 if ( likely(!err) )
-err = microcode_ops->cpu_request_microcode(cpu, buf, size);
+err = microcode_ops->cpu_request_microcode(buf, size);
 spin_unlock(&microcode_mutex);
 
 return err;
@@ -349,8 +349,6 @@ __initcall(microcode_init);
 
 int __init early_microcode_update_cpu(bool start_update)
 {
-unsigned int cpu = smp_processor_id();
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 int rc = 0;
 void *data = NULL;
 size_t len;
@@ -369,7 +367,7 @@ int __init early_microcode_update_cpu(bool start_update)
 data = bootstrap_map(&ucode_mod);
 }
 
-microcode_ops->collect_cpu_info(cpu, sig);
+microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
 if ( data )
 {
@@ -387,8 +385,6 @@ int __init early_microcode_update_cpu(bool start_update)
 
 int __init early_microcode_init(void)
 {
-unsigned int cpu = smp_processor_id();
-struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 int rc;
 
 rc = microcode_init_intel();
@@ -401,7 +397,7 @@ int __init early_microcode_init(void)
 
 if ( microcode_ops )
 {
-microcode_ops->collect_cpu_info(cpu, sig);
+microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
 
 if ( ucode_mod.mod_end || ucode_blob.size )
 rc = early_microcode_update_cpu(true);
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 9e4ec73..dd3821c 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -78,8 +78,9 @@ struct mpbhdr {
 static DEFINE_SPINLOCK(microcode_update_lock);
 
 /* See comment in start_update() for cases when this routine fails */
-static int collect_cpu_info(unsigned int cpu, struct cpu_signature *csig)
+static int collect_cpu_info(struct cpu_signature *csig)
 {
+unsigned int cpu = smp_processor_id();
 struct cpuinfo_x86 *c = &cpu_data[cpu];
 
 memset(csig, 0, sizeof(*csig));
@@ -153,17 +154,15 @@ static bool_t find_equiv_cpu_id(const struct 
equiv_cpu_entry *equiv_cpu_table,
 }
 
 static enum microcode_match_result microcode_fits(
-const struct microcode_amd *mc_amd, unsigned int cpu)
+const struct microcode_amd *mc_amd)
 {
+unsigned int cpu = smp_processor_id();
 const struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
 const struct microcode_header_amd *mc_header = mc_amd->mpb;
 const struct equiv_cpu_entry *equiv_cpu_table = mc_amd->equiv_cpu_table;
 unsigned int current_cpu_id;
 unsigned int equiv_cpu_i
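
The essence of the cleanup, sketched (stubs are illustrative): the
callback derives the CPU itself, which also documents that it only
ever operates on the CPU it is running on.

    struct sig_example { unsigned int rev; };

    static unsigned int smp_processor_id_stub(void) { return 0; }

    static int collect_cpu_info_sketch(struct sig_example *out)
    {
        unsigned int cpu = smp_processor_id_stub();

        (void)cpu;     /* would select this CPU's data / MSR to read */
        out->rev = 0;  /* placeholder for the real ucode revision read */
        return 0;
    }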

[Xen-devel] [PATCH v9 12/15] microcode: reduce memory allocation and copy when creating a patch

2019-08-18 Thread Chao Gao
To create a microcode patch from a vendor-specific update,
allocate_microcode_patch() copied everything from the update.
That is not efficient. Essentially, we just need to go through
the ucodes in the blob, find the one with the newest revision and
install it into the microcode_patch. In the process, buffers
like mc_amd, equiv_cpu_table (on the AMD side), and mc (on the Intel
side) can be reused. The microcode_patch is now allocated only once
it is certain that there is a matching ucode.

Signed-off-by: Chao Gao 
---
Changes in v9:
 - new
---
 xen/arch/x86/microcode_amd.c   | 99 +++---
 xen/arch/x86/microcode_intel.c | 65 ++-
 2 files changed, 58 insertions(+), 106 deletions(-)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 6353323..ec1c2eb 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -194,36 +194,6 @@ static bool match_cpu(const struct microcode_patch *patch)
 return patch && (microcode_fits(patch->mc_amd) == NEW_UCODE);
 }
 
-static struct microcode_patch *alloc_microcode_patch(
-const struct microcode_amd *mc_amd)
-{
-struct microcode_patch *microcode_patch = xmalloc(struct microcode_patch);
-struct microcode_amd *cache = xmalloc(struct microcode_amd);
-void *mpb = xmalloc_bytes(mc_amd->mpb_size);
-struct equiv_cpu_entry *equiv_cpu_table =
-xmalloc_bytes(mc_amd->equiv_cpu_table_size);
-
-if ( !microcode_patch || !cache || !mpb || !equiv_cpu_table )
-{
-xfree(microcode_patch);
-xfree(cache);
-xfree(mpb);
-xfree(equiv_cpu_table);
-return ERR_PTR(-ENOMEM);
-}
-
-memcpy(mpb, mc_amd->mpb, mc_amd->mpb_size);
-cache->mpb = mpb;
-cache->mpb_size = mc_amd->mpb_size;
-memcpy(equiv_cpu_table, mc_amd->equiv_cpu_table,
-   mc_amd->equiv_cpu_table_size);
-cache->equiv_cpu_table = equiv_cpu_table;
-cache->equiv_cpu_table_size = mc_amd->equiv_cpu_table_size;
-microcode_patch->mc_amd = cache;
-
-return microcode_patch;
-}
-
 static void free_patch(void *mc)
 {
 struct microcode_amd *mc_amd = mc;
@@ -320,18 +290,10 @@ static int get_ucode_from_buffer_amd(
 return -EINVAL;
 }
 
-if ( mc_amd->mpb_size < mpbuf->len )
-{
-if ( mc_amd->mpb )
-{
-xfree(mc_amd->mpb);
-mc_amd->mpb_size = 0;
-}
-mc_amd->mpb = xmalloc_bytes(mpbuf->len);
-if ( mc_amd->mpb == NULL )
-return -ENOMEM;
-mc_amd->mpb_size = mpbuf->len;
-}
+mc_amd->mpb = xmalloc_bytes(mpbuf->len);
+if ( mc_amd->mpb == NULL )
+return -ENOMEM;
+mc_amd->mpb_size = mpbuf->len;
 memcpy(mc_amd->mpb, mpbuf->data, mpbuf->len);
 
 pr_debug("microcode: CPU%d size %zu, block size %u offset %zu equivID %#x 
rev %#x\n",
@@ -451,8 +413,9 @@ static struct microcode_patch *cpu_request_microcode(const 
void *buf,
  size_t bufsize)
 {
 struct microcode_amd *mc_amd;
+struct microcode_header_amd *saved = NULL;
 struct microcode_patch *patch = NULL;
-size_t offset = 0;
+size_t offset = 0, saved_size = 0;
 int error = 0;
 unsigned int current_cpu_id;
 unsigned int equiv_cpu_id;
@@ -542,29 +505,21 @@ static struct microcode_patch 
*cpu_request_microcode(const void *buf,
 while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
&offset)) == 0 )
 {
-struct microcode_patch *new_patch = alloc_microcode_patch(mc_amd);
-
-if ( IS_ERR(new_patch) )
-{
-error = PTR_ERR(new_patch);
-break;
-}
-
 /*
- * If the new patch covers current CPU, compare patches and store the
+ * If the new ucode covers current CPU, compare ucodes and store the
  * one with higher revision.
  */
-if ( (microcode_fits(new_patch->mc_amd) != MIS_UCODE) &&
- (!patch || (compare_patch(new_patch, patch) == NEW_UCODE)) )
+#define REV_ID(mpb) (((struct microcode_header_amd *)(mpb))->processor_rev_id)
+if ( (microcode_fits(mc_amd) != MIS_UCODE) &&
+ (!saved || (REV_ID(mc_amd->mpb) > REV_ID(saved))) )
+#undef REV_ID
 {
-struct microcode_patch *tmp = patch;
-
-patch = new_patch;
-new_patch = tmp;
+xfree(saved);
+saved = mc_amd->mpb;
+saved_size = mc_amd->mpb_size;
 }
-
-if ( new_patch )
-microcode_free_patch(new_patch);
+else
+xfree(mc_amd->mpb);
 
 if ( offset >= bufsize )
 break;
@@ -593,9 +548,25 @@ static struct microcode_patch *cpu_request_microcode(const 
void *buf,
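
The scan loop reduced to its core idea, as a userspace sketch with
malloc/free and illustrative stubs:

    #include <stdlib.h>

    struct ucode_example { unsigned int rev; };

    static struct ucode_example *next_ucode_stub(void) { return NULL; }
    static int fits_stub(const struct ucode_example *u) { return u->rev != 0; }

    /* Walk every ucode in the blob, keep the matching one with the highest
     * revision, free the rest immediately; no per-candidate copy is made. */
    static struct ucode_example *keep_newest(void)
    {
        struct ucode_example *saved = NULL, *u;

        while ( (u = next_ucode_stub()) != NULL )
        {
            if ( fits_stub(u) && (!saved || u->rev > saved->rev) )
            {
                free(saved);  /* drop the previous best ... */
                saved = u;    /* ... and steal this buffer instead of copying */
            }
            else
                free(u);
        }
        return saved;
    }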
   

Re: [Xen-devel] Reset pass-thru devices in a VM

2019-08-09 Thread Chao Gao
On Fri, Aug 09, 2019 at 03:23:59PM +0200, Jan Beulich wrote:
>On 09.08.2019 15:24, Chao Gao wrote:
>>On Fri, Aug 09, 2019 at 10:49:32AM +0200, Jan Beulich wrote:
>>>On 09.08.2019 10:38, Chao Gao wrote:
>>>>Alternatively, emulating FLR (Function Level Reset)
>>>>capability for this device might be a feasible way and only needs
>>>>relatively few changes. I am planning to enable an opt-in feature
>>>>(like 'permissive') to allow qemu to expose FLR capability to guest for
>>>>pass-thru devices as long as this device is resetable on dom0 (i.e. the
>>>>device has 'reset' attribute under its sysfs). And when guest initiates
>>>>an FLR, qemu just echo 1 to the 'reset' attribute on dom0.
>>>>
>>>>Do you think emulating FLR capability is doable?
>>>
>>>Wouldn't a such emulated guest initiated reset affect other devices
>>>(likely not under control of this guest) as well?
>>
>>No. Linux kernel guarantees that reset to a device won't affect
>>other devices. Otherwise, such device cannot be reset and no
>>'reset' attribute will be created under device's sysfs.
>>Specfically, the invocation of pci_dev_reset_slot_function() and
>>pci_parent_bus_reset() in pci_probe_reset_function() will check whether
>>the device (function) is the only one under the slot or bus
>>respectively. In pci_create_capabilities_sysfs(), 'reset' attribute is
>>created only if dev->reset_fn is not zero.
>
>Ah, good. But then the opposite question arises: How would your
>proposed change help if the device shares a bus with others?

It wouldn't. If the device supports any way to reset it in dom0, this
change would help. If even in dom0 there is no way to reset the device,
it won't help. But I think such a device cannot be safely assigned
to a VM anyway, because we rely on PCI reset to clean up sensitive data
programmed into the device by its previous owner.

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Reset pass-thru devices in a VM

2019-08-09 Thread Chao Gao
On Fri, Aug 09, 2019 at 02:42:09PM +0200, Roger Pau Monné wrote:
>On Fri, Aug 09, 2019 at 04:38:33PM +0800, Chao Gao wrote:
>> Hi everyone,
>> 
>> I have a device which only supports secondary bus reset. After being
>> assigned to a VM, it would be placed under host bridge. For devices
>> under host bridge, secondary bus reset is not applicable. Thus, a VM
>> has no way to reset this device.
>
>I think in general we don't allow guests to perform any kind of reset
>of PCI devices, that's always in control of the hardware domain.

But the reset is trapped and performed by the hardware domain. I don't
think the guest could, in this way, access registers or gain more
permission over registers than it should have.

>
>How are for example BARs going to be placed after such reset?

I don't know whether BARs are relocated after reset; I will figure it
out. Considering that KVM/QEMU does support device reset, I think there
should be some means.

>
>> This device's usage would be limited without PCI reset (for example, its
>> driver cannot re-initialize the device properly without PCI reset, which
>> means in VM device won't be usable after unloading the driver), it would
>> be much better if there is a way available to VMs to reset the device.
>
>Is this something common (ie: requiring device reset functionality)
>for drivers to work correctly?

I don't think it is common. I am not sure whether it is required for GPU
pass-thru in some cases. But I believe I saw some online materials
about GPU pass-thru that mention how to enable FLR for a VM.

>
>So far we seem to have managed to get away without it.
>
>> In my mind, a straightfoward solution is to create a virtual bridge
>> for a VM and place the pass-thru device under a virtual bridge. But it
>> isn't supported in Xen (KVM/QEMU supports) and enabling it looks need
>> a lot of efforts. Alternatively, emulating FLR (Function Level Reset)
>> capability for this device might be a feasible way and only needs
>> relatively few changes. I am planning to enable an opt-in feature
>> (like 'permissive') to allow qemu to expose FLR capability to guest for
>> pass-thru devices as long as this device is resetable on dom0 (i.e. the
>> device has 'reset' attribute under its sysfs). And when guest initiates
>> an FLR, qemu just echo 1 to the 'reset' attribute on dom0.
>
>So you would expose the device as FLR capable and just implement it as
>a secondary bus reset on the device model?

For the device I mentioned, yes. Actually, for other devices, it can be
any supported reset method. There are several ways to reset a function;
secondary bus reset is the last choice.

>
>That seems feasible, but as noted above I would be worried about the
>resources owned by the device, and how they are going to be placed
>after such reset. Note you would also have to notify Xen somehow of
>such reset, so it tears down all the state related to the device.

I will figure out what should be done on the Xen side.

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Reset pass-thru devices in a VM

2019-08-09 Thread Chao Gao
On Fri, Aug 09, 2019 at 10:49:32AM +0200, Jan Beulich wrote:
>On 09.08.2019 10:38, Chao Gao wrote:
>>I have a device which only supports secondary bus reset. After being
>>assigned to a VM, it would be placed under host bridge. For devices
>>under host bridge, secondary bus reset is not applicable. Thus, a VM
>>has no way to reset this device.
>>
>>This device's usage would be limited without PCI reset (for example, its
>>driver cannot re-initialize the device properly without PCI reset, which
>>means in VM device won't be usable after unloading the driver), it would
>>be much better if there is a way available to VMs to reset the device.
>>
>>In my mind, a straightfoward solution is to create a virtual bridge
>>for a VM and place the pass-thru device under a virtual bridge. But it
>>isn't supported in Xen (KVM/QEMU supports) and enabling it looks need
>>a lot of efforts.
>
>Meanwhile I think a couple of years ago there was some initial effort
>to get a newer chipset (Q35 iirc) emulated for HVM guests.

Yes. But it seems that no one is working on this feature now.

>
>>Alternatively, emulating FLR (Function Level Reset)
>>capability for this device might be a feasible way and only needs
>>relatively few changes. I am planning to enable an opt-in feature
>>(like 'permissive') to allow qemu to expose FLR capability to guest for
>>pass-thru devices as long as this device is resetable on dom0 (i.e. the
>>device has 'reset' attribute under its sysfs). And when guest initiates
>>an FLR, qemu just echo 1 to the 'reset' attribute on dom0.
>>
>>Do you think emulating FLR capability is doable?
>
>Wouldn't a such emulated guest initiated reset affect other devices
>(likely not under control of this guest) as well?

No. Linux kernel guarantees that reset to a device won't affect
other devices. Otherwise, such device cannot be reset and no
'reset' attribute will be created under device's sysfs.
Specfically, the invocation of pci_dev_reset_slot_function() and
pci_parent_bus_reset() in pci_probe_reset_function() will check whether
the device (function) is the only one under the slot or bus
respectively. In pci_create_capabilities_sysfs(), 'reset' attribute is
created only if dev->reset_fn is not zero. 

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] Reset pass-thru devices in a VM

2019-08-09 Thread Chao Gao
Hi everyone,

I have a device which only supports secondary bus reset. After being
assigned to a VM, it would be placed under the host bridge. For devices
under the host bridge, secondary bus reset is not applicable. Thus, a VM
has no way to reset this device.

This device's usage would be limited without PCI reset (for example, its
driver cannot re-initialize the device properly without PCI reset, which
means the device won't be usable in the VM after unloading the driver).
It would be much better if there were a way for VMs to reset the device.

In my mind, a straightforward solution is to create a virtual bridge
for a VM and place the pass-thru device under that virtual bridge. But
this isn't supported in Xen (KVM/QEMU supports it) and enabling it looks
to need a lot of effort. Alternatively, emulating FLR (Function Level
Reset) capability for this device might be a feasible way and only needs
relatively few changes. I am planning to enable an opt-in feature
(like 'permissive') to allow qemu to expose FLR capability to the guest
for pass-thru devices, as long as the device is resettable in dom0 (i.e.
the device has a 'reset' attribute under its sysfs). And when the guest
initiates an FLR, qemu just echoes 1 to the 'reset' attribute in dom0.

Do you think emulating FLR capability is doable?

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
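
For concreteness, the dom0 action described above ("echo 1 to the
'reset' attribute") could look like this from a device model's point of
view. This is only a sketch; the BDF is a placeholder for the assigned
device:

    #include <fcntl.h>
    #include <unistd.h>

    /* Write "1" to the device's sysfs 'reset' attribute, as a device model
     * could do when the guest initiates the emulated FLR. */
    static int reset_assigned_device(void)
    {
        int fd = open("/sys/bus/pci/devices/0000:03:00.0/reset", O_WRONLY);
        int rc = -1;

        if ( fd >= 0 )
        {
            rc = (write(fd, "1", 1) == 1) ? 0 : -1;
            close(fd);
        }
        return rc;
    }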

Re: [Xen-devel] [PATCH v8 16/16] microcode: block #NMI handling when loading an ucode

2019-08-07 Thread Chao Gao
On Mon, Aug 05, 2019 at 12:11:01PM +, Jan Beulich wrote:
>On 01.08.2019 12:22, Chao Gao wrote:
>> @@ -439,12 +440,37 @@ static int do_microcode_update(void *patch)
>>   return ret;
>>   }
>>   
>> +static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
>> +{
>> +bool print = false;
>> +
>> +/* The first thread of a core is to load an update. Don't block it. */
>> +if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
>> +return 0;
>> +
>> +if ( loading_state == LOADING_ENTERED )
>> +{
>> +cpumask_set_cpu(cpu, &cpu_callin_map);
>> +printk(XENLOG_DEBUG "CPU%u enters %s\n", smp_processor_id(), 
>> __func__);
>
>Here  and ...
>
>> +print = true;
>> +}
>> +
>> +while ( loading_state == LOADING_ENTERED )
>> +rep_nop();
>> +
>> +if ( print )
>> +printk(XENLOG_DEBUG "CPU%u exits %s\n", smp_processor_id(), 
>> __func__);
>
>... here - why smp_processor_id() when you can use "cpu"? And what
>use is __func__ here?
>
>The rep_nop() above also presumably wants to be cpu_relax() again.
>
>But on the whole I was really hoping for more aggressive disabling
>of NMI handling, more like (but of course not quite as heavy as)
>the crash path wiring the IDT entry to trap_nop().

Hi Jan,

I agree with you that it should be more aggressive. This patch is
problematic considering that a lot of code runs before reaching this
callback (especially SPEC_CTRL_ENTRY_FROM_INTR_IST, which may write
MSR_SPEC_CTRL).

In my mind, we have two options to solve the issue:
1. Wire the IDT entry to trap_nop() like the crash path.

2. Enhance this patch: a thread which is not going to load an update
is forced to send an #NMI to itself to enter the callback, and then
busy-loops until loading of the ucode has completed on all cores.
With this method, no #NMI delivery or MSR write would happen on these
threads during the microcode update.

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
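
To make "option 2" above concrete, a hedged sketch of the parking loop
(an idea under discussion, not committed code; cpu_relax is stubbed):

    static volatile int loading_done;

    static void cpu_relax_stub(void) { }  /* stand-in for Xen's cpu_relax() */

    /* A sibling thread that enters the NMI callback parks here until the
     * primary thread of its core has finished loading, so no further #NMI
     * handling (or MSR write) happens on it in the meantime. */
    static void nmi_park_sketch(void)
    {
        while ( !loading_done )
            cpu_relax_stub();
    }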
